Return True or False if values in a column contain elements from a list of strings in R

I want to check whether there is a close match between the values in a column and a list of strings. There is rarely a perfect match, so %in% on the full strings is no good. I'd rather err on the side of caution than miss something, but I'd also like to avoid matching patterns inside individual words.

For example

List:

Tenis PLC
Green Company Limited
(DCC) Darth Company Creditors

Dataframe

ID   Company Name
10   Ten LTD
12   Green Company (GC) LTD
23   MCC
48   DARTH

Return

False
True
False
True

EDIT: I should mention that I have now cleaned the data a little, making it all lowercase and removing any brackets.

CodePudding user response:

To regenerate your data:

    l = list(tolower(c('Tenis PLC',
                       'Green Company Limited',
                       '(DCC) Darth Company Creditors')))

    tmp_df = data.frame(Company_Name = tolower(c('Ten LTD', 'Green Company (GC) LTD',
                                                 'MCC', 'DARTH')))

Solution:

Get all the words in the list by splitting each string on spaces:

    split1 = unlist(strsplit(unlist(l), ' '))

Check whether each value in Company_Name contains at least one of those words as a whole word (assuming this is what you meant):

    sapply(tmp_df$Company_Name,
           function(x) {sum(unlist(strsplit(x, ' ')) %in% split1) >= 1})
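For reference, the same whole-word check can be sketched end to end, including the cleaning step mentioned in the question's edit (lowercase, brackets removed). This is only a sketch under assumptions: `tokenize`, `ref_names`, `ref_words` and the `Match` column are hypothetical names introduced here, not part of the original answer.

```r
# Sample data from the question
ref_names <- c("Tenis PLC", "Green Company Limited", "(DCC) Darth Company Creditors")
tmp_df <- data.frame(Company_Name = c("Ten LTD", "Green Company (GC) LTD", "MCC", "DARTH"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: lowercase, strip round brackets, split on whitespace
tokenize <- function(x) {
  x <- gsub("[()]", "", tolower(x))
  strsplit(trimws(x), "\\s+")
}

# All distinct words that occur anywhere in the reference list
ref_words <- unique(unlist(tokenize(ref_names)))

# TRUE when a company name shares at least one whole word with the reference list
tmp_df$Match <- vapply(tokenize(tmp_df$Company_Name),
                       function(words) any(words %in% ref_words),
                       logical(1))

print(tmp_df)
```

Because whole words are compared, "ten" does not match "tenis". Note that a generic word such as "company" would still count as a match for any name that contains it.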

EDIT:

To keep only the items in split1 with more than 3 characters, so that short words are ignored when matching:

    split1 = split1[nchar(split1) > 3]
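The effect of the length filter can be seen with a small sketch. The `has_match` helper and the name "acme plc" are hypothetical, added only for illustration, and the word vector is typed out by hand in its cleaned (lowercase, no brackets) form.

```r
# Words from the reference list, already lowercased and without brackets
split1 <- c("tenis", "plc", "green", "company", "limited",
            "dcc", "darth", "company", "creditors")

# Keep only words longer than 3 characters (drops "plc" and "dcc")
long_words <- split1[nchar(split1) > 3]

# Hypothetical helper: does a name share any whole word with `words`?
has_match <- function(name, words) {
  any(strsplit(tolower(name), " ")[[1]] %in% words)
}

has_match("acme plc", split1)      # matches on the short word "plc"
has_match("acme plc", long_words)  # no match once short words are dropped
has_match("darth", long_words)     # longer words still match
```

Filtering by length reduces matches on short generic tokens, at the cost of also ignoring short words that might be meaningful, such as an abbreviation of a company name.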