mark one dataframe if pattern found in another dataframe-CodePudding

I have two dataframes, and I want to mark the second one if the first one contains a pattern. Very large of rows (>10000's)

date      | items 
20100605  | apple is red 
20110606  | orange is orange 
20120607  | apple is green

B: shorter with a few hundred rows.

id   |  color
123  |  is Red
234  |  not orange
235  |  is green

Result would be to flag all columns in B if pattern found in A, possibly adding a column to B like

B:
id   |  color       | found
123  |  is Red      | true
234  |  not orange  | false
235  |  is green    | true

thinking of something like, dfB['found'] = dfB['color'].isin(dfA['items']) but don't see any way to ignore case. Also, with this approach it will change true to false. Don't want to change those which are already set true. Also, I believe it's inefficient to loop large dataframes more than once. Running through A once and marking B would be better way but not sure how to achieve that using isin(). Any other ways? Especially ignoring case sensitivity of pattern.

CodePudding user response：

You can use something like this:

df2['check'] = df2['color'].apply(lambda x: True if any(x.casefold() in i.casefold() for i in df['items']) else False)

or you can use str.contains:

df2['check'] = df2['color'].str.contains('|'.join(df['items'].str.split(" ").str[1]   ' '   df['items'].str.split(" ").str[2]),case=False)

#get second and third words