Home > database >  mark one dataframe if pattern found in another dataframe
mark one dataframe if pattern found in another dataframe

Time:01-06

I have two dataframes, and I want to mark the second one if the first one contains a pattern. Very large of rows (>10000's)

date      | items 
20100605  | apple is red 
20110606  | orange is orange 
20120607  | apple is green

B: shorter with a few hundred rows.

id   |  color
123  |  is Red
234  |  not orange
235  |  is green

Result would be to flag all columns in B if pattern found in A, possibly adding a column to B like

B:
id   |  color       | found
123  |  is Red      | true
234  |  not orange  | false
235  |  is green    | true

thinking of something like, dfB['found'] = dfB['color'].isin(dfA['items']) but don't see any way to ignore case. Also, with this approach it will change true to false. Don't want to change those which are already set true. Also, I believe it's inefficient to loop large dataframes more than once. Running through A once and marking B would be better way but not sure how to achieve that using isin(). Any other ways? Especially ignoring case sensitivity of pattern.

CodePudding user response:

You can use something like this:

df2['check'] = df2['color'].apply(lambda x: True if any(x.casefold() in i.casefold() for i in df['items']) else False)

or you can use str.contains:

df2['check'] = df2['color'].str.contains('|'.join(df['items'].str.split(" ").str[1]   ' '   df['items'].str.split(" ").str[2]),case=False)

#get second and third words
  • Related