I have two dataframes: df1:
id name city age found_in_df2
2 john ny 20
4 maria lima 9
9 susan cana 17
df2:
comment question response
i'm john are you there yes, i am
hello! jajajajaja go ahead!
please, go on hello susan no, i don't
maria i'm 9 years sure
I want to find values from column called 'name' from df1 that exist in any column in df2 (this value could be in any column and even inside a phrase). If the value is found, so I want to write 'yes' in column 'found_in_df2' of df1. Someone could help me with that?
CodePudding user response:
One approach could be as follows.
- Use
df.stack
to turndf2
into apd.Series
and combine withstr.contains
.
df1['found_in_df2'] = df1.name.apply(lambda x: any(df2.stack().str.contains(x)))\
.map({True:'yes',False:'no'})
print(df1)
id name city age found_in_df2
0 2 john ny 20 yes
1 4 maria lima 9 yes
2 9 susan cana 17 yes
CodePudding user response:
The below seems to solve the problem
import pandas as pd
def find_value(data, phrase):
for col in data.columns:
if any(data[col].str.contains(phrase)):
return 'yes'
return 'no'
df1 = pd.DataFrame([[2, 'john', 'ny', 20],
[4, 'maria', 'lima', 9],
[9, 'susan', 'cana', 17]],
columns=['id', 'name', 'city', 'age'])
df2 = pd.DataFrame([["i'm john", "are you there", "yes, i am"],
["hello!", "jajajajaja", "go ahead!"],
["please, go on", "hello susan", "no, i don't"],
["maria", "i'm 9 years", "sure"]],
columns=['comment', 'question', 'response'])
df1['found_in_df2'] = df1.name.apply(lambda x: find_value(df2, x))
CodePudding user response:
Use:
df1['match'] =\
df1['name'].str.contains('|'.join(df2.stack().tolist()).map({True:'yes', False:'no'})