I have two string columns in a dataframe. I need to check whether any word from string2 is included in string1. [image: ORIGINAL]
I am using split like this, but it does not work:
((df['string1'].split()).eq(df['string2'].split())).any()
After running the code, the result should look like this: [image: RESULT]
I tried a simple method on plain strings and it works:
TXT = 'HELLO WORLD, I AM TESTING'
TXT2 = 'HELLO TESTING'
X = TXT.split()
Y = TXT2.split()
any(i in X for i in Y)
--> Python returns True
I don't know how to do this in a dataframe and write the result to an additional column.
CodePudding user response:
If your dataframe isn't too big, you can try df.iterrows():
import pandas as pd

df = pd.DataFrame({'string1': ['Spam Ham Egg', 'Spam Bacon Spam'],
                   'string2': ['Beans Bacon Toast', 'Ham Egg Spam']})
result_list = []
# iterrows() yields (index, row) pairs; look the columns up by name
for _, row in df.iterrows():
    string1, string2 = row['string1'].split(), row['string2'].split()
    result_list.append(any(element in string2 for element in string1))
df['string_contained'] = result_list
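If the loop above gets slow, the same check can be written without iterrows() by zipping the two columns together. This is only a sketch on the same sample data; it uses set.isdisjoint(), which is True when the two word collections share nothing, so the result is negated.

```python
import pandas as pd

df = pd.DataFrame({'string1': ['Spam Ham Egg', 'Spam Bacon Spam'],
                   'string2': ['Beans Bacon Toast', 'Ham Egg Spam']})

# Walk both columns in parallel; a row is True when at least one word is shared.
df['string_contained'] = [
    not set(s1.split()).isdisjoint(s2.split())
    for s1, s2 in zip(df['string1'], df['string2'])
]
```

Note that split() only splits on whitespace, so punctuation stays attached to the word ('WORLD,' in the question's example would not match 'WORLD').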
CodePudding user response:
df['result'] = df.apply(lambda row: bool(set(row['string1'].split())
                                         & set(row['string2'].split())),
                        axis=1)
Example:
df = pd.DataFrame({'string1': ['a b c', 'd e f', 'd e f'],
                   'string2': ['aa mn', 'a b c e', 'a b c ee']})
Result:
  string1   string2  result
0   a b c     aa mn   False
1   d e f   a b c e    True
2   d e f  a b c ee   False
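As for why the attempt in the question fails: a pandas Series has no .split() method of its own, so df['string1'].split() raises an AttributeError. The string methods live under the .str accessor. A rough sketch of the same word-overlap check built on .str.split(), using the example data above (the names words1 and words2 are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'string1': ['a b c', 'd e f', 'd e f'],
                   'string2': ['aa mn', 'a b c e', 'a b c ee']})

# .str.split() turns every cell into a list of whitespace-separated words
words1 = df['string1'].str.split()
words2 = df['string2'].str.split()

# A non-empty set intersection means at least one whole word is shared
df['result'] = [bool(set(a) & set(b)) for a, b in zip(words1, words2)]
```

This compares whole words, not substrings, which is why 'a' does not match 'aa' in the first row.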