Here's my dataframe
Id Column_1 Column_2
1 United States United Tractor
2 Love of Fair Tales of Grim
3 Hotel Marriot Jakarta Marriot Hotel Jakarta
Here's my expected output
Id Column_1 Column_2 Word
1 United States United Tractor united
2 Love of Fair Tales of Grim of
3 Hotel Marriot Jakarta Marriot Hotel Jakarta hotel marriot
Data:
{'Id': [1, 2, 3],
'Column_1': ['United States', 'Love of Fair', 'Hotel Marriot Jakarta'],
'Column_2': ['United Tractor', 'Tales of Grim', 'Marriot Hotel Jakarta']}
CodePudding user response:
One option is to use set.intersection
in a list comprehension:
df['Word'] = [' '.join(set(x.lower().split()) & set(y.lower().split())) for x,y in zip(df['Column_1'], df['Column_2'])]
Another option is to stack
the columns; then use groupby.apply
on a lambda where we do set.intersection
:
df['Word'] = (df[['Column_1', 'Column_2']].stack().str.lower().str.split()
.groupby(level=0).apply(lambda x: ' '.join(set(x.iat[0]) & set(x.iat[1]))))
Output:
Id Column_1 Column_2 Word
0 1 United States United Tractor united
1 2 Love of Fair Tales of Grim of
2 3 Hotel Marriot Jakarta Marriot Hotel Jakarta hotel marriot jakarta
CodePudding user response:
quite similar but a bit different solution:
df['Word'] = (df[['Column_1', 'Column_2']].
applymap(lambda x: set(x.lower().split())).
apply(lambda x: ' '.join(x.Column_1 & x.Column_2),1))
>>> df
'''
Id Column_1 Column_2 Word
0 1 United States United Tractor united
1 2 Love of Fair Tales of Grim of
2 3 Hotel Marriot Jakarta Marriot Hotel Jakarta hotel marriot jakarta