I have a problem with pandas dataframes. I have two dataframes with different contents. I want to find the words in dataframe 1 that are not in dataframe 2 and store them in a new dataframe. Can someone help me solve this using pandas? Thank you.
Dataframe 1 contains:
Tweet
Bismillah for tomorrow Amin
shared location
Replying to shahrilPng
It's time to finish what's been pending
up and parallel
When you run after your dream
And dataframe 2 contains:
Words
tomorrow
shared
location
time
finish
pending
parallel
run
after
dream
The output that I want:
Results
Bismillah
for
Amin
Replying
to
shahrilPng
etc
CodePudding user response:
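One way would be to flatten both dataframes into sets of words, take the difference, and put the result into a new dataframe. The tweets have to be split into individual words first; otherwise whole tweets are compared against single words and nothing is removed. Also note that pd.DataFrame does not accept a set directly, so convert the difference to a list:
import pandas as pd
# Split each tweet on whitespace and flatten the pieces into a set of words
df1_set = set(df1['Tweet'].str.split().explode())
df2_set = set(df2['Words'])
# Sets are unordered, so the row order of the result is arbitrary
pd.DataFrame({'Results': list(df1_set - df2_set)}).dropna()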
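For reference, here is a minimal, self-contained sketch of the set-difference idea using the sample data from the question. It assumes the tweets have to be split on whitespace into individual words before comparing, and that the comparison is case-sensitive. The column names Tweet, Words and Results come from the question; the variable names tweet_words and known_words are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({"Tweet": [
    "Bismillah for tomorrow Amin",
    "shared location",
    "Replying to shahrilPng",
    "It's time to finish what's been pending",
    "up and parallel",
    "When you run after your dream",
]})
df2 = pd.DataFrame({"Words": [
    "tomorrow", "shared", "location", "time", "finish",
    "pending", "parallel", "run", "after", "dream",
]})

# Flatten every tweet into individual words, then take the set difference
tweet_words = set(df1["Tweet"].str.split().explode())
known_words = set(df2["Words"])

# Sets are unordered, so sort the difference to get a reproducible row order
result = pd.DataFrame({"Results": sorted(tweet_words - known_words)})
print(result)
```

Because a set is used, duplicates disappear automatically, but the original word order of the tweets is lost.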
CodePudding user response:
Split and explode the tweets in your Tweet dataframe, then check whether each word is present in your Words dataframe:
# Mask function: True for words not found in df2['Words'] (case-insensitive)
not_in_list = lambda x: ~x.str.casefold().isin(df2['Words'].str.casefold())
out = df1['Tweet'].str.split().explode().loc[not_in_list] \
.drop_duplicates().reset_index(drop=True).to_frame('Results')
print(out)
# Output
Results
0 Bismillah
1 for
2 Amin
3 Replying
4 to
5 shahrilPng
6 It's
7 what's
8 been
9 up
10 and
11 When
12 you
13 your
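To try the split-and-explode approach end to end, here is a rough sketch that wraps it in a small helper and runs it on the sample data from the question. The function name words_not_in and its parameter names are hypothetical, chosen only for this illustration; the pandas calls are the same ones used above.

```python
import pandas as pd

def words_not_in(tweets: pd.Series, vocab: pd.Series) -> pd.DataFrame:
    """Return the unique words in `tweets` that are absent from `vocab`.

    Hypothetical helper: matching is case-insensitive and the order of
    first appearance in the tweets is preserved.
    """
    # One row per whitespace-separated word
    tokens = tweets.str.split().explode()
    # True where the casefolded word is not in the casefolded vocabulary
    mask = ~tokens.str.casefold().isin(vocab.str.casefold())
    return tokens[mask].drop_duplicates().reset_index(drop=True).to_frame("Results")

# Sample data copied from the question
df1 = pd.DataFrame({"Tweet": [
    "Bismillah for tomorrow Amin",
    "shared location",
    "Replying to shahrilPng",
    "It's time to finish what's been pending",
    "up and parallel",
    "When you run after your dream",
]})
df2 = pd.DataFrame({"Words": [
    "tomorrow", "shared", "location", "time", "finish",
    "pending", "parallel", "run", "after", "dream",
]})

out = words_not_in(df1["Tweet"], df2["Words"])
print(out)
```

Note that drop_duplicates is still case-sensitive here, so "To" and "to" would both be kept if they appeared in the tweets.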