How to make list of words that are not in another dataframe

Time:01-03

I have a problem with pandas DataFrames. I have two dataframes with different contents, and I want to find the words in dataframe 1 that are not in dataframe 2 and store them in a new dataframe. Can someone help me solve this with pandas? Thank you!

Dataframe 1 contains:
Tweet
Bismillah for tomorrow Amin
shared location
Replying to shahrilPng
It's time to finish what's been pending
up and parallel
When you run after your dream

And dataframe 2 contains:
Words
tomorrow
shared
location
time
finish
pending
parallel
run
after
dream
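To experiment with the answers, the two example dataframes can be built by hand as below. The column names `Tweet` and `Words` come from the question; typing the data in directly (rather than loading it from a file) is only an assumption for the sake of a reproducible example.

```python
import pandas as pd

# Dataframe 1: one tweet per row (data copied from the question)
df1 = pd.DataFrame({'Tweet': [
    'Bismillah for tomorrow Amin',
    'shared location',
    'Replying to shahrilPng',
    "It's time to finish what's been pending",
    'up and parallel',
    'When you run after your dream',
]})

# Dataframe 2: one word per row
df2 = pd.DataFrame({'Words': [
    'tomorrow', 'shared', 'location', 'time', 'finish',
    'pending', 'parallel', 'run', 'after', 'dream',
]})

print(df1)
print(df2)
```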

The output I want:
Results
Bismillah
for
Amin
Replying
to
shahrilPng
etc.

Answer:

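One way is to split the tweets into individual words, turn both dataframes into flat sets, take the set difference, and put the result into a new dataframe. Note that a set difference is case-sensitive and does not keep the original word order.

import pandas as pd

# Split each tweet into words and flatten them into one set
df1_set = set(df1['Tweet'].str.split().explode().dropna())
df2_set = set(df2['Words'].dropna())
# Words that appear in the tweets but not in the word list
out = pd.DataFrame(sorted(df1_set - df2_set), columns=['Results'])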
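Because a plain set forgets the order in which words appear, the result will not match the order shown in the question. A minimal sketch that keeps first-seen order, assuming Python 3.7+ (where `dict` preserves insertion order), replaces the set of tweet words with `dict.fromkeys`, which de-duplicates while keeping order. Like the set difference, this comparison is case-sensitive.

```python
import pandas as pd

df1 = pd.DataFrame({'Tweet': [
    'Bismillah for tomorrow Amin', 'shared location', 'Replying to shahrilPng',
    "It's time to finish what's been pending", 'up and parallel',
    'When you run after your dream',
]})
df2 = pd.DataFrame({'Words': [
    'tomorrow', 'shared', 'location', 'time', 'finish',
    'pending', 'parallel', 'run', 'after', 'dream',
]})

# Words to exclude, held in a set for fast (case-sensitive) membership tests
exclude = set(df2['Words'].dropna())

# One word per element, in the order they occur in the tweets
all_words = df1['Tweet'].str.split().explode().dropna()

# dict.fromkeys drops repeated words but keeps the first occurrence's position
kept = [w for w in dict.fromkeys(all_words) if w not in exclude]

out = pd.DataFrame({'Results': kept})
print(out)
```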

Answer:

Split and explode your tweets dataframe, then check whether each word is present in your words dataframe (ignoring case):

# True for each word that is NOT in df2['Words'] (case-insensitive comparison)
not_in_list = lambda x: ~x.str.casefold().isin(df2['Words'].str.casefold())

out = df1['Tweet'].str.split().explode().loc[not_in_list] \
                  .drop_duplicates().reset_index(drop=True).to_frame('Results')
print(out)

# Output
       Results
0    Bismillah
1          for
2         Amin
3     Replying
4           to
5   shahrilPng
6         It's
7       what's
8         been
9           up
10         and
11        When
12         you
13        your
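The same result can also be expressed as a left anti-join with `DataFrame.merge(..., indicator=True)`, a different technique from the `isin` filter above. In this sketch the helper column name `key` is a hypothetical choice; it holds the case-folded word so the match ignores case while `Results` keeps the original spelling. A left merge keeps the row order of the left frame, and `vocab` is de-duplicated so no rows are multiplied.

```python
import pandas as pd

df1 = pd.DataFrame({'Tweet': [
    'Bismillah for tomorrow Amin', 'shared location', 'Replying to shahrilPng',
    "It's time to finish what's been pending", 'up and parallel',
    'When you run after your dream',
]})
df2 = pd.DataFrame({'Words': [
    'tomorrow', 'shared', 'location', 'time', 'finish',
    'pending', 'parallel', 'run', 'after', 'dream',
]})

# One word per row, keeping the original spelling in 'Results'
words = df1['Tweet'].str.split().explode().dropna().to_frame('Results')
words['key'] = words['Results'].str.casefold()  # hypothetical helper column

# Unique case-folded vocabulary to match against
vocab = pd.DataFrame({'key': df2['Words'].str.casefold().unique()})

# Left anti-join: keep only rows whose key has no match in vocab
merged = words.merge(vocab, on='key', how='left', indicator=True)
out = (merged.loc[merged['_merge'] == 'left_only', ['Results']]
             .drop_duplicates()
             .reset_index(drop=True))
print(out)
```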