Home > front end >  How to first filter the dataframe and then match with the same dataframe in pandas?
How to first filter the dataframe and then match with the same dataframe in pandas?

Time:10-06

I have two dataframes. df1 & df2. I want to filter df2 based on the values of df1 and then count the frequency of values and compare with df1 again.

Example:

df1:

Project_Number
S100
S100
S200
S300
S300
S300
S400
S400

df2:

Project_Number
S100
S200
S200
S300
S300
S300
S500

Now first filter the df2 based on the values of df1. Keep only those values which is present in df1 also.

df2_new:

Project_Number
S100
S200
S200
S300
S300
S300

Now take the frequency of both the dataframes-

df1['Count'] = df1['Project_Number'].map(df1['Project_Number'].value_counts())
df2_new['Count'] = df2_new['Project_Number'].map(df2_new['Project_Number'].value_counts())

df1-                              df2_new
Project_Number  Count             Project_Number   Count
S100            2                 S100             1
S200            1                 S200             2
S300            3                 S300             3
S400            2

Now take the difference between above 2 dataframes and print the result-

df_difference-

Project_Number  
S100           
S200
S400  

CodePudding user response:

To filter, try using isin

df2_new = df2[df2["Project_Number"].isin(df1["Project_Number"].unique().tolist())]

CodePudding user response:

Use Series.isin for filtering, then compare Series from Series.value_counts for not equal and convert sorted index to one column DataFrame:

df2_new = df2.loc[df2["Project_Number"].isin(df1["Project_Number"]),'Project_Number']

s1 = df1['Project_Number'].value_counts()
s2 = df2_new.value_counts()

df_difference = s1[s1.ne(s2)].sort_index().index.to_frame(name='Project_Number',index=False)
print (df_difference)
  Project_Number
0           S100
1           S200
2           S400
  • Related