I have two dataframes. df1 & df2. I want to filter df2 based on the values of df1 and then count the frequency of values and compare with df1 again.
Example:
df1:
Project_Number
S100
S100
S200
S300
S300
S300
S400
S400
df2:
Project_Number
S100
S200
S200
S300
S300
S300
S500
Now first filter the df2 based on the values of df1. Keep only those values which is present in df1 also.
df2_new:
Project_Number
S100
S200
S200
S300
S300
S300
Now take the frequency of both the dataframes-
df1['Count'] = df1['Project_Number'].map(df1['Project_Number'].value_counts())
df2_new['Count'] = df2_new['Project_Number'].map(df2_new['Project_Number'].value_counts())
df1- df2_new
Project_Number Count Project_Number Count
S100 2 S100 1
S200 1 S200 2
S300 3 S300 3
S400 2
Now take the difference between above 2 dataframes and print the result-
df_difference-
Project_Number
S100
S200
S400
CodePudding user response:
To filter, try using isin
df2_new = df2[df2["Project_Number"].isin(df1["Project_Number"].unique().tolist())]
CodePudding user response:
Use Series.isin
for filtering, then compare Series
from Series.value_counts
for not equal and convert sorted index to one column DataFrame
:
df2_new = df2.loc[df2["Project_Number"].isin(df1["Project_Number"]),'Project_Number']
s1 = df1['Project_Number'].value_counts()
s2 = df2_new.value_counts()
df_difference = s1[s1.ne(s2)].sort_index().index.to_frame(name='Project_Number',index=False)
print (df_difference)
Project_Number
0 S100
1 S200
2 S400