Home > Software design >  Comparing and Count Values from 2 (or More) Different Pandas Dataframes Based on Certain Conditions
Comparing and Count Values from 2 (or More) Different Pandas Dataframes Based on Certain Conditions

Time:01-04

Assuming that I have monthly datasets showing like these:

df1

company date act_call act_visit po
A 2022-10-01 Yes No No
B 2022-10-01 Yes No Yes
C 2022-10-01 No No No
B 2022-10-02 No Yes No
A 2022-10-02 No Yes Yes

df2

company date act_call act_visit po
D 2022-11-01 Yes No No
B 2022-11-01 Yes No Yes
C 2022-11-01 Yes Yes No
D 2022-11-02 No Yes No
A 2022-11-02 No Yes Yes

I want to compare the two dataframes and count several conditions:

  1. the number of company that exists in both dataframes.

  2. the number of company that exists in both dataframes that has at least one act_call as 'Yes' and act_visit as 'Yes' in df2, but has po as 'No' in df1.

For the 1st condition, I've tried using pandas.Dataframe.sum() and pandas.Dataframe.count_values() but they didn't give the results that I want.

For the 2nd condition, I tried using this code:

(((df1[['act_calling', 'act_visit']].eq('yes'))&(df2['po'].eq('no'))).groupby(df2['company_name']).any().all(axis = 1).sum())

but, I'm not sure that the code above will only count the company that exists in both dataframes.

The expected output is this:

  1. 3, (A, B, C)

  2. 1, (C)

I'm open to any suggestions. Thank u in advance!

CodePudding user response:

companies that exists in both dfs

pd.merge(df1, df2, on="company", how="inner")['company'].unique()
Out[40]: array(['A', 'B', 'C'], dtype=object)
len(pd.merge(df1, df2, on="company", how="inner")['company'].unique())
Out[41]: 3

2nd part

df3 = pd.merge(df1, df2, on="company", how="inner")
df3[(df3.act_call_y == 'Yes') & (df3.act_visit_y == 'Yes') & (df3.po_x=='No')]
Out[54]: 
  company      date_x act_call_x act_visit_x po_x      date_y act_call_y act_visit_y po_y
4       C  01/10/2022         No          No   No  2022-11-01        Yes         Yes   No

len(df3[(df3.act_call_y == 'Yes') & (df3.act_visit_y == 'Yes') & (df3.po_x=='No')])
Out[56]: 1

CodePudding user response:

the number of company that exists in both dataframes.

set(df1['company']).intersection(df2['company'])
# {'A', 'B', 'C'}

len(set(df1['company']).intersection(df2['company']))
# 3

the number of company that exists in both dataframes that has at least one act_call as 'Yes' and act_visit as 'Yes' in df2, but has po as 'No' in df1.

s1 = df1['po'].eq('No').groupby(df1['company']).any()

d2 = {'act_call': 'Yes', 'act_visit': 'Yes'}
s2 = df2[list(d2)].eq(d2).groupby(df2['company']).any().all(axis=1)

s = (s1&s2)
s = set(s[s].index)
# {'C'}

len(s)
# 1

CodePudding user response:

To See The Companies That Are In Both Data Frames

1st part

combined_dataframe1=df1[df2['company'].isin(df1['company'])]

combined_dataframe1['company']

2nd part

To see the company that satisfies your conditions

combined_dataframe2=df2[df2['company'].isin(df1['company'])]

joined_dataframe=pd.merge(combined_dataframe1,combined_dataframe2, on='company',how='outer')

As per your condition

final_dataframe=joined_dataframe[joined_dataframe.columns][joined_dataframe['po_x']=='n0'}[joined_dataframe['act_call_yes']=='yes'][joined_dataframe['act_visit_y']=='yes'] print(final_dataframe)

  • Related