Assuming that I have monthly datasets showing like these:

df1

company	date	act_call	act_visit	po
A	2022-10-01	Yes	No	No
B	2022-10-01	Yes	No	Yes
C	2022-10-01	No	No	No
B	2022-10-02	No	Yes	No
A	2022-10-02	No	Yes	Yes

df2

company	date	act_call	act_visit	po
D	2022-11-01	Yes	No	No
B	2022-11-01	Yes	No	Yes
C	2022-11-01	Yes	Yes	No
D	2022-11-02	No	Yes	No
A	2022-11-02	No	Yes	Yes

I want to compare the two dataframes and count several conditions:

the number of company that exists in both dataframes.
the number of company that exists in both dataframes that has at least one act_call as 'Yes' and act_visit as 'Yes' in df2, but has po as 'No' in df1.

For the 1st condition, I've tried using pandas.Dataframe.sum() and pandas.Dataframe.count_values() but they didn't give the results that I want.

For the 2nd condition, I tried using this code:

(((df1[['act_calling', 'act_visit']].eq('yes'))&(df2['po'].eq('no'))).groupby(df2['company_name']).any().all(axis = 1).sum())

but, I'm not sure that the code above will only count the company that exists in both dataframes.

The expected output is this:

3, (A, B, C)
1, (C)

I'm open to any suggestions. Thank u in advance!

CodePudding user response：

companies that exists in both dfs

pd.merge(df1, df2, on="company", how="inner")['company'].unique()
Out[40]: array(['A', 'B', 'C'], dtype=object)
len(pd.merge(df1, df2, on="company", how="inner")['company'].unique())
Out[41]: 3

2nd part

df3 = pd.merge(df1, df2, on="company", how="inner")
df3[(df3.act_call_y == 'Yes') & (df3.act_visit_y == 'Yes') & (df3.po_x=='No')]
Out[54]: 
  company      date_x act_call_x act_visit_x po_x      date_y act_call_y act_visit_y po_y
4       C  01/10/2022         No          No   No  2022-11-01        Yes         Yes   No

len(df3[(df3.act_call_y == 'Yes') & (df3.act_visit_y == 'Yes') & (df3.po_x=='No')])
Out[56]: 1

CodePudding user response：

the number of company that exists in both dataframes.

set(df1['company']).intersection(df2['company'])
# {'A', 'B', 'C'}

len(set(df1['company']).intersection(df2['company']))
# 3

the number of company that exists in both dataframes that has at least one `act_call` as 'Yes' and `act_visit` as 'Yes' in `df2`, but has `po` as 'No' in `df1`.

s1 = df1['po'].eq('No').groupby(df1['company']).any()

d2 = {'act_call': 'Yes', 'act_visit': 'Yes'}
s2 = df2[list(d2)].eq(d2).groupby(df2['company']).any().all(axis=1)

s = (s1&s2)
s = set(s[s].index)
# {'C'}

len(s)
# 1

CodePudding user response：

To See The Companies That Are In Both Data Frames

1st part

combined_dataframe1=df1[df2['company'].isin(df1['company'])]

combined_dataframe1['company']

2nd part

To see the company that satisfies your conditions

combined_dataframe2=df2[df2['company'].isin(df1['company'])]

joined_dataframe=pd.merge(combined_dataframe1,combined_dataframe2, on='company',how='outer')

As per your condition

final_dataframe=joined_dataframe[joined_dataframe.columns][joined_dataframe['po_x']=='n0'}[joined_dataframe['act_call_yes']=='yes'][joined_dataframe['act_visit_y']=='yes'] print(final_dataframe)

companies that exists in both dfs

2nd part

the number of company that exists in both dataframes.

the number of company that exists in both dataframes that has at least one act_call as 'Yes' and act_visit as 'Yes' in df2, but has po as 'No' in df1.

To See The Companies That Are In Both Data Frames

To see the company that satisfies your conditions

As per your condition

the number of company that exists in both dataframes that has at least one `act_call` as 'Yes' and `act_visit` as 'Yes' in `df2`, but has `po` as 'No' in `df1`.