Assuming that I have monthly datasets showing like these:
df1
company | date | act_call | act_visit | po |
---|---|---|---|---|
A | 2022-10-01 | Yes | No | No |
B | 2022-10-01 | Yes | No | Yes |
C | 2022-10-01 | No | No | No |
B | 2022-10-02 | No | Yes | No |
A | 2022-10-02 | No | Yes | Yes |
df2
company | date | act_call | act_visit | po |
---|---|---|---|---|
D | 2022-11-01 | Yes | No | No |
B | 2022-11-01 | Yes | No | Yes |
C | 2022-11-01 | Yes | Yes | No |
D | 2022-11-02 | No | Yes | No |
A | 2022-11-02 | No | Yes | Yes |
I want to compare the two dataframes and count several conditions:
the number of company that exists in both dataframes.
the number of company that exists in both dataframes that has at least one
act_call
as 'Yes' andact_visit
as 'Yes' indf2
, but haspo
as 'No' indf1
.
For the 1st condition, I've tried using pandas.Dataframe.sum()
and pandas.Dataframe.count_values()
but they didn't give the results that I want.
For the 2nd condition, I tried using this code:
(((df1[['act_calling', 'act_visit']].eq('yes'))&(df2['po'].eq('no'))).groupby(df2['company_name']).any().all(axis = 1).sum())
but, I'm not sure that the code above will only count the company that exists in both dataframes.
The expected output is this:
3, (A, B, C)
1, (C)
I'm open to any suggestions. Thank u in advance!
CodePudding user response:
companies that exists in both dfs
pd.merge(df1, df2, on="company", how="inner")['company'].unique()
Out[40]: array(['A', 'B', 'C'], dtype=object)
len(pd.merge(df1, df2, on="company", how="inner")['company'].unique())
Out[41]: 3
2nd part
df3 = pd.merge(df1, df2, on="company", how="inner")
df3[(df3.act_call_y == 'Yes') & (df3.act_visit_y == 'Yes') & (df3.po_x=='No')]
Out[54]:
company date_x act_call_x act_visit_x po_x date_y act_call_y act_visit_y po_y
4 C 01/10/2022 No No No 2022-11-01 Yes Yes No
len(df3[(df3.act_call_y == 'Yes') & (df3.act_visit_y == 'Yes') & (df3.po_x=='No')])
Out[56]: 1
CodePudding user response:
the number of company that exists in both dataframes.
set(df1['company']).intersection(df2['company'])
# {'A', 'B', 'C'}
len(set(df1['company']).intersection(df2['company']))
# 3
the number of company that exists in both dataframes that has at least one act_call
as 'Yes' and act_visit
as 'Yes' in df2
, but has po
as 'No' in df1
.
s1 = df1['po'].eq('No').groupby(df1['company']).any()
d2 = {'act_call': 'Yes', 'act_visit': 'Yes'}
s2 = df2[list(d2)].eq(d2).groupby(df2['company']).any().all(axis=1)
s = (s1&s2)
s = set(s[s].index)
# {'C'}
len(s)
# 1
CodePudding user response:
To See The Companies That Are In Both Data Frames
1st part
combined_dataframe1=df1[df2['company'].isin(df1['company'])]
combined_dataframe1['company']
2nd part
To see the company that satisfies your conditions
combined_dataframe2=df2[df2['company'].isin(df1['company'])]
joined_dataframe=pd.merge(combined_dataframe1,combined_dataframe2, on='company',how='outer')
As per your condition
final_dataframe=joined_dataframe[joined_dataframe.columns][joined_dataframe['po_x']=='n0'}[joined_dataframe['act_call_yes']=='yes'][joined_dataframe['act_visit_y']=='yes'] print(final_dataframe)