I have 3 data frame. 2 of them are only one column, another one is two columns. 3 dataframe looks like this
Itemcode Itemcode2 Itemcode count
AAC AAC AAC 2
AAB AAB AAB 1
AAG AAF AAT 0
AAD AAD AAD 1
AAC AAF 1
If I want to check how many occurence of Itemcode and Itemcode2, I can use value_counts()
.
Then if I want to combine them, I can not use pd.concat ([]) for the 3rd (it works if for 1 and 2 dataframe) because number of column is different.
How do I join these 3 data frame ?
This is the result I expected.
df df2 df3
AAC 2 1 2
AAB 1 1 1
AAT 0 0 0
AAD 1 1 1
AAF 0 1 1
Thanks
CodePudding user response:
You can certainly use pandas.concat
:
(pd.concat([df1['Itemcode'].value_counts(),
df2['Itemcode2'].value_counts(),
df3.set_index('Itemcode')
], axis=1, keys=['df1', 'df2', 'df3'])
.droplevel(1, axis=1)
.fillna(0, downcast='infer')
)
Alternative without value_counts
and with a pivot_table
:
(pd.concat([df1, df2.rename(columns={'Itemcode2': 'Itemcode'}), df3], keys=['df1', 'df2', 'df3'], names=['source'])
.reset_index(0)
.fillna({'count': 1})
.pivot_table(index='Itemcode', columns='source',
values='count', aggfunc='sum', fill_value=0)
)
Output:
df1 df2 df3
AAC 2 1 2
AAB 1 1 1
AAG 1 0 0
AAD 1 1 1
AAF 0 1 1
AAT 0 0 0
CodePudding user response:
df3 = df3.set_index('Itemcode').rename(columns={'count':'df3'})
a=df1.Itemcode.value_counts().rename('df1').to_frame()
b=df2.Itemcode2.value_counts().rename('df2').to_frame()
final=a.join(b).join(df3,how='right').fillna(0)
final
Itemcode df1 df2 df3
AAC 2.0 1.0 2
AAB 1.0 1.0 1
AAT 0.0 0.0 0
AAD 1.0 1.0 1
AAF 0.0 0.0 1