import pandas as pd

df1 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo'],
                    'Customer_No': ['01', '02', '03', '04'],
                    'Jan': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo', 'bad', 'bag'],
                    'Customer_No': ['01', '02', '03', '01', '05', '06'],
                    'Feb': [5, 6, 7, 8, 9, 10]})
Below is the structure of df4 that I would like to achieve (note that the values in the 'Jan' and 'Feb' columns are summed per customer, and note how the 'Customer_No' values are distributed).
Desired Output:
df4 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'bad', 'bag'],
                    'Customer_No': ['01', '02', '03', '04', '05'],
                    'Jan': [5, 2, 3, 0, 0],
                    'Feb': [13, 6, 7, 3, 6]})
I tried:
df4 = df1.merge(df2, on = ['Customer_Name','Customer_No'], how = 'outer')
df4.fillna(0, inplace=True)
df4 = df4.reset_index(drop=True)  # reset_index returns a new frame, so assign it back
df4['Jan'] = df4['Jan'].astype(int)
df4['Feb'] = df4['Feb'].astype(int)
df4 = df4[['Customer_Name','Customer_No','Jan','Feb']]
df4
CodePudding user response:
You are on the right track. After the outer merge, group by the customer columns and sum:
df4 = (
    df1.merge(df2, on=["Customer_Name", "Customer_No"], how="outer")
    .groupby(["Customer_No", "Customer_Name"])
    .sum()
    .astype("int")
    .reset_index()
)
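
One caveat with merging before summing: df2 contains two rows for ('foo', '01'), so the merge repeats the matching df1 row and its 'Jan' value is counted twice in the group sum. A minimal sketch of one way around this, assuming one row per (Customer_Name, Customer_No) pair is what is wanted, is to aggregate each frame first and merge afterwards. The intermediate names jan and feb are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo'],
                    'Customer_No': ['01', '02', '03', '04'],
                    'Jan': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo', 'bad', 'bag'],
                    'Customer_No': ['01', '02', '03', '01', '05', '06'],
                    'Feb': [5, 6, 7, 8, 9, 10]})

keys = ['Customer_Name', 'Customer_No']

# Sum each month per customer first, so repeated keys in one frame
# cannot duplicate rows of the other frame during the merge.
jan = df1.groupby(keys, as_index=False)['Jan'].sum()
feb = df2.groupby(keys, as_index=False)['Feb'].sum()

df4 = (
    jan.merge(feb, on=keys, how='outer')
    .fillna(0)                            # customers missing from one month get 0
    .astype({'Jan': 'int', 'Feb': 'int'})  # fillna leaves floats, cast back to int
)
```

With the sample data, ('foo', '01') ends up with Jan 1 and Feb 13 here, whereas merging first and then summing gives Jan 2 for that pair.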