Merge two data frames and sum the values in columns

import pandas as pd

df1 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo'],
                    'Customer_No': ['01', '02', '03', '04'],
                    'Jan': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo', 'bad', 'bag'],
                    'Customer_No': ['01', '02', '03', '01', '05', '06'],
                    'Feb': [5, 6, 7, 8, 9, 10]})

Below is the structure of df4 that I would like to achieve (note that each customer's values are summed in the 'Jan' and 'Feb' columns, and note how the 'Customer_No' values are distributed).

Desired Output:

df4 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz','bad','bag'],
                    'Customer_No': ['01', '02', '03', '04','05'],
                    'Jan': [5, 2, 3, 0, 0],
                    'Feb': [13, 6, 7, 3, 6]})

This is what I tried:

df4 = df1.merge(df2, on = ['Customer_Name','Customer_No'], how = 'outer')
df4.fillna(0, inplace=True)
df4 = df4.reset_index(drop=True)  # reset_index returns a new frame, so assign it back
df4['Jan'] = df4['Jan'].astype(int)
df4['Feb'] = df4['Feb'].astype(int)
df4 = df4[['Customer_Name','Customer_No','Jan','Feb']]
df4

CodePudding user response:

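You are on the right track. After the outer merge, group by the key columns and sum, which collapses rows that share the same keys and turns missing months into 0: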

df4 = (
    df1.merge(df2, on=["Customer_Name", "Customer_No"], how="outer")
    .groupby(["Customer_No", "Customer_Name"])
    .sum()
    .astype("int")
    .reset_index()
)
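
One caveat with the snippet above: merging first means a customer that appears once in df1 and twice in df2 (here 'foo' / '01') gets its 'Jan' value repeated on both merged rows, so the later sum counts it twice. A possible way around this, sketched with the question's sample data, is to stack the frames with pd.concat and aggregate afterwards. The names keys, stacked, per_customer and per_name are only illustrative, and grouping by name alone is an assumption about what the desired 'Jan' column implies.

```python
import pandas as pd

df1 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo'],
                    'Customer_No': ['01', '02', '03', '04'],
                    'Jan': [1, 2, 3, 4]})
df2 = pd.DataFrame({'Customer_Name': ['foo', 'bar', 'baz', 'foo', 'bad', 'bag'],
                    'Customer_No': ['01', '02', '03', '01', '05', '06'],
                    'Feb': [5, 6, 7, 8, 9, 10]})

keys = ["Customer_Name", "Customer_No"]

# Merge-then-sum: the single 'Jan' value for foo/01 is matched against two
# 'Feb' rows, so it appears twice in the merged frame and is summed twice.
merged_sum = df1.merge(df2, on=keys, how="outer").groupby(keys)[["Jan", "Feb"]].sum()

# Stack the frames instead; a column missing from one frame is filled with NaN,
# and sum() skips NaN, so every original value is counted exactly once.
stacked = pd.concat([df1, df2], ignore_index=True)

# One row per (name, number) pair, in order of first appearance.
per_customer = (
    stacked.groupby(keys, sort=False)[["Jan", "Feb"]]
    .sum()
    .astype(int)
    .reset_index()
)

# Assumption: one row per name, keeping the first Customer_No seen for it.
per_name = (
    stacked.groupby("Customer_Name", sort=False)
    .agg(Customer_No=("Customer_No", "first"),
         Jan=("Jan", "sum"),
         Feb=("Feb", "sum"))
    .astype({"Jan": int, "Feb": int})
    .reset_index()
)

print(per_customer)
print(per_name)
```

With the sample data, per_customer keeps 'foo' / '04' as its own row, while per_name folds both 'foo' rows together and keeps the first Customer_No seen for each name.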