How can I calculate the sum of two `pandas.DataFrame` based on `pandas.DataFrame.index`?

Time:07-29

What I want to achieve

import pandas as pd

data = [[1, 2], [3, 4]]
index1 = ['I1', 'I2']
index2 = ['I1', 'I3']
columns = ['C1', 'C2']

df1 = pd.DataFrame(data, index=index1, columns=columns)
df2 = pd.DataFrame(data, index=index2, columns=columns)

print(df1)
#    C1  C2
#I1   1   2
#I2   3   4

print(df2)
#    C1  C2
#I1   1   2
#I3   3   4

print(...) # Calculate somehow
## !!!!!Expected Result!!!!!
#    C1  C2
#I1   2   4
#I2   3   4
#I3   3   4

The expected result is a DataFrame whose values are as follows:

  • I1: the sum of two dataframes because both df1 and df2 have a row named 'I1'.
  • I2: use the value of df1.loc['I2'] because df2 doesn't have this index.
  • I3: use the value of df2.loc['I3'] because df1 doesn't have this index.

What I tested

print(df1.add(df2, axis='index'))
#    C1  C2
#I1 2.0 4.0
#I2 NaN NaN
#I3 NaN NaN

print(pd.concat([df1, df2]))
#    C1  C2
#I1   1   2
#I2   3   4
#I1   1   2
#I3   3   4

print(df1 + df2.values)
#    C1  C2
#I1   2   4
#I2   6   8

Could you help me get the expected result?

CodePudding user response:

Try using DataFrame.add()

df = df1.add(df2, fill_value=0)

The resulting DataFrame matches your expected output, but its values come back as floats, so you may want to convert the columns back to integers:

import numpy as np

df["C1"] = df["C1"].astype(np.int64)
df["C2"] = df["C2"].astype(np.int64)

If you would rather not import numpy, use plain int instead of np.int64.

For details, see the pandas documentation for DataFrame.add.
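As a self-contained illustration of the approach above, here is a minimal sketch using the frames from the question. The name `result` is only illustrative. Note that `fill_value` only helps where a label is missing from one of the two frames; a cell missing from both would still be NaN, and `astype(int)` would then fail.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=['C1', 'C2'])
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=['C1', 'C2'])

# fill_value=0 stands in for a row that exists in only one frame,
# so I2 and I3 keep their original values instead of becoming NaN.
result = df1.add(df2, fill_value=0)

# Aligning on the index goes through NaN internally, so the result is float.
# Casting is safe here because no NaN values remain.
result = result.astype(int)
print(result)
```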

CodePudding user response:

Try chaining concat with groupby on the index:

out = pd.concat([df1, df2]).groupby(level=0).sum()
Out[161]: 
    C1  C2
I1   2   4
I2   3   4
I3   3   4
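One possible reason to prefer this form is that it extends to any number of frames and never introduces NaN, so integer dtypes are preserved. The sketch below assumes a hypothetical third frame, `df_extra`, that is not part of the question.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
columns = ['C1', 'C2']
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=columns)
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=columns)
# Hypothetical extra frame, added only to show that the pattern scales.
df_extra = pd.DataFrame(data, index=['I2', 'I4'], columns=columns)

# Stack all rows, then sum the rows that share an index label.
stacked = pd.concat([df1, df2, df_extra])
total = stacked.groupby(level=0).sum()
print(total)
print(total.dtypes)
```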

CodePudding user response:

What you are looking for is the DataFrame.combine method. It combines the two DataFrames column by column using a function you supply, as the docs show:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.combine.html

So basically what you need to do is the following:

func = lambda s1, s2: s1 + s2
df3 = df1.combine(df2,func,fill_value=0)
print(df3)

This gives you a little more flexibility than add, because func can be any function that takes two Series.
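To make the flexibility concrete, here is a small sketch that passes two different functions to `combine`. The element-wise maximum variant is only an illustration and is not something the question asks for; the names `summed` and `largest` are likewise illustrative.

```python
import numpy as np
import pandas as pd

data = [[1, 2], [3, 4]]
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=['C1', 'C2'])
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=['C1', 'C2'])

# combine aligns both frames, replaces missing values with fill_value,
# then calls the function once per column with two Series.
summed = df1.combine(df2, lambda s1, s2: s1 + s2, fill_value=0)

# Any function of two Series works, for example an element-wise maximum.
largest = df1.combine(df2, np.maximum, fill_value=0)

print(summed)
print(largest)
```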

CodePudding user response:

Here is one way to do it, using combine_first successively:

df3 = df1.add(df2)  # NaN wherever an index label is missing from either frame
df3 = df3.combine_first(df1).combine_first(df2)
df3

     C1      C2
I1  2.0     4.0
I2  3.0     4.0
I3  3.0     4.0
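To see what each combine_first call contributes, the steps can be traced one at a time. This sketch assumes the starting frame is the plain `df1.add(df2)` result; the intermediate names `summed`, `step1` and `step2` are illustrative.

```python
import pandas as pd

data = [[1, 2], [3, 4]]
df1 = pd.DataFrame(data, index=['I1', 'I2'], columns=['C1', 'C2'])
df2 = pd.DataFrame(data, index=['I1', 'I3'], columns=['C1', 'C2'])

# Plain add: only I1 exists in both frames, so I2 and I3 are NaN.
summed = df1.add(df2)

# combine_first keeps existing values and fills NaN from the other frame.
step1 = summed.combine_first(df1)  # fills I2 from df1; I3 is still NaN
step2 = step1.combine_first(df2)   # fills I3 from df2

print(step1)
print(step2)
```

The result stays float because the intermediate frames contain NaN; `step2.astype(int)` would convert it back once every gap is filled.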