Home > Software engineering >  Python: How to combine two dataframes, while summing rows with identical indices and keeping unique
Python: How to combine two dataframes, while summing rows with identical indices and keeping unique

Time:06-23

I have two dataframes of different sizes (row-wise):

df1:
         col1  col2  col3  col4   ...
index1   a11   a12   a13   a14
index2   b11   b12   b13   b14
index4   c11   c12   c13   c14
index5   d11   d12   d13   d14


df2:
         col1  col2  col3  col4   ...
index1   e11   e12   e13   e14
index2   f11   f12   f13   f14
index3   g11   g12   g13   g14
index5   h11   h12   h13   h14
index6   i11   i12   i13   i14

Note, the column names in both dataframes are identical and there can be several thousand of them. The values a11-i14 are arbitrary integers. The indices are my ID's (integers). A large portion of them from df1 are identical with the ones in df2 (here: index1, index2 and index5) but not all of them. Both dataframes can contain unique indices. The indices in both dataframes are ordered.

I want to create a new dataframe "df3" which looks like this:

df3:
         col1      col2      col3      col4
index1   a11 e11   a12 e12   a13 e13   a14 e14
index2   b11 f11   b12 f12   b13 f13   b14 f14
index3   g11       g12       g13       g14
index4   c11       c12       c13       c14
index5   d11 h11   d12 h12   d13 h13   d14 h14
index6   i11       i12       i13       i14

I want to combine df1 and df2 in such a way, that rows with identical indices are summed (in this case index1, index2, index5). Otherwise, the unique rows are to be added unchanged. The order of the indices/rows is important.

How can I do that?

Any help is much appreciated!

CodePudding user response:

You can just do add:

df3 = df1.add(df2, fill_value=0)

or concat followed by groupby:

pd.concate([df1, df2]).groupby(level=0).sum()
  • Related