I have two dataframes of different sizes (row-wise):
df1:
col1 col2 col3 col4 ...
index1 a11 a12 a13 a14
index2 b11 b12 b13 b14
index4 c11 c12 c13 c14
index5 d11 d12 d13 d14
df2:
col1 col2 col3 col4 ...
index1 e11 e12 e13 e14
index2 f11 f12 f13 f14
index3 g11 g12 g13 g14
index5 h11 h12 h13 h14
index6 i11 i12 i13 i14
Note, the column names in both dataframes are identical and there can be several thousand of them. The values a11-i14 are arbitrary integers. The indices are my ID's (integers). A large portion of them from df1 are identical with the ones in df2 (here: index1, index2 and index5) but not all of them. Both dataframes can contain unique indices. The indices in both dataframes are ordered.
I want to create a new dataframe "df3" which looks like this:
df3:
col1 col2 col3 col4
index1 a11 e11 a12 e12 a13 e13 a14 e14
index2 b11 f11 b12 f12 b13 f13 b14 f14
index3 g11 g12 g13 g14
index4 c11 c12 c13 c14
index5 d11 h11 d12 h12 d13 h13 d14 h14
index6 i11 i12 i13 i14
I want to combine df1 and df2 in such a way, that rows with identical indices are summed (in this case index1, index2, index5). Otherwise, the unique rows are to be added unchanged. The order of the indices/rows is important.
How can I do that?
Any help is much appreciated!
CodePudding user response:
You can just do add
:
df3 = df1.add(df2, fill_value=0)
or concat
followed by groupby
:
pd.concate([df1, df2]).groupby(level=0).sum()