How to combine two dataframes with different number of both rows and columns using pandas-CodePudding

I have two data frames with different number of both rows and columns. Some indices and columns are common.

df1

df2

The result should be

  A  B  C  D  E
a 1  0  0  2  3
b 2  3  4  2  4
c 4  5  6  NA NA
d 1  NA 4  5  6
e 1  NA 2  3  4

I tried "join", "merge", and "concat" but it wasn't successful. Would you help me out?

Thanks.

CodePudding user response：

You can use combine_first

df = df1.combine_first(df2)

Result

   A    B  C    D    E
a  1  0.0  0  2.0  3.0
b  2  3.0  4  2.0  4.0
c  4  5.0  6  NaN  NaN
d  1  NaN  4  5.0  6.0
e  1  NaN  2  3.0  4.0

CodePudding user response：

You are looking for bind_rows from dplyr package :

df=data.frame(A=c(1,2,3),B=c(2,3,6))
df2=data.frame(A=c(1,1,1,2),B=c(4,5,7,4),C=c(1,1,1,1))
dplyr::bind_rows(df,df2)

Result :

CodePudding user response：

Here is a way to do that without using combine_first

pd.concat([df.stack(), df1.stack()]).reset_index().drop_duplicates().pivot("level_0", "level_1", 0)

Results:

level_1    A    B    C    D    E
level_0
a        1.0  0.0  0.0  2.0  3.0
b        2.0  3.0  4.0  2.0  4.0
c        4.0  5.0  6.0  NaN  NaN
d        1.0  NaN  4.0  5.0  6.0
e        1.0  NaN  2.0  3.0  4.0

CodePudding user response：

Another possible solution:

(pd.merge(df1.reset_index(), df2.reset_index(), how='outer')
 .set_index('index').rename_axis(None))

Output:

   A    B  C    D    E
a  1  0.0  0  2.0  3.0
b  2  3.0  4  2.0  4.0
c  4  5.0  6  NaN  NaN
d  1  NaN  4  5.0  6.0
e  1  NaN  2  3.0  4.0