Home > Software engineering >  How to combine two dataframes with different number of both rows and columns using pandas
How to combine two dataframes with different number of both rows and columns using pandas

Time:12-14

I have two data frames with different number of both rows and columns. Some indices and columns are common.

df1

  A B C
a 1 0 0
b 2 3 4
c 4 5 6

df2

  A C D E
a 1 0 2 3
b 2 4 2 4
d 1 4 5 6
e 1 2 3 4

The result should be

  A  B  C  D  E
a 1  0  0  2  3
b 2  3  4  2  4
c 4  5  6  NA NA
d 1  NA 4  5  6
e 1  NA 2  3  4

I tried "join", "merge", and "concat" but it wasn't successful. Would you help me out?

Thanks.

CodePudding user response:

You can use combine_first

df = df1.combine_first(df2)

Result

   A    B  C    D    E
a  1  0.0  0  2.0  3.0
b  2  3.0  4  2.0  4.0
c  4  5.0  6  NaN  NaN
d  1  NaN  4  5.0  6.0
e  1  NaN  2  3.0  4.0

CodePudding user response:

You are looking for bind_rows from dplyr package :

df=data.frame(A=c(1,2,3),B=c(2,3,6))
df2=data.frame(A=c(1,1,1,2),B=c(4,5,7,4),C=c(1,1,1,1))
dplyr::bind_rows(df,df2)

Result :

  A B  C
1 1 2 NA
2 2 3 NA
3 3 6 NA
4 1 4  1
5 1 5  1
6 1 7  1
7 2 4  1

CodePudding user response:

Here is a way to do that without using combine_first

pd.concat([df.stack(), df1.stack()]).reset_index().drop_duplicates().pivot("level_0", "level_1", 0)

Results:

level_1    A    B    C    D    E
level_0
a        1.0  0.0  0.0  2.0  3.0
b        2.0  3.0  4.0  2.0  4.0
c        4.0  5.0  6.0  NaN  NaN
d        1.0  NaN  4.0  5.0  6.0
e        1.0  NaN  2.0  3.0  4.0

CodePudding user response:

Another possible solution:

(pd.merge(df1.reset_index(), df2.reset_index(), how='outer')
 .set_index('index').rename_axis(None))

Output:

   A    B  C    D    E
a  1  0.0  0  2.0  3.0
b  2  3.0  4  2.0  4.0
c  4  5.0  6  NaN  NaN
d  1  NaN  4  5.0  6.0
e  1  NaN  2  3.0  4.0
  • Related