Home > Blockchain >  Concatenating two (or more) dataframes while adding column in python
Concatenating two (or more) dataframes while adding column in python

Time:08-24

I'm trying to generate the input dataset for multi-task learning, but I faced some problems during concatenating the datasets.
This is the example dataframes for merging.

import pandas as pd

df1 = pd.DataFrame({
    'sample': ['A', 'B', 'C', 'D'],
    'group': [1,0,1,0],
    'value': [123, 64, 534, 873]
})

df2 = pd.DataFrame({
    'sample': ['A', 'D', 'E'],
    'group': [1,1,0],
    'value': [372, 981, 23]
})

df1
# sample    group      value
#0     A        1        123
#1     B        0         64
#2     C        1        534
#3     D        0        873

df2
# sample    group      value
#0     A        1        372
#1     D        1        981
#2     E        0         23

Expected result

#group_x and group_y columns do not need to be the same name as the expected result.
df3
# sample    group_x   group_y      value
#0     A          1       NaN        123
#1     B          0       NaN         64
#2     C          1       NaN        534
#3     D          0       NaN        873
#4     A        NaN         1        372
#5     D        NaN         1        981
#6     E        NaN         0         23

I tried pd.concat and pd.merge for appending two dataframes because they share the same columns.
But 1) with pd.concat, I could not append group columns horizontally, and 2) pd.merge expand columns widely.
Is there any recommended function for this job?

CodePudding user response:

You need a outer merge but on columns ['sample', 'value']

out = df1.merge(df2, how='outer', on=['sample', 'value'])

print(out)

  sample  group_x  value  group_y
0      A      1.0    123      NaN
1      B      0.0     64      NaN
2      C      1.0    534      NaN
3      D      0.0    873      NaN
4      A      NaN    372      1.0
5      D      NaN    981      1.0
6      E      NaN     23      0.0
  • Related