Hi I have 2 dataframes as below:
key | Apple | Banana |
---|---|---|
abc | 1 | 12 |
bcd | 23 | 21 |
key | Train | Car |
---|---|---|
abc | 11 | 20 |
jkn | 2 | 19 |
I want to merge these 2 dataframes together with my key column so that I can get following table:
key | Train | Car | Banana | Apple |
---|---|---|---|---|
abc | 11 | 20 | 12 | 1 |
jkn | 2 | 19 | 0/NA | 0/NA |
bcd | 0/NA | 0/NA | 21 | 23 |
For columns where I don't have any record like for jkn / Apple
either 0 or NA should be printed.
Currently I tried using pd.concat
but am not exactly able to figure out how to get my desired result.
CodePudding user response:
Use pd.merge()
with how='outer'
, read further in documentation:
import pandas as pd
import io
data_string = """key Apple Banana
abc 1 12
bcd 23 21
"""
df1 = pd.read_csv(io.StringIO(data_string), sep='\s ')
data_string = """key Train Car
abc 11 20
jkn 2 19"""
df2 = pd.read_csv(io.StringIO(data_string), sep='\s ')
# Solution
df_result = pd.merge(df1, df2, on=['key'], how='outer')
print(df_result)
key Apple Banana Train Car
0 abc 1.0 12.0 11.0 20.0
1 bcd 23.0 21.0 NaN NaN
2 jkn NaN NaN 2.0 19.0
CodePudding user response:
Let's try concat
then groupby.sum
out = (pd.concat([df1, df2], ignore_index=True)
.groupby('key', as_index=False).sum())
print(out)
key Apple Banana Train Car
0 abc 1.0 12.0 11.0 20.0
1 bcd 23.0 21.0 0.0 0.0
2 jkn 0.0 0.0 2.0 19.0