Say two dataframe:
df1 = pd.DataFrame({'A': ['foo', 'bar', 'test'], 'b': [1, 2, 3], 'c': [3, 4, 5]})
df2 = pd.DataFrame({'A': ['foo', 'baz'], 'c': [2, 1]})
df1
A b c
0 foo 1 3
1 bar 2 4
2 test 3 5
df2
A c
0 foo 2
1 baz 1
After merging I want:
df1
A b c
0 foo 1 3
1 bar 2 4
2 test 3 5
3 baz NaN 1
If df1['A']
does not contain any thing from df2['A']
, only that row(s) from df2
need to be added to df1
. Ignore other columns when there is a match in col A
.
I tried pd.merge(df1, df2, on=['A'], how='outer')
, but this does not give expected output.
Additionally, for future reference, I also want to know how to get the below output:
df1
A b c
0 foo 1 2
1 bar 2 4
2 test 3 5
3 baz NaN 1
See the updated col c
value from df2
for foo
.
CodePudding user response:
Try with combine_first
out = df1.set_index('A').combine_first(df2.set_index('A')).reset_index()
A b c
0 bar 2.0 4
1 baz NaN 1
2 foo 1.0 3
3 test 3.0 5
CodePudding user response:
You are on the right track, you need to fill c
with values from the other dataframe:
(df1.merge(df2.rename(columns={'c': 'c_y'}), on='A', how = 'outer')
.assign(c = lambda df: df.c.fillna(df.c_y))
.drop(columns = 'c_y')
)
A b c
0 foo 1.0 3.0
1 bar 2.0 4.0
2 test 3.0 5.0
3 baz NaN 1.0
CodePudding user response:
You can use Series.isin
to check if df1['A']
values are in df2['A']
which a boolean Series use that in boolean indexing. Then use DataFrame.append
.
m = df2['A'].isin(df1['A'])
out = df1.append(df2[~m], ignore_index=True)
print(out)
# A b c
# 0 foo 1.0 3
# 1 bar 2.0 4
# 2 test 3.0 5
# 3 baz NaN 1