Home > Mobile >  Merge new rows from one dataframe to another
Merge new rows from one dataframe to another

Time:12-02

Say two dataframe:

df1 = pd.DataFrame({'A': ['foo', 'bar', 'test'], 'b': [1, 2, 3], 'c': [3, 4, 5]})
df2 = pd.DataFrame({'A': ['foo', 'baz'], 'c': [2, 1]})

df1
      A  b  c
0   foo  1  3
1   bar  2  4
2  test  3  5

df2
     A  c
0  foo  2
1  baz  1

After merging I want:

df1
      A  b    c
0   foo  1    3
1   bar  2    4
2  test  3    5
3   baz  NaN  1

If df1['A'] does not contain any thing from df2['A'], only that row(s) from df2 need to be added to df1. Ignore other columns when there is a match in col A.

I tried pd.merge(df1, df2, on=['A'], how='outer'), but this does not give expected output.

Additionally, for future reference, I also want to know how to get the below output:

df1
      A  b    c
0   foo  1    2
1   bar  2    4
2  test  3    5
3   baz  NaN  1

See the updated col c value from df2 for foo.

CodePudding user response:

Try with combine_first

out = df1.set_index('A').combine_first(df2.set_index('A')).reset_index()
      A    b  c
0   bar  2.0  4
1   baz  NaN  1
2   foo  1.0  3
3  test  3.0  5

CodePudding user response:

You are on the right track, you need to fill c with values from the other dataframe:

(df1.merge(df2.rename(columns={'c': 'c_y'}), on='A', how = 'outer')
    .assign(c = lambda df: df.c.fillna(df.c_y))
    .drop(columns = 'c_y')
)
      A    b    c
0   foo  1.0  3.0
1   bar  2.0  4.0
2  test  3.0  5.0
3   baz  NaN  1.0

CodePudding user response:

You can use Series.isin to check if df1['A'] values are in df2['A'] which a boolean Series use that in boolean indexing. Then use DataFrame.append.

m   = df2['A'].isin(df1['A'])
out = df1.append(df2[~m], ignore_index=True)
print(out)

#       A    b  c
# 0   foo  1.0  3
# 1   bar  2.0  4
# 2  test  3.0  5
# 3   baz  NaN  1
  • Related