How do I merge (insert) rows from one dataframe into another one in Pandas?-CodePudding

Let's suppose I have a following dataframe:

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'val': [0, 0, 0, 0, 0]})

I want to modify the column val with values from another dataframes like these:

df1 = pd.DataFrame({'id': [2, 3], 'val': [1, 1]})
df2 = pd.DataFrame({'id': [1, 5], 'val': [2, 2]})

I need a function merge_values_into_df that would work in the way to provide the following result:

df = merge_values_into_df(df1, on='id', field='val')
df = merge_values_into_df(df2, on='id', field='val')
print(df)


   id  val
0   1    2
1   2    1
2   3    1
3   4    0
4   5    2

I need an efficient (by CPU and memory) solution because I want to apply the approach to huge dataframes.

CodePudding user response：

Use DataFrame.update with convert id to index in all DataFrames:

df = df.set_index('id')
df1 = df1.set_index('id')
df2 = df2.set_index('id')

df.update(df1)
df.update(df2)

df = df.reset_index()
print (df)
   id  val
0   1  2.0
1   2  1.0
2   3  1.0
3   4  0.0
4   5  2.0

CodePudding user response：

You can concat all dataframes and drop_duplicated same id by keeping the last occurrence of id.

out = pd.concat([df, df1, df2]).drop_duplicates('id', keep='last')
print(out.sort_values('id', ignore_index=True))

# Output
   id  val
0   1    2
1   2    1
2   3    1
3   4    0
4   5    2