Let's suppose I have a following dataframe:
df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'val': [0, 0, 0, 0, 0]})
I want to modify the column val
with values from another dataframes like these:
df1 = pd.DataFrame({'id': [2, 3], 'val': [1, 1]})
df2 = pd.DataFrame({'id': [1, 5], 'val': [2, 2]})
I need a function merge_values_into_df
that would work in the way to provide the following result:
df = merge_values_into_df(df1, on='id', field='val')
df = merge_values_into_df(df2, on='id', field='val')
print(df)
id val
0 1 2
1 2 1
2 3 1
3 4 0
4 5 2
I need an efficient (by CPU and memory) solution because I want to apply the approach to huge dataframes.
CodePudding user response:
Use DataFrame.update
with convert id
to index in all DataFrames:
df = df.set_index('id')
df1 = df1.set_index('id')
df2 = df2.set_index('id')
df.update(df1)
df.update(df2)
df = df.reset_index()
print (df)
id val
0 1 2.0
1 2 1.0
2 3 1.0
3 4 0.0
4 5 2.0
CodePudding user response:
You can concat
all dataframes and drop_duplicated
same id by keeping the last occurrence of id.
out = pd.concat([df, df1, df2]).drop_duplicates('id', keep='last')
print(out.sort_values('id', ignore_index=True))
# Output
id val
0 1 2
1 2 1
2 3 1
3 4 0
4 5 2