Merge two DataFrames, keeping past data and updating values based on a column

Time:01-08

Suppose I have an existing DataFrame that looks like this:

DF_1

Hour Value_1 Value_2
13:00 20 30
14:00 20 30
15:00 120 140
16:00 125 145
17:00 130 135
18:00 20 20

and I have an updating DataFrame that looks like this:

DF_2

Hour Value_1 Value_2
15:00 200 240
16:00 225 255
17:00 260 275

What I want to have is,

DF_FINAL

Hour Value_1 Value_2
13:00 20 30
14:00 20 30
15:00 200 240
16:00 225 255
17:00 260 275
18:00 20 20

I have already tried merge, join, concat, and even update() and combine_first(), but I didn't get what I wanted.

Would anyone please explain the correct way?

CodePudding user response:

You can concatenate both DataFrames, keep only the last occurrence of each hour, and then sort the result:

out = (pd.concat([df1, df2])
         .drop_duplicates('Hour', keep='last')
         .sort_values('Hour', ignore_index=True))
print(out)

# Output
    Hour  Value_1  Value_2
0  13:00       20       30
1  14:00       20       30
2  15:00      200      240
3  16:00      225      255
4  17:00      260      275
5  18:00       20       20
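
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds both frames from the tables in the question and splits the chain into its three steps. The intermediate names `stacked` and `deduped` are only illustrative, and the final sort assumes `Hour` holds zero-padded "HH:MM" strings, so that string order matches time order.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Hour": ["13:00", "14:00", "15:00", "16:00", "17:00", "18:00"],
    "Value_1": [20, 20, 120, 125, 130, 20],
    "Value_2": [30, 30, 140, 145, 135, 20],
})
df2 = pd.DataFrame({
    "Hour": ["15:00", "16:00", "17:00"],
    "Value_1": [200, 225, 260],
    "Value_2": [240, 255, 275],
})

# Stack the frames; the df2 rows land below the df1 rows.
stacked = pd.concat([df1, df2])

# For each repeated Hour, keep the row that appears last (the df2 row).
deduped = stacked.drop_duplicates("Hour", keep="last")

# Restore chronological order and renumber the index from 0.
out = deduped.sort_values("Hour", ignore_index=True)
print(out)
```

The order of the list passed to `pd.concat` matters here: with `keep='last'`, whichever frame comes last wins for duplicated hours, so `df2` has to follow `df1`.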

CodePudding user response:

Another possible answer:

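# Map each Hour in df2 to its replacement values
update_dict1 = dict(zip(df2['Hour'], df2['Value_1']))
update_dict2 = dict(zip(df2['Hour'], df2['Value_2']))

df3 = df1.copy()

# Hours missing from df2 map to NaN (which also makes the columns float)
df3['Value_1'] = df1['Hour'].map(update_dict1)
df3['Value_2'] = df1['Hour'].map(update_dict2)

# Restore the original df1 values for the hours that were not updated
mask = df3['Value_1'].isna()

df3.loc[mask, 'Value_1'] = df1.loc[mask, 'Value_1']
df3.loc[mask, 'Value_2'] = df1.loc[mask, 'Value_2']

df3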

    Hour  Value_1  Value_2
0  13:00     20.0     30.0
1  14:00     20.0     30.0
2  15:00    200.0    240.0
3  16:00    225.0    255.0
4  17:00    260.0    275.0
5  18:00     20.0     20.0
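
The question mentions that update() and combine_first() did not give the expected result. The attempted code is not shown, so this is only a guess, but a common cause is that both methods align rows on the index rather than on the `Hour` column. The sketch below assumes that explanation and sets `Hour` as the index first; the names `left`, `right`, `updated`, and `combined` are illustrative.

```python
import pandas as pd

# Frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "Hour": ["13:00", "14:00", "15:00", "16:00", "17:00", "18:00"],
    "Value_1": [20, 20, 120, 125, 130, 20],
    "Value_2": [30, 30, 140, 145, 135, 20],
})
df2 = pd.DataFrame({
    "Hour": ["15:00", "16:00", "17:00"],
    "Value_1": [200, 225, 260],
    "Value_2": [240, 255, 275],
})

# Index both frames by Hour so rows are matched by hour, not by position.
left = df1.set_index("Hour")
right = df2.set_index("Hour")

# Option 1: update() modifies the frame in place and returns None,
# so work on a copy and do not assign its return value.
updated = left.copy()
updated.update(right)
updated = updated.reset_index()

# Option 2: take values from df2 first, then fill the other hours from df1.
combined = right.combine_first(left).reset_index()

print(updated)
print(combined)
```

Both options should give the same values as DF_FINAL. Depending on the pandas version, the value columns may come back as float rather than int, in which case `.astype(int)` on those columns restores the integer dtype.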