Merge two DataFrames, keeping the existing data and updating values based on a column

Suppose I have an existing DataFrame that looks like this:

DF_1

Hour	Value_1	Value_2
13:00	20	30
14:00	20	30
15:00	120	140
16:00	125	145
17:00	130	135
18:00	20	20

and I have an updating DataFrame that looks like this:

DF_2

Hour	Value_1	Value_2
15:00	200	240
16:00	225	255
17:00	260	275

What I want to get is:

DF_FINAL

Hour	Value_1	Value_2
13:00	20	30
14:00	20	30
15:00	200	240
16:00	225	255
17:00	260	275
18:00	20	20

I have already tried merge, join, concat, update(), and combine_first(), but none of them gave me the result I wanted.

Would anyone please explain the correct way?

CodePudding user response:

You can concatenate both DataFrames, keep only the last occurrence of each hour (the row from df2 when an hour appears in both), and then sort the result by Hour:

import pandas as pd

# Rows from df2 come last, so keep='last' lets them win for duplicated hours
out = (pd.concat([df1, df2])
         .drop_duplicates('Hour', keep='last')
         .sort_values('Hour', ignore_index=True))
print(out)

# Output
    Hour  Value_1  Value_2
0  13:00       20       30
1  14:00       20       30
2  15:00      200      240
3  16:00      225      255
4  17:00      260      275
5  18:00       20       20
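
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes Hour is stored as zero-padded "HH:MM" strings (so sorting the strings also sorts them chronologically) and that df1 and df2 are built directly from the tables in the question:

```python
import pandas as pd

# Existing data (DF_1 in the question)
df1 = pd.DataFrame({
    "Hour": ["13:00", "14:00", "15:00", "16:00", "17:00", "18:00"],
    "Value_1": [20, 20, 120, 125, 130, 20],
    "Value_2": [30, 30, 140, 145, 135, 20],
})

# Updating data (DF_2 in the question)
df2 = pd.DataFrame({
    "Hour": ["15:00", "16:00", "17:00"],
    "Value_1": [200, 225, 260],
    "Value_2": [240, 255, 275],
})

# Stack df2 under df1, keep the last row seen for each Hour
# (the df2 row when there is one), then restore the time order.
out = (
    pd.concat([df1, df2])
      .drop_duplicates("Hour", keep="last")
      .sort_values("Hour", ignore_index=True)
)
print(out)
```

Note that with this approach any hour that exists only in df2 would also be added to the result, not just used to overwrite existing rows.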

CodePudding user response:

Another possible answer:

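# Map each Hour in df2 to its new values
update_dict1 = dict(zip(df2['Hour'], df2['Value_1']))
update_dict2 = dict(zip(df2['Hour'], df2['Value_2']))

df3 = df1.copy()

# Hours missing from df2 map to NaN, which also turns the columns into floats
df3['Value_1'] = df1['Hour'].map(update_dict1)
df3['Value_2'] = df1['Hour'].map(update_dict2)

# Restore the original values for the hours that were not updated
mask = df3['Value_1'].isna()

df3.loc[mask, 'Value_1'] = df1.loc[mask, 'Value_1']
df3.loc[mask, 'Value_2'] = df1.loc[mask, 'Value_2']

df3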

    Hour  Value_1  Value_2
0  13:00     20.0     30.0
1  14:00     20.0     30.0
2  15:00    200.0    240.0
3  16:00    225.0    255.0
4  17:00    260.0    275.0
5  18:00     20.0     20.0
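
The same map-then-fill idea can probably be expressed more compactly with DataFrame.update(), which the question appears to have tried. update() aligns on the index, not on a column, so a likely reason it did not work is that Hour was left as an ordinary column. The sketch below assumes Hour is unique in both frames and rebuilds df1 and df2 from the tables in the question:

```python
import pandas as pd

# Existing data (DF_1 in the question)
df1 = pd.DataFrame({
    "Hour": ["13:00", "14:00", "15:00", "16:00", "17:00", "18:00"],
    "Value_1": [20, 20, 120, 125, 130, 20],
    "Value_2": [30, 30, 140, 145, 135, 20],
})

# Updating data (DF_2 in the question)
df2 = pd.DataFrame({
    "Hour": ["15:00", "16:00", "17:00"],
    "Value_1": [200, 225, 260],
    "Value_2": [240, 255, 275],
})

# Use Hour as the index so that update() can match rows by hour.
updated = df1.set_index("Hour")

# update() works in place and returns None; it overwrites only the
# cells for which df2 provides a non-NA value.
updated.update(df2.set_index("Hour"))

# Turn Hour back into a regular column.
updated = updated.reset_index()
print(updated)
```

Unlike the concat approach, update() ignores hours that exist only in df2, so it never adds rows. Depending on the pandas version, the updated columns may come back as floats, as in the output above.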