Suppose I have an existing DataFrame that looks like this:
DF_1
Hour | Value_1 | Value_2 |
---|---|---|
13:00 | 20 | 30 |
14:00 | 20 | 30 |
15:00 | 120 | 140 |
16:00 | 125 | 145 |
17:00 | 130 | 135 |
18:00 | 20 | 20 |
and I have an updating DataFrame that looks like this:
DF_2
Hour | Value_1 | Value_2 |
---|---|---|
15:00 | 200 | 240 |
16:00 | 225 | 255 |
17:00 | 260 | 275 |
What I want to end up with is:
DF_FINAL
Hour | Value_1 | Value_2 |
---|---|---|
13:00 | 20 | 30 |
14:00 | 20 | 30 |
15:00 | 200 | 240 |
16:00 | 225 | 255 |
17:00 | 260 | 275 |
18:00 | 20 | 20 |
I have already tried merge, join, concat, update(), and combine_first(), but I didn't get what I wanted.
Would anyone please explain the correct way?
CodePudding user response:
You can concatenate both DataFrames, keep only the last occurrence of each hour, and then sort the result:
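import pandas as pd

# Stack both frames, keep the df2 row for any duplicated Hour, then restore the order
out = (pd.concat([df1, df2])
         .drop_duplicates('Hour', keep='last')
         .sort_values('Hour', ignore_index=True))
print(out)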
# Output
    Hour  Value_1  Value_2
0  13:00       20       30
1  14:00       20       30
2  15:00      200      240
3  16:00      225      255
4  17:00      260      275
5  18:00       20       20
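For anyone who wants to run this end to end, here is a minimal self-contained sketch. The construction of df1 and df2 is my assumption, rebuilt from the tables in the question, with Hour stored as a plain string. Sorting on 'Hour' works here only because zero-padded HH:MM strings sort in time order.

```python
import pandas as pd

# Assumed reconstruction of DF_1 and DF_2 from the question (Hour kept as strings)
df1 = pd.DataFrame({
    "Hour": ["13:00", "14:00", "15:00", "16:00", "17:00", "18:00"],
    "Value_1": [20, 20, 120, 125, 130, 20],
    "Value_2": [30, 30, 140, 145, 135, 20],
})
df2 = pd.DataFrame({
    "Hour": ["15:00", "16:00", "17:00"],
    "Value_1": [200, 225, 260],
    "Value_2": [240, 255, 275],
})

# df2 is concatenated last, so keep='last' prefers its rows for duplicated hours
out = (
    pd.concat([df1, df2])
      .drop_duplicates("Hour", keep="last")
      .sort_values("Hour", ignore_index=True)
)
print(out)
```

Because df2 is passed second to concat, its rows are the ones that survive drop_duplicates. Swapping the order would keep the old values instead.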
CodePudding user response:
Another possible answer:
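# Map each Hour in df2 to its replacement values
update_dict1 = dict(zip(df2['Hour'], df2['Value_1']))
update_dict2 = dict(zip(df2['Hour'], df2['Value_2']))
df3 = df1.copy()
# Hours missing from df2 map to NaN (this also turns the columns into floats)
df3['Value_1'] = df1['Hour'].map(update_dict1)
df3['Value_2'] = df1['Hour'].map(update_dict2)
# Restore the original df1 values for the hours that were not updated
mask = df3['Value_1'].isna()
df3.loc[mask, 'Value_1'] = df1.loc[mask, 'Value_1']
df3.loc[mask, 'Value_2'] = df1.loc[mask, 'Value_2']
df3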
    Hour  Value_1  Value_2
0  13:00     20.0     30.0
1  14:00     20.0     30.0
2  15:00    200.0    240.0
3  16:00    225.0    255.0
4  17:00    260.0    275.0
5  18:00     20.0     20.0
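A note on update() and combine_first(), since the question mentions both: they align rows on the index, not on the 'Hour' column. The question doesn't show how they were called, so this is only a guess at what went wrong, but with the default integer index the rows of DF_2 (index 0 to 2) would be matched against 13:00 to 15:00 instead of 15:00 to 17:00. A minimal sketch, assuming df1 and df2 are built from the question's tables as below, is to move 'Hour' into the index first:

```python
import pandas as pd

# Assumed reconstruction of DF_1 and DF_2 from the question
df1 = pd.DataFrame({
    "Hour": ["13:00", "14:00", "15:00", "16:00", "17:00", "18:00"],
    "Value_1": [20, 20, 120, 125, 130, 20],
    "Value_2": [30, 30, 140, 145, 135, 20],
})
df2 = pd.DataFrame({
    "Hour": ["15:00", "16:00", "17:00"],
    "Value_1": [200, 225, 260],
    "Value_2": [240, 255, 275],
})

# Option 1: update() overwrites matching rows in place, matched on the index
updated = df1.set_index("Hour")
updated.update(df2.set_index("Hour"))
updated = updated.reset_index()

# Option 2: combine_first() takes df2's values and falls back to df1 elsewhere
combined = (
    df2.set_index("Hour")
       .combine_first(df1.set_index("Hour"))
       .reset_index()
)

print(updated)
print(combined)
```

Depending on the pandas version, the value columns may come back as floats rather than ints, so cast with astype(int) afterwards if that matters. Note also that update() only touches hours already present in df1, while combine_first() would add any new hours that appear only in df2.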