I have two Pandas Dataframes:
First:
Timestamp | Label | Value | Another col 1 | another col 2 |
---|---|---|---|---|
2022-01-01 00:00 | A | 0 | ... | ... |
2022-01-01 00:00 | B | 0 | ... | ... |
2022-01-01 00:15 | B | 0 | ... | ... |
2022-01-01 00:30 | B | 0 | ... | ... |
2022-01-01 00:45 | B | 0 | ... | ... |
2022-01-01 01:00 | C | 0 | ... | ... |
2022-01-01 01:00 | D | 0 | ... | ... |
2022-01-01 01:15 | D | 0 | ... | ... |
... | ... | ... | ... | ... |
Second:
Timestamp | Label | Value | Unrelated column A | Unrelated column B |
---|---|---|---|---|
2022-01-01 00:00 | A | 20 | ... | ... |
2022-01-01 01:00 | C | 20 | ... | ... |
... | ... | ... | ... | ... |
What I would like to do is the following:
- Take all rows from the
Second
dataframe - Find the matching row (by
Timestamp
andLabel
) in theFirst
dataframe (while also ignoring rows for which I can not find the match) - Update the
First.Value
column with the values fromSecond.Value
column
The final result would be updated First
dataframe:
Timestamp | Label | Value | Another col 1 | another col 2 |
---|---|---|---|---|
2022-01-01 00:00 | A | 20 | ... | ... |
2022-01-01 00:00 | B | 0 | ... | ... |
2022-01-01 00:15 | B | 0 | ... | ... |
2022-01-01 00:30 | B | 0 | ... | ... |
2022-01-01 00:45 | B | 0 | ... | ... |
2022-01-01 01:00 | C | 20 | ... | ... |
2022-01-01 01:00 | D | 0 | ... | ... |
2022-01-01 01:15 | D | 0 | ... | ... |
... | ... | ... | ... | ... |
In SQL world I could do something like update with join. I have no idea how to do it in Pandas. I have looked at DataFrame.update and DataFrame.merge but they all seem to be oriented towards completely merging the two dataframes whereas I just want to copy certain values. It seems it should probably be possible with some sort of multi-index but it is way beyond my knowledge.
Any help is appreciated
Edit: I have been asked if this is a duplicate of Pandas Merging 101. I do not see how it is, I need to update the first DataSet inplace and this does not seem to be possible with merge
.
CodePudding user response:
First, let's merge these two tables according to 'Timestamp','Label'. The df2['Value'] values in the rows that match the first df will not be nan, otherwise they will all be nan.
After use np.where: if the df2['Value'] value in df2 is not nan then take that value, if nan then keep the df1['Value'] value in df1.
final=df.merge(df2,on=['Timestamp','Label'],how='left')
final['Value_x']=np.where(final['Value_y'].notnull(),final['Value_y'],final['Value_x'])
final=final.rename(columns={'Value_x':'Value'})
final=final[df.columns]
print(final)
'''
Timestamp Label Value Another col 1 another col 2
0 2022-01-01 00:00:00 A 20 ... ...
1 2022-01-01 00:00:00 B 0 ... ...
2 2022-01-01 00:15:00 B 0 ... ...
3 2022-01-01 00:30:00 B 0 ... ...
4 2022-01-01 00:45:00 B 0 ... ...
5 2022-01-01 01:00:00 C 20 ... ...
6 2022-01-01 01:00:00 D 0 ... ...
7 2022-01-01 01:15:00 D 0 ... ...
'''