Home > Back-end >  Update specific Pandas DataFrame column with values from another Dataframe column (equivalent to SQL
Update specific Pandas DataFrame column with values from another Dataframe column (equivalent to SQL

Time:10-19

I have two Pandas Dataframes:

First:

Timestamp Label Value Another col 1 another col 2
2022-01-01 00:00 A 0 ... ...
2022-01-01 00:00 B 0 ... ...
2022-01-01 00:15 B 0 ... ...
2022-01-01 00:30 B 0 ... ...
2022-01-01 00:45 B 0 ... ...
2022-01-01 01:00 C 0 ... ...
2022-01-01 01:00 D 0 ... ...
2022-01-01 01:15 D 0 ... ...
... ... ... ... ...

Second:

Timestamp Label Value Unrelated column A Unrelated column B
2022-01-01 00:00 A 20 ... ...
2022-01-01 01:00 C 20 ... ...
... ... ... ... ...

What I would like to do is the following:

  1. Take all rows from the Second dataframe
  2. Find the matching row (by Timestamp and Label) in the First dataframe (while also ignoring rows for which I can not find the match)
  3. Update the First.Value column with the values from Second.Value column

The final result would be updated First dataframe:

Timestamp Label Value Another col 1 another col 2
2022-01-01 00:00 A 20 ... ...
2022-01-01 00:00 B 0 ... ...
2022-01-01 00:15 B 0 ... ...
2022-01-01 00:30 B 0 ... ...
2022-01-01 00:45 B 0 ... ...
2022-01-01 01:00 C 20 ... ...
2022-01-01 01:00 D 0 ... ...
2022-01-01 01:15 D 0 ... ...
... ... ... ... ...

In SQL world I could do something like update with join. I have no idea how to do it in Pandas. I have looked at DataFrame.update and DataFrame.merge but they all seem to be oriented towards completely merging the two dataframes whereas I just want to copy certain values. It seems it should probably be possible with some sort of multi-index but it is way beyond my knowledge.

Any help is appreciated

Edit: I have been asked if this is a duplicate of Pandas Merging 101. I do not see how it is, I need to update the first DataSet inplace and this does not seem to be possible with merge.

CodePudding user response:

First, let's merge these two tables according to 'Timestamp','Label'. The df2['Value'] values ​​in the rows that match the first df will not be nan, otherwise they will all be nan.

After use np.where: if the df2['Value'] value in df2 is not nan then take that value, if nan then keep the df1['Value'] value in df1.

final=df.merge(df2,on=['Timestamp','Label'],how='left')
final['Value_x']=np.where(final['Value_y'].notnull(),final['Value_y'],final['Value_x'])
final=final.rename(columns={'Value_x':'Value'})
final=final[df.columns]
print(final)
'''
    Timestamp           Label   Value   Another col 1   another col 2
0   2022-01-01 00:00:00 A       20      ...             ...
1   2022-01-01 00:00:00 B       0       ...             ...
2   2022-01-01 00:15:00 B       0       ...             ...
3   2022-01-01 00:30:00 B       0       ...             ...
4   2022-01-01 00:45:00 B       0       ...             ...
5   2022-01-01 01:00:00 C       20      ...             ...
6   2022-01-01 01:00:00 D       0       ...             ...
7   2022-01-01 01:15:00 D       0       ...             ...

'''
  • Related