I have two dataframes that share some of the same columns, but one has more columns than the other. I would like to compare certain column values of the dataframes and update a column value for each row of one of the dataframes where both dataframes have the same value for certain columns. Ex:
df1: df2:
State Organization Date Tag Fine State Organization Date Fine
MD ABC 01/10/2021 901 0 MD ABC 01/10/2021 1000
MD ABC 01/10/2021 801 0 MD ABC 01/15/2021 6000
NJ DEF 02/10/2021 701 0 NJ DEF 02/10/2021 900
NJ DEF 02/10/2021 601 0
NJ DEF 02/10/2021 701 0
So in my particular case, if both dataframes share a row where state, organization, and date are the same, I would like to use df2's corresponding row's fine value to update df1's corresponding row's fine value. So:
df1:
State Organization Date Tag Fine
MD ABC 01/10/2021 901 1000
MD ABC 01/15/2021 801 6000
NJ DEF 02/10/2021 701 900
NJ DEF 02/10/2021 601 900
NJ DEF 02/10/2021 701 900
As you can see, the data frames do not have an equal number of columns, so I'm not sure if there's an easy way to do this without using iterrows. Any suggestions?
CodePudding user response:
try this:
idx = ['State', 'Organization', 'Date']
res = df1.set_index(idx).copy()
print(res)
>>>
Tag Fine
State Organization Date
MD ABC 01/10/2021 901 0
01/10/2021 801 0
NJ DEF 02/10/2021 701 0
02/10/2021 601 0
02/10/2021 701 0
df2 = df2.set_index(idx)
print(df2)
>>>
Fine
State Organization Date
MD ABC 01/10/2021 1000
01/15/2021 6000
NJ DEF 02/10/2021 900
res.update(df2)
print(res)
>>>
Tag Fine
State Organization Date
MD ABC 01/10/2021 901 1000.0
01/10/2021 801 1000.0
NJ DEF 02/10/2021 701 900.0
02/10/2021 601 900.0
02/10/2021 701 900.0
pd.__version__
>>>
'1.4.1'