Home > OS >  Pandas - comparing certain columns of two dataframes and updating rows of one if a condition is met
Pandas - comparing certain columns of two dataframes and updating rows of one if a condition is met

Time:03-08

I have two dataframes that share some of the same columns, but one has more columns than the other. I would like to compare certain column values of the dataframes and update a column value for each row of one of the dataframes where both dataframes have the same value for certain columns. Ex:

df1:                                        df2:
State Organization Date    Tag  Fine        State Organization Date    Fine  
MD    ABC      01/10/2021  901  0           MD    ABC      01/10/2021  1000
MD    ABC      01/10/2021  801  0           MD    ABC      01/15/2021  6000
NJ    DEF      02/10/2021  701  0           NJ    DEF      02/10/2021  900
NJ    DEF      02/10/2021  601  0
NJ    DEF      02/10/2021  701  0

So in my particular case, if both dataframes share a row where state, organization, and date are the same, I would like to use df2's corresponding row's fine value to update df1's corresponding row's fine value. So:

 df1:                
State Organization Date    Tag  Fine   
MD    ABC      01/10/2021  901  1000    
MD    ABC      01/15/2021  801  6000         
NJ    DEF      02/10/2021  701  900            
NJ    DEF      02/10/2021  601  900
NJ    DEF      02/10/2021  701  900

As you can see, the data frames do not have an equal number of columns, so I'm not sure if there's an easy way to do this without using iterrows. Any suggestions?

CodePudding user response:

try this:

idx = ['State', 'Organization', 'Date']
res = df1.set_index(idx).copy()
print(res)
>>>
                                        Tag   Fine
State       Organization    Date        
MD          ABC             01/10/2021  901   0
                            01/10/2021  801   0
NJ          DEF             02/10/2021  701   0
                            02/10/2021  601   0
                            02/10/2021  701   0

df2 = df2.set_index(idx)
print(df2)
>>>
                                        Fine
State       Organization    Date    
MD          ABC             01/10/2021  1000
                            01/15/2021  6000
NJ          DEF             02/10/2021  900

res.update(df2)
print(res)
>>>

                                         Tag    Fine
State       Organization    Date        
MD          ABC             01/10/2021   901    1000.0
                            01/10/2021   801    1000.0
NJ          DEF             02/10/2021   701    900.0
                            02/10/2021   601    900.0
                            02/10/2021   701    900.0

pd.__version__
>>>
'1.4.1'
  • Related