Home > OS >  Python pandas merge with condition and no duplicates
Python pandas merge with condition and no duplicates

Time:09-26

I have 2 dataframes derived from 2 excel files. The first is a sort of template where there is a column with a condition and the other has the same format but includes inputs for different time periods. I would like to create an output dataframe that basically creates a copy of the template populated with the inputs when the condition is met.

When I use something like df1.merge(df2.assign(Condition='yes'), on=['Condition'], how='left') I sort of get something in line with what I'm after but it contains duplicates. What could I do instead?

thanks

Example below

Code

df1={'reference':[1,2],'condition':['yes','no'],'31/12/2021':[0,0],'31/01/2022':[0,0]}
df1 = pd.DataFrame.from_dict(df1)

df2 = {'reference':[1,2],'condition':["",""],'31/12/2021':[101,231],'31/01/2022':[3423,3242]}
df2 = pd.DataFrame.from_dict(df2)

df1.merge(df2.assign(condition='yes'), on=['condition'], how='left')

Visual example

enter image description here

CodePudding user response:

You could use df.update for this:

# only `update` from column index `2` onwards: ['31/12/2021', '31/01/2022']
df2.update(df1.loc[df1.condition=='no', list(df1.columns)[2:]])
print(df2)

   reference condition  31/12/2021  31/01/2022
0          1                 101.0      3423.0
1          2                   0.0         0.0

Alternative solution using df.where:

df2.iloc[:,2:] = df2.iloc[:,2:].where(df1.condition=='yes',df1.iloc[:,2:])
print(df2)

   reference condition  31/12/2021  31/01/2022
0          1                   101        3423
1          2                     0           0
  • Related