I have 2 dataframes derived from 2 excel files. The first is a sort of template where there is a column with a condition and the other has the same format but includes inputs for different time periods. I would like to create an output dataframe that basically creates a copy of the template populated with the inputs when the condition is met.
When I use something like df1.merge(df2.assign(Condition='yes'), on=['Condition'], how='left') I sort of get something in line with what I'm after but it contains duplicates. What could I do instead?
thanks
Example below
Code
df1={'reference':[1,2],'condition':['yes','no'],'31/12/2021':[0,0],'31/01/2022':[0,0]}
df1 = pd.DataFrame.from_dict(df1)
df2 = {'reference':[1,2],'condition':["",""],'31/12/2021':[101,231],'31/01/2022':[3423,3242]}
df2 = pd.DataFrame.from_dict(df2)
df1.merge(df2.assign(condition='yes'), on=['condition'], how='left')
Visual example
CodePudding user response:
You could use df.update
for this:
# only `update` from column index `2` onwards: ['31/12/2021', '31/01/2022']
df2.update(df1.loc[df1.condition=='no', list(df1.columns)[2:]])
print(df2)
reference condition 31/12/2021 31/01/2022
0 1 101.0 3423.0
1 2 0.0 0.0
Alternative solution using df.where
:
df2.iloc[:,2:] = df2.iloc[:,2:].where(df1.condition=='yes',df1.iloc[:,2:])
print(df2)
reference condition 31/12/2021 31/01/2022
0 1 101 3423
1 2 0 0