import pandas as pd
data = {'Region': ['A', 'A', 'B', 'B', 'C', 'C'],
'Description': ['D1', 'D2', 'D1', 'D2', 'D1', 'D2'],
'Baseline 1':[1,2,3,4,5,6]
:
'Baseline N': [some numbers]
'Regime 1': [0,0,0,0,2,3]
:
'Regime N': [some numbers]
}
df1 = pd.DataFrame(data)
replace_data = {'Region':['A','B','C'],
'Values':[6,7,8]}
df2 = pd.DataFrame(replace_data)
I have a dataframe which has 'Region' as index and need to replace all the regions where description is 'D2' and **Regime 1 to N having 0 ** (any regime like column having 0) with the 'replace_data' dataframe value of the particular region.
So Regime 1 from above code example should have final value as [0,6,0,7,2,3] (Notice 0 replaced by 6 & 7 from the replace_data frame for region A and B). For each regime column, this need to be done. Rest of the columns (like baseline) should remain as it is.
I can run a loop for all regime like columns and replace but I feel that will be very inefficient and time consuming since I have 20 such columns and need to similar operations multiple time. Any efficient way to do this?
df.loc[((df['Description']=='D2'),regime_all_cols].replace({0:df2['Values'].values[0]})
This used to work when I had only 1 region but with multiple it doesn't work accurately.
CodePudding user response:
You can try to slice the portion of the DataFrame to replace and replace it with a filled version of the DataFrame
cols = df1.filter(like='Regime').columns.to_list()
d = dict(zip(replace_data['Region'], replace_data['Values']))
repl_df = df1.mask(df1.eq(0)).T.fillna(df1['Region'].map(d)).T
df1.loc[df1['Description'].eq('D2'), cols] = repl_df
alternative as loop:
mask = df1['Description'].eq('D2')
d = dict(zip(replace_data['Region'], replace_data['Values']))
repl = df1['Region'].map(d)
for c in df1.filter(like='Regime'):
S = df1.loc[mask, c]
df1.loc[mask, c] = S.mask(S.eq(0)).fillna(repl)
output:
Region Description Baseline 1 Baseline N Regime 1 Regime N
0 A D1 1 9 0 9
1 A D2 2 9 6 9
2 B D1 3 9 0 9
3 B D2 4 9 7 9
4 C D1 5 9 2 9
5 C D2 6 9 3.0 9
CodePudding user response:
Try this, first create a mapping_series from replace_data, then select columns with 'Regime' using filter
, and apply a lambda function that replace
0 with NaN and fillna
with map
of mapping_series based on df1['Region'] :
import pandas as pd
data = {'Region': ['A', 'A', 'B', 'B', 'C', 'C'],
'Description': ['D1', 'D2', 'D1', 'D2', 'D1', 'D2'],
'Baseline 1':[1,2,3,4,5,6],
'Baseline 2':[6,5,4,3,2,1],
'Regime 1': [0,0,0,0,2,3],
'Regime 2': [3,2,0,0,0,0],
}
df1 = pd.DataFrame(data)
replace_data = {'Region':['A','B','C'],
'Values':[6,7,8]}
# mapping_series = pd.DataFrame(replace_data).set_index('Region')['Values']
# I actually like the way @mozway defined his mapping
# using a dataframe is overkill here just zip and dict.
mapping_series = dict(zip(replace_data['Region'], replace_data['Values']))
df1.assign(**df1.filter(like='Regime')
.apply(lambda x: x.mask(df1['Description'].eq('D2') & (x.eq(0)))
.fillna(df1['Region'].map(mapping_series))))
Output:
Region Description Baseline 1 Baseline 2 Regime 1 Regime 2
0 A D1 1 6 0.0 3.0
1 A D2 2 5 6.0 2.0
2 B D1 3 4 0.0 0.0
3 B D2 4 3 7.0 7.0
4 C D1 5 2 2.0 0.0
5 C D2 6 1 3.0 8.0