Home > Back-end >  How to replace 0 in multiple columns and specific row with values from another dataframe
How to replace 0 in multiple columns and specific row with values from another dataframe

Time:03-23

import pandas as pd

data = {'Region': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Description': ['D1', 'D2', 'D1', 'D2', 'D1', 'D2'],
        'Baseline 1':[1,2,3,4,5,6]
        :
        'Baseline N': [some numbers]
        'Regime 1': [0,0,0,0,2,3]
        :
        'Regime N': [some numbers]

        }

df1 = pd.DataFrame(data)

replace_data = {'Region':['A','B','C'],
                 'Values':[6,7,8]}

df2 = pd.DataFrame(replace_data)

I have a dataframe which has 'Region' as index and need to replace all the regions where description is 'D2' and **Regime 1 to N having 0 ** (any regime like column having 0) with the 'replace_data' dataframe value of the particular region.

So Regime 1 from above code example should have final value as [0,6,0,7,2,3] (Notice 0 replaced by 6 & 7 from the replace_data frame for region A and B). For each regime column, this need to be done. Rest of the columns (like baseline) should remain as it is.

I can run a loop for all regime like columns and replace but I feel that will be very inefficient and time consuming since I have 20 such columns and need to similar operations multiple time. Any efficient way to do this?

df.loc[((df['Description']=='D2'),regime_all_cols].replace({0:df2['Values'].values[0]})

This used to work when I had only 1 region but with multiple it doesn't work accurately.

CodePudding user response:

You can try to slice the portion of the DataFrame to replace and replace it with a filled version of the DataFrame

cols = df1.filter(like='Regime').columns.to_list()
d = dict(zip(replace_data['Region'], replace_data['Values']))

repl_df = df1.mask(df1.eq(0)).T.fillna(df1['Region'].map(d)).T

df1.loc[df1['Description'].eq('D2'), cols] = repl_df

alternative as loop:

mask = df1['Description'].eq('D2')
d = dict(zip(replace_data['Region'], replace_data['Values']))
repl = df1['Region'].map(d)

for c in df1.filter(like='Regime'):
    S = df1.loc[mask, c]
    df1.loc[mask, c] = S.mask(S.eq(0)).fillna(repl)

output:

  Region Description  Baseline 1  Baseline N Regime 1 Regime N
0      A          D1           1           9        0        9
1      A          D2           2           9        6        9
2      B          D1           3           9        0        9
3      B          D2           4           9        7        9
4      C          D1           5           9        2        9
5      C          D2           6           9      3.0        9

CodePudding user response:

Try this, first create a mapping_series from replace_data, then select columns with 'Regime' using filter, and apply a lambda function that replace 0 with NaN and fillna with map of mapping_series based on df1['Region'] :

import pandas as pd

data = {'Region': ['A', 'A', 'B', 'B', 'C', 'C'],
        'Description': ['D1', 'D2', 'D1', 'D2', 'D1', 'D2'],
        'Baseline 1':[1,2,3,4,5,6],
        'Baseline 2':[6,5,4,3,2,1],
        'Regime 1': [0,0,0,0,2,3],
        'Regime 2': [3,2,0,0,0,0],
        }

df1 = pd.DataFrame(data)

replace_data = {'Region':['A','B','C'],
                 'Values':[6,7,8]}

# mapping_series = pd.DataFrame(replace_data).set_index('Region')['Values']
# I actually like the way @mozway defined his mapping 
# using a dataframe is overkill here just zip and dict.
mapping_series = dict(zip(replace_data['Region'], replace_data['Values']))

df1.assign(**df1.filter(like='Regime')
                .apply(lambda x: x.mask(df1['Description'].eq('D2') & (x.eq(0)))
                                  .fillna(df1['Region'].map(mapping_series))))

Output:

  Region Description  Baseline 1  Baseline 2  Regime 1  Regime 2
0      A          D1           1           6       0.0       3.0
1      A          D2           2           5       6.0       2.0
2      B          D1           3           4       0.0       0.0
3      B          D2           4           3       7.0       7.0
4      C          D1           5           2       2.0       0.0
5      C          D2           6           1       3.0       8.0
  • Related