Implementing a loop calculation on pandas rows based on chain

Time:05-09

I have the following block of code:

import pandas as pd
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
        .sort_values(by = 'xx1')
        .reset_index(drop = True))
dat
for i in range(1, dat.shape[0]) :
    if (dat.loc[i, 'aa2'] == 'qq') :
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']

dat

I am wondering whether the second block, i.e.

for i in range(1, dat.shape[0]) :
    if (dat.loc[i, 'aa2'] == 'qq') :
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']

can be implemented as part of the method chain, as a continuation of the first block. That is, I am hoping for something like the following:

dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
        .sort_values(by = 'xx1')
        .reset_index(drop = True)
        ### implement the for loop here
     )

Any pointers would be very helpful.

Answer:

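You can reassign xx3 by masking its values in the rows where aa2 is 'qq' and then forward-filling them. Since the loop starts at index 1, the mask excludes index 0, so the first row is never overwritten. Masking introduces NaN, which converts the column to float, so the final astype restores the original dtype: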

dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
        .sort_values(by = 'xx1')
        .reset_index(drop = True)
        .assign(xx3 = lambda df: df['xx3'].mask(df['aa2'].eq('qq') & (df.index!=0)).ffill().astype(df['xx3'].dtype))
      )

Output:

   xx1 aa2  xx3
0    1  qq    6
1    2  pp    5
2    3  qq    5
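
If you would rather keep the loop itself, one possible alternative is to wrap it in a function and call it from the chain with DataFrame.pipe. The sketch below is only an illustration: the helper name copy_previous_where and the longer sample frame (with consecutive 'qq' rows) are assumptions that are not part of the question. It also checks that the loop and the mask/ffill version agree on that sample.

```python
import pandas as pd


def copy_previous_where(df, flag_col="aa2", value_col="xx3", flag="qq"):
    """Hypothetical helper: the question's loop, wrapped for use with .pipe()."""
    out = df.copy()  # work on a copy so the frame coming from the chain is not mutated
    for i in range(1, out.shape[0]):
        if out.loc[i, flag_col] == flag:
            # Copy the (possibly already updated) value from the previous row
            out.loc[i, value_col] = out.loc[i - 1, value_col]
    return out


# Assumed sample data with consecutive 'qq' rows, to exercise the chained copying
raw = pd.DataFrame({
    "xx1": [5, 4, 3, 2, 1],
    "aa2": ["qq", "qq", "pp", "qq", "qq"],
    "xx3": [4, 5, 6, 7, 8],
})

# Loop version, called from inside the chain via .pipe()
looped = (raw
          .sort_values(by="xx1")
          .reset_index(drop=True)
          .pipe(copy_previous_where))

# Vectorized version from the answer above
vectorized = (raw
              .sort_values(by="xx1")
              .reset_index(drop=True)
              .assign(xx3=lambda df: df["xx3"]
                      .mask(df["aa2"].eq("qq") & (df.index != 0))
                      .ffill()
                      .astype(df["xx3"].dtype)))

print(looped)
print(looped.equals(vectorized))
```

On this sample both versions give xx3 = 8, 8, 6, 6, 6. The pipe version stays closest to the original loop, while the mask/ffill version avoids row-by-row assignment and will usually be faster on large frames.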