I have the following block of code:
import pandas as pd
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
.sort_values(by = 'xx1')
.reset_index(drop = True))
dat
for i in range(1, dat.shape[0]):
    if dat.loc[i, 'aa2'] == 'qq':
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']
dat
I am wondering whether the second block of code, i.e.
for i in range(1, dat.shape[0]):
    if dat.loc[i, 'aa2'] == 'qq':
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']
can be implemented as part of the method chain, continuing from the first block. That is, I am hoping for something like the following:
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
.sort_values(by = 'xx1')
.reset_index(drop = True)
### implement the for loop here
)
Any pointers would be very helpful.
CodePudding user response:
You can assign xx3 again by masking the qq values and forward-filling them. Since the loop starts from index 1, we start the mask from index 1:
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
.sort_values(by = 'xx1')
.reset_index(drop = True)
.assign(xx3 = lambda df: df['xx3'].mask(df['aa2'].eq('qq') & (df.index!=0)).ffill().astype(df['xx3'].dtype))
)
Output:
   xx1 aa2  xx3
0    1  qq    6
1    2  pp    5
2    3  qq    5
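To see why this matches the loop, it may help to split the chained expression into its intermediate steps and compare the result with the loop version. The following is only a sketch for checking the idea; the names build, is_qq, masked and filled are made up here for illustration and are not part of the original code.

```python
import pandas as pd

def build():
    # Same starting frame as in the question (hypothetical helper for this sketch)
    return (pd.DataFrame({'xx1': [3, 2, 1], 'aa2': ['qq', 'pp', 'qq'], 'xx3': [4, 5, 6]})
            .sort_values(by='xx1')
            .reset_index(drop=True))

# Loop version from the question
looped = build()
for i in range(1, looped.shape[0]):
    if looped.loc[i, 'aa2'] == 'qq':
        looped.loc[i, 'xx3'] = looped.loc[i - 1, 'xx3']

# Chained version, broken into intermediate steps
df = build()
is_qq = df['aa2'].eq('qq') & (df.index != 0)      # rows to overwrite; row 0 is never touched
masked = df['xx3'].mask(is_qq)                    # those rows become NaN (column is upcast to float)
filled = masked.ffill().astype(df['xx3'].dtype)   # NaN takes the previous value; cast back to int
chained = df.assign(xx3=filled)

print(chained)
assert chained.equals(looped)
```

Forward-filling also agrees with the loop when several qq rows follow each other, because the loop reads the already-updated previous row. One caveat: if xx3 contained NaN values to begin with, ffill would fill those as well, which the loop would not do.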