Fill column with max value of a sub-set in-between a specific value

import pandas as pd

lvalues = [0,0,0,0,0,0,0,0,242,222,183,149,121,102,91,84,0,0,0,0,0,0,0,0,0,230,218,209,197,162,156,144,0,0,0,0,0,0,0,0]
idx = range(0,len(lvalues))

dfSample = pd.DataFrame(lvalues, index=idx)

I have a column with several subsets of non-zero values separated by runs of zeros. I would like to loop through it, take the highest value of each subset, and repeat it until the column reaches 0 again. For example, once the loop reaches 242, it repeats 242 until the zeros start again. Thanks in advance.

CodePudding user response：

If you want to group by consecutive 0/non-0 and get the max, use:

g = dfSample[0].eq(0).diff().fillna(False).cumsum()
dfSample.groupby(g).transform('max')

Logic: convert the series to booleans (zero vs. non-zero) and take the diff. The result is True at the start of each group, except for the very first item, which is NaN and gets filled with False. Take the cumsum to turn those flags into group labels, then use that grouper to get the max per group.
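The grouping logic above can be traced step by step. The following is a minimal sketch, not the answer's exact code: it uses a shortened sample list and detects run boundaries with shift/ne instead of diff/fillna, which gives the same groups. The names is_zero, starts, group_id and filled are introduced here for illustration only.

```python
import pandas as pd

# Shortened sample (assumption: same shape as the question's data, fewer rows).
s = pd.Series([0, 0, 242, 222, 183, 0, 0, 230, 218, 0])

# Step 1: boolean flag, True where the value is zero.
is_zero = s.eq(0)

# Step 2: a new run starts wherever the flag differs from the previous row.
# The first row has no predecessor, so it also counts as a start.
starts = is_zero.ne(is_zero.shift())

# Step 3: the cumulative sum turns the start flags into one label per run.
group_id = starts.cumsum()

# Step 4: broadcast each run's maximum back onto every row of that run.
filled = s.groupby(group_id).transform("max")

print(group_id.tolist())
print(filled.tolist())
```

Runs of zeros form their own groups, so their maximum is 0 and they stay unchanged.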

If you would rather replace each group with its first value, a simple mask and fill should work:

dfSample.mask(dfSample[0].ne(0) & dfSample[0].shift(fill_value=0).ne(0)).ffill().astype(int)

Logic: mask the non-zero values that are not preceded by 0, then ffill the NaNs. Zeros are never masked, so the first zero after a group stays 0.
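To see what the mask-and-fill approach does at each step, here is a small sketch on a shortened sample. It assumes the condition masks only non-zero values whose predecessor is also non-zero, so that the first zero after a run is kept rather than overwritten by the forward fill. The names prev, kept and filled are illustrative.

```python
import pandas as pd

# Shortened sample (assumption: same shape as the question's data, fewer rows).
s = pd.Series([0, 0, 242, 222, 183, 0, 0, 230, 218, 0])

# Previous value of each row; the first row is treated as preceded by 0.
prev = s.shift(fill_value=0)

# Hide every non-zero value that continues a run (its predecessor is non-zero).
# What remains visible: all zeros and the first value of each run.
kept = s.mask(s.ne(0) & prev.ne(0))

# Forward-fill the hidden positions with the run's first value,
# then cast back to int because the NaNs made the column float.
filled = kept.ffill().astype(int)

print(filled.tolist())
```

Because the first row is never masked, no NaN survives the forward fill and the cast to int is safe.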

CodePudding user response：

Use shift to line each value up with the value before it, then combine "previous value is zero" with "current value is non-zero" using & to flag the start of each run of numbers. Use the same procedure with shift(-1) to flag the end of each run. After that, use cumsum to build group labels, then groupby and return the first value of each group. Use:

s = dfSample[0]
g = (((s.shift() == 0) & (s != 0)) | ((s.shift(-1) == 0) & (s != 0)).shift(fill_value=False)).astype(int).cumsum()
dfSample.groupby(g).transform(lambda x: x.iloc[0])
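The one-liner is easier to follow when split into named steps. This is a sketch on a shortened sample; run_start, run_end, after_end and group_id are names introduced here, and transform("first") stands in for the lambda that returns x.iloc[0].

```python
import pandas as pd

# Shortened sample (assumption: same shape as the question's data, fewer rows).
s = pd.Series([0, 0, 242, 222, 183, 0, 0, 230, 218, 0])

nonzero = s.ne(0)

# Start of a run: non-zero value whose previous value is zero.
run_start = nonzero & s.shift().eq(0)

# End of a run: non-zero value whose next value is zero.
run_end = nonzero & s.shift(-1).eq(0)

# Move the end flag one row down so it marks the first zero after the run.
after_end = run_end.shift(fill_value=False)

# Every flag opens a new group; the cumulative sum labels the groups.
group_id = (run_start | after_end).astype(int).cumsum()

# Repeat the first value of each group across the group.
filled = s.groupby(group_id).transform("first")

print(group_id.tolist())
print(filled.tolist())
```

Note that this repeats the first value of each run, which matches the maximum here only because the values in each run are in descending order. For data without that property, transform("max") would be the variant that returns the highest value.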
