lvalues = [0,0,0,0,0,0,0,0,242,222,183,149,121,102,91,84,0,0,0,0,0,0,0,0,0,230,218,209,197,162,156,144,0,0,0,0,0,0,0,0]
idx = range(0,len(lvalues))
dfSample = pd.DataFrame(lvalues, index=idx)
I have a column with several subsets in-between zeros. I would like to loop through it and use the highest value of which subset and repeat it until it reaches 0 again. For example once the loop reaches 242 repeats it until 0 starts again. Thanks in advance.
CodePudding user response:
If you want to group by consecutive 0/non-0 and get the max, use:
g = dfSample[0].eq(0).diff().fillna(False).cumsum()
dfSample.groupby(g).transform('max')
Logic: transform the series to booleans and get the diff. There will be True on each group start (except the very first item that we fill). Get the cumsum
to form groups. Use the grouper to get the max per group.
If you rather want to replace by the first value of each group, a simple mask and fill should work:
dfSample.mask(dfSample[0].shift(fill_value=0).ne(0)).ffill(downcast='infer')
Logic: mask the values that are not preceded by 0, ffill
the NaNs.
CodePudding user response:
Use shift
to put the the zero at line with the start of the numbers, then use and
to condition on to combine it with the real column. Use similar procedure for specifying the end of numbers. After that you can use cumsum
to make groups and then just groupby
and return the first value of each group
. Use:
g =(((dfSample[0].shift()==0)&(dfSample[0]!=0))|((dfSample[0].shift(-1)==0)&(dfSample[0]!=0)).shift()).astype(int).cumsum()
dfSample.groupby(g).transform(lambda x: x.iloc[0])
Output:
0
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 242
9 242
10 242
11 242
12 242
13 242
14 242
15 242
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 230
26 230
27 230
28 230
29 230
30 230
31 230
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0