I have the following multi-index dataframe:
0 1 2 3
0 0 0 0 0 0
0 1 36 19 2 4
0 2 233 21 2 4
0 3 505 25 1 4
0 4 751 27 1 4
0 5 976 28 1 4
0 9 0 0 0 0
0 10 0 0 0 0
0 11 0 0 0 0
0 12 0 0 0 0
1 0 40 19 2 4
1 1 323 18 1 4
1 2 595 24 1 4
1 3 844 26 1 4
1 4 0 0 0 0
1 5 0 0 0 0
1 9 0 0 0 0
1 10 0 0 0 0
1 11 0 0 0 0
1 12 0 0 0 0
What is the easiest way that I can I repeat the last value more than zero until the end of the first group? The desired outcome is:
0 1 2 3
0 0 0 0 0 0
0 1 36 19 2 4
0 2 233 21 2 4
0 3 505 25 1 4
0 4 751 27 1 4
0 5 976 28 1 4
0 9 976 28 1 4
0 10 976 28 1 4
0 11 976 28 1 4
0 12 976 28 1 4
1 0 40 19 2 4
1 1 323 18 1 4
1 2 595 24 1 4
1 3 844 26 1 4
1 4 844 26 1 4
1 5 844 26 1 4
1 9 844 26 1 4
1 10 844 26 1 4
1 11 844 26 1 4
1 12 844 26 1 4
Thanks
CodePudding user response:
You can use replace
with method='ffill'
for each group:
out = df.groupby(level=0).apply(lambda x: x.replace(0, method='ffill'))
print(out)
# Output
0 1 2 3
0 0 0 0 0 0
1 36 19 2 4
2 233 21 2 4
3 505 25 1 4
4 751 27 1 4
5 976 28 1 4
9 976 28 1 4
10 976 28 1 4
11 976 28 1 4
12 976 28 1 4
1 0 40 19 2 4
1 323 18 1 4
2 595 24 1 4
3 844 26 1 4
4 844 26 1 4
5 844 26 1 4
9 844 26 1 4
10 844 26 1 4
11 844 26 1 4
12 844 26 1 4
CodePudding user response:
You can separate your dataframe in lists by column and perform a traversal on each one:
l = [36, 233, 505, 751, 976, 0, 0, 0, 0, 0, 40, 323, 595, 844, 0, 0, 0, 0, 0]
current = l[0]
for i in range(len(l)):
if l[i]>= current:
current = l[i]
elif l[i]==0:
l[i] = current
else:
current = l[i]
print(l)
Output:
[36, 233, 505, 751, 976, 976, 976, 976, 976, 976, 40, 323, 595, 844, 844, 844, 844, 844, 844]
CodePudding user response:
You can use a mask to select the last stretch of 0s per group (with help of GroupBy.cummax
), then ffill
per group to replace with the last non-zero value:
# select the last stretch of zeros per group
mask = df[::-1].groupby(level=0).cummax().eq(0)
# mask the above found values and ffill them
out = df.mask(mask).ffill(downcast='infer')
output:
0 1 2 3
0 0 0 0 0 0
1 36 19 2 4
2 233 21 2 4
3 505 25 1 4
4 751 27 1 4
5 976 28 1 4
9 976 28 1 4
10 976 28 1 4
11 976 28 1 4
12 976 28 1 4
1 0 40 19 2 4
1 323 18 1 4
2 595 24 1 4
3 844 26 1 4
4 844 26 1 4
5 844 26 1 4
9 844 26 1 4
10 844 26 1 4
11 844 26 1 4
12 844 26 1 4
CodePudding user response:
Use GroupBy.ffill
by first level of MultiIndex with replace 0 by missing values - it also replace 0 in middle in groups if exist:
df = df.replace(0, np.nan).groupby(level=0).ffill().fillna(0, downcast='infer')
print (df)
0 1 2 3
0 0 0 0 0 0
1 36 19 2 4
2 233 21 2 4
3 505 25 1 4
4 751 27 1 4
5 976 28 1 4
9 976 28 1 4
10 976 28 1 4
11 976 28 1 4
12 976 28 1 4
1 0 40 19 2 4
1 323 18 1 4
2 595 24 1 4
3 844 26 1 4
4 844 26 1 4
5 844 26 1 4
9 844 26 1 4
10 844 26 1 4
11 844 26 1 4
12 844 26 1 4