Home > Mobile >  Replacing zeros with last non-zero value for each column on multi-index dataframe
Replacing zeros with last non-zero value for each column on multi-index dataframe

Time:06-28

I have the following multi-index dataframe:

        0   1   2   3
0   0   0   0   0   0
0   1   36  19  2   4
0   2   233 21  2   4
0   3   505 25  1   4
0   4   751 27  1   4
0   5   976 28  1   4
0   9   0   0   0   0
0   10  0   0   0   0
0   11  0   0   0   0
0   12  0   0   0   0
1   0   40  19  2   4
1   1   323 18  1   4
1   2   595 24  1   4
1   3   844 26  1   4
1   4   0   0   0   0
1   5   0   0   0   0
1   9   0   0   0   0
1   10  0   0   0   0
1   11  0   0   0   0
1   12  0   0   0   0

What is the easiest way that I can I repeat the last value more than zero until the end of the first group? The desired outcome is:

        0   1   2   3
0   0   0   0   0   0
0   1   36  19  2   4
0   2   233 21  2   4
0   3   505 25  1   4
0   4   751 27  1   4
0   5   976 28  1   4
0   9   976 28  1   4
0   10  976 28  1   4
0   11  976 28  1   4
0   12  976 28  1   4
1   0   40  19  2   4
1   1   323 18  1   4
1   2   595 24  1   4
1   3   844 26  1   4
1   4   844 26  1   4
1   5   844 26  1   4
1   9   844 26  1   4
1   10  844 26  1   4
1   11  844 26  1   4
1   12  844 26  1   4

Thanks

CodePudding user response:

You can use replace with method='ffill' for each group:

out = df.groupby(level=0).apply(lambda x: x.replace(0, method='ffill'))
print(out)

# Output
        0   1  2  3
0 0     0   0  0  0
  1    36  19  2  4
  2   233  21  2  4
  3   505  25  1  4
  4   751  27  1  4
  5   976  28  1  4
  9   976  28  1  4
  10  976  28  1  4
  11  976  28  1  4
  12  976  28  1  4
1 0    40  19  2  4
  1   323  18  1  4
  2   595  24  1  4
  3   844  26  1  4
  4   844  26  1  4
  5   844  26  1  4
  9   844  26  1  4
  10  844  26  1  4
  11  844  26  1  4
  12  844  26  1  4

CodePudding user response:

You can separate your dataframe in lists by column and perform a traversal on each one:

l = [36, 233, 505, 751, 976, 0, 0, 0, 0, 0, 40, 323, 595, 844, 0, 0, 0, 0, 0]

current = l[0]
for i in range(len(l)):
    if l[i]>= current:
        current = l[i]
    elif l[i]==0:
        l[i] = current
    else:
        current = l[i]
    
print(l)

Output:

[36, 233, 505, 751, 976, 976, 976, 976, 976, 976, 40, 323, 595, 844, 844, 844, 844, 844, 844]

CodePudding user response:

You can use a mask to select the last stretch of 0s per group (with help of GroupBy.cummax), then ffill per group to replace with the last non-zero value:

# select the last stretch of zeros per group
mask = df[::-1].groupby(level=0).cummax().eq(0)

# mask the above found values and ffill them
out = df.mask(mask).ffill(downcast='infer')

output:

        0   1  2  3
0 0     0   0  0  0
  1    36  19  2  4
  2   233  21  2  4
  3   505  25  1  4
  4   751  27  1  4
  5   976  28  1  4
  9   976  28  1  4
  10  976  28  1  4
  11  976  28  1  4
  12  976  28  1  4
1 0    40  19  2  4
  1   323  18  1  4
  2   595  24  1  4
  3   844  26  1  4
  4   844  26  1  4
  5   844  26  1  4
  9   844  26  1  4
  10  844  26  1  4
  11  844  26  1  4
  12  844  26  1  4

CodePudding user response:

Use GroupBy.ffill by first level of MultiIndex with replace 0 by missing values - it also replace 0 in middle in groups if exist:

df = df.replace(0, np.nan).groupby(level=0).ffill().fillna(0, downcast='infer')
print (df)
        0   1  2  3
0 0     0   0  0  0
  1    36  19  2  4
  2   233  21  2  4
  3   505  25  1  4
  4   751  27  1  4
  5   976  28  1  4
  9   976  28  1  4
  10  976  28  1  4
  11  976  28  1  4
  12  976  28  1  4
1 0    40  19  2  4
  1   323  18  1  4
  2   595  24  1  4
  3   844  26  1  4
  4   844  26  1  4
  5   844  26  1  4
  9   844  26  1  4
  10  844  26  1  4
  11  844  26  1  4
  12  844  26  1  4
  • Related