Pandas Constant Values after each Zero Value

Say I have a dataframe with a single column, values (the left-hand column in the table below).

I want to find a vectorized pandas function (preferably using groupby) that replaces every nonzero value with the first nonzero value of its run of consecutive nonzero values, i.e. something that would give me the new column below:

    values   new
0      4      4
1      0      0
2      2      2
3      3      2
4      0      0
5      8      8
6      5      8
7      1      8
8      0      0
9      4      4
10     7      4

Is there a good way of achieving this?

CodePudding user response：

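Build a boolean mask that selects each zero row and the row immediately after it. Use where with this mask to replace every other value with NaN, then forward fill so that each NaN takes the last kept value. The final fillna restores any leading values that come before the first zero. Because of the intermediate NaN values, the new column comes out as float.

m = df['values'].eq(0)
df['new'] = df['values'].where(m | m.shift(fill_value=False)).ffill().fillna(df['values'])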

Result

print(df)

    values  new
0        4  4.0
1        0  0.0
2        2  2.0
3        3  2.0
4        0  0.0
5        8  8.0
6        5  8.0
7        1  8.0
8        0  0.0
9        4  4.0
10       7  4.0
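Since the question asks for groupby, here is a minimal sketch of a groupby-based variant. It labels each run of consecutive zeros or non-zeros with a block id and takes the first value of each block. The function name first_of_each_run is made up for illustration and is not part of pandas. Unlike the mask approach above, which falls back to the original values before the first zero, this sketch also collapses a run of several non-zero values at the very start of the column.

```python
import pandas as pd

def first_of_each_run(values):
    # Hypothetical helper: True where the value is zero
    is_zero = values.eq(0)
    # A new block starts whenever the zero / non-zero status changes
    block = is_zero.ne(is_zero.shift()).cumsum()
    # Every row takes the first value of its block (zero blocks stay 0)
    return values.groupby(block).transform('first')

df = pd.DataFrame({'values': [4, 0, 2, 3, 0, 8, 5, 1, 0, 4, 7]})
df['new'] = first_of_each_run(df['values'])
print(df)
```

Because no NaN is introduced along the way, the new column keeps the integer dtype of values.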

CodePudding user response：

The following function should do the job. The comments in the function explain each step of the solution.

import pandas as pd

def ffill_nonZeros(values):
    # get the values that are not equal to 0
    # (use the argument itself, not the global df)
    non_zero = values[values != 0]

    # get their indexes (assumes a default integer index: 0, 1, 2, ...)
    non_zero_idx = non_zero.index.to_series()

    # find where indexes are consecutive
    diff = non_zero_idx.diff()
    mask = diff == 1

    # using the mask, set to NaN every non-zero value that directly follows another non-zero value
    non_zero = non_zero.mask(mask)

    # fill forward (replace all NaN values with the previous valid value)
    new_non_zero = non_zero.ffill()

    # put the new values back at their indexes, restoring the original dtype
    new = values.copy()
    new.loc[new_non_zero.index] = new_non_zero.astype(values.dtype)

    return new

Now applying this function to your data:

df = pd.DataFrame([4, 0, 2, 3, 0, 8, 5, 1, 0, 4, 7], columns=['values'])
df['new'] = ffill_nonZeros(df['values'])
print(df)

Output:

    values  new
0        4    4
1        0    0
2        2    2
3        3    2
4        0    0
5        8    8
6        5    8
7        1    8
8        0    0
9        4    4
10       7    4
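To make the "consecutive indexes" step concrete, here is a small sketch that prints the intermediate results for the same data. It assumes a default integer index, and the variable name follows_non_zero is only illustrative.

```python
import pandas as pd

values = pd.Series([4, 0, 2, 3, 0, 8, 5, 1, 0, 4, 7])

# Index labels of the non-zero entries
non_zero_idx = values[values != 0].index.to_series()
print(non_zero_idx.tolist())  # → [0, 2, 3, 5, 6, 7, 9, 10]

# A difference of exactly 1 means the previous row was also non-zero
follows_non_zero = non_zero_idx.diff().eq(1)

# These are the rows whose value gets replaced by forward fill
print(follows_non_zero[follows_non_zero].index.tolist())  # → [3, 6, 7, 10]
```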

CodePudding user response：

Get the rows holding zeros, and the rows immediately after them:

zeros = df.index[df['values'].eq(0)]
after_zeros = zeros.union(zeros + 1)

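Get the rows that need to be forward filled, i.e. every row that is neither a zero nor directly after one. Rows before the first zero are left untouched: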

replace = df.index.difference(after_zeros)
replace = replace[replace > zeros[0]]

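Copy values into new, set the rows in replace to NaN, and forward fill:

import numpy as np

# cast to float up front so the column can hold NaN
df['new'] = df['values'].astype(float)
df.loc[replace, 'new'] = np.nan
df = df.ffill()
print(df)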


Output:

    values  new
0        4  4.0
1        0  0.0
2        2  2.0
3        3  2.0
4        0  0.0
5        8  8.0
6        5  8.0
7        1  8.0
8        0  0.0
9        4  4.0
10       7  4.0