Fill pandas column forward iteratively, but without using iteration?


I have a pandas data frame with a column "Valid" where a condition is met based on other elements in the data frame (not shown). Additionally, I have a column "ExtendsValid" that extends the validity one row further according to the following rule:

If a valid row is followed directly by ExtendsValid, that row is also valid, even if the underlying valid condition doesn't apply. Continue filling Valid forward as long as ExtendsValid is 1.

I have illustrated the desired result in column "FinalValid" (it doesn't have to be a new column; filling Valid forward would also work). Note that rows 8 and 9 in the example also become valid. Also note that row 13 does NOT result in FinalValid, because a preceding valid row is required; that preceding valid row can itself be either Valid or an extended valid row.

So far, when I had a problem like this, I used a cumbersome multi-step process:

  1. Create a new column for when "Valid" or "ExtendsValid" is true
  2. Create a new column marking the start point of each "sub-series" (a consecutive run of ones)
  3. Number each sub-series
  4. fillna using "group by" for each sub-series
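A rough sketch of those four steps (column names are illustrative; I use a group-wise cummax in place of the fillna step, which amounts to the same forward fill):

```python
import pandas as pd

df = pd.DataFrame({
    "Valid":        [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "ExtendsValid": [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
})

# 1. Mark rows where either flag is set
either = (df["Valid"] | df["ExtendsValid"]).astype(bool)

# 2. Mark the start point of each consecutive run of ones
start = either & ~either.shift(fill_value=False)

# 3. Number each sub-series (rows between runs inherit the previous
#    number, but they are masked out by `either` in step 4 anyway)
group = start.cumsum()

# 4. Within each run, a row is valid only if a Valid row occurred
#    at or before it in the same run
df["FinalValid"] = (
    either & df["Valid"].groupby(group).cummax().astype(bool)
).astype(int)
```

This is the kind of multi-step pipeline I would like to replace with something simpler.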

I can provide sample code but I am really looking for a totally different, more efficient approach, which of course must be non-iterating as well.

Any ideas would be welcome.

#   Valid  ExtendsValid  FinalValid
1       0             0           0
2       1             0           1
3       0             0           0
4       0             0           0
5       1             0           1
6       0             0           0
7       1             0           1
8       0             1           1
9       0             1           1
10      0             0           0
11      1             0           1
12      0             0           0
13      0             1           0
14      0             0           0

CodePudding user response:

IIUC, you want to ffill the 1s only if there is an uninterrupted series of 1s starting on Valid and eventually continuing on ExtendsValid.

For this you can use a groupby.cummin:

df['FinalValid'] = (
    (df['Valid'] | df['ExtendsValid'])
    .groupby(df['Valid'].cumsum())
    .cummin()
)

Output:

NB. I slightly modified the input on row #4 (index 3) to better illustrate the logic.

     #  Valid  ExtendsValid  FinalValid
0    1      0             0           0
1    2      1             0           1
2    3      0             0           0
3    4      0             1           0
4    5      1             0           1
5    6      0             0           0
6    7      1             0           1
7    8      0             1           1
8    9      0             1           1
9   10      0             0           0
10  11      1             0           1
11  12      0             0           0
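The trick is the grouping key: df['Valid'].cumsum() starts a new group at every Valid row, so within each group cummin stays at 1 only while the run of 1s is unbroken and anchored by a Valid row. A quick check against the original (unmodified) input:

```python
import pandas as pd

df = pd.DataFrame({
    "Valid":        [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "ExtendsValid": [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
})

# Each group starts at a Valid row; the lone ExtendsValid on row 13
# belongs to a group whose earlier rows contain a 0, so cummin kills it
group = df["Valid"].cumsum()

df["FinalValid"] = (
    (df["Valid"] | df["ExtendsValid"])
    .groupby(group)
    .cummin()
)
```

Row 13 ends up 0 because its group (started by the Valid on row 11) already hit a 0 on row 12, so cummin never recovers.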

CodePudding user response:

Try:

state, data = 0, []
for v, e in zip(df.Valid, df.ExtendsValid):
    # a row is valid if Valid is set, or if ExtendsValid is set
    # and the previous row was valid
    state = v or (not v and e and state)
    data.append(int(state))

df["FinalValid_new"] = data
print(df)

Prints:

     #  Valid  ExtendsValid  FinalValid  FinalValid_new
0    1      0             0           0               0
1    2      1             0           1               1
2    3      0             0           0               0
3    4      0             0           0               0
4    5      1             0           1               1
5    6      0             0           0               0
6    7      1             0           1               1
7    8      0             1           1               1
8    9      0             1           1               1
9   10      0             0           0               0
10  11      1             0           1               1
11  12      0             0           0               0
12  13      0             1           0               0
13  14      0             0           0               0

Note: For more performance I recommend looking at the numba package.


EDIT: Example using numba:

from numba import njit


@njit
def compute_valid(valid, extends_valid, out):
    state = 0
    for i, (v, e) in enumerate(zip(valid, extends_valid)):
        state = v or (not v and e and state)
        out[i] = int(state)


df["FinalValid_new"] = 0
compute_valid(
    df.Valid.to_numpy(), df.ExtendsValid.to_numpy(), df.FinalValid_new.to_numpy()
)
print(df)
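One caveat with writing into df.FinalValid_new.to_numpy(): pandas does not guarantee that to_numpy() returns a writable view of the column (for extension dtypes it can return a copy, and the in-place writes would then be lost). A safer pattern is to allocate the output array explicitly and assign it back afterwards. A plain-Python sketch of that pattern (in practice the function would be @njit-compiled as above):

```python
import numpy as np
import pandas as pd


def compute_valid(valid, extends_valid, out):
    # same state machine as the numba version above
    state = 0
    for i, (v, e) in enumerate(zip(valid, extends_valid)):
        state = v or (not v and e and state)
        out[i] = int(state)


df = pd.DataFrame({
    "Valid":        [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "ExtendsValid": [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0],
})

# allocate the output buffer ourselves, then assign it to the frame
out = np.empty(len(df), dtype=np.int64)
compute_valid(df["Valid"].to_numpy(), df["ExtendsValid"].to_numpy(), out)
df["FinalValid_new"] = out
```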