Count consecutive row values but reset count with every 0 in row

Within a data frame, I need to count and sum consecutive row values in column A into a new column, column B.

Starting with column A, the script would count consecutive runs of 1s, but when a 0 appears it prints the total count in column B; it then resets the count and continues through the remaining data.

Desired outcome:

I've tried using .shift() along with various if statements but have been unsuccessful.

CodePudding user response：

Here is one way to do it. I get the feeling there might be better ways, but you can try this for now:

The routine function is used to increment the counter variable until it encounters a value of 0 in the A column, at which point it grabs the total count and then resets the counter variable.
I use a for-loop to iterate through the A column and append the returned B values to a list.
This list is then inserted into the dataframe.

import pandas as pd

df = pd.DataFrame({"A":[0,1,1,1,1,0,0,1,1,0]})

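def routine(row, c):
    # row: current value of column A; c: length of the current run of 1s
    val = 0
    if row:
        # still inside a run of 1s, so keep counting
        c += 1
    else:
        # hit a 0: report the run length, then reset the counter
        val = c
        c = 0
    return val, c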

B_vals = []
counter = 0
for item in df['A'].values:
    b, counter = routine(item, counter)
    B_vals.append(b)

df['B'] = B_vals
print(df)

OUTPUT:

CodePudding user response：

This could be a way to do it. Probably there exists a more elegant solution.

df['B'] = df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum()).cumsum().shift(fill_value=0) * (df['A'].diff() == -1)

The part df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum()) groups the data by consecutive occurrences of the same value. Taking cumsum() within each group then gives a running count along each run of 1s. We shift the result down by one row because you want the count on the row after the run. Finally, multiplying by (df['A'].diff() == -1) zeroes out every row except the one where A drops from 1 to 0, i.e. the row right after a run of 1s ends.
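
If the one-liner is hard to follow, it may help to split it into named steps. This is only a sketch of the same expression on the sample data from the first answer; the intermediate names (run_id, running, shifted, run_just_ended) are made up here for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]})

# Step 1: label each run of identical values. The label goes up by one
# every time A differs from the previous row.
run_id = df["A"].ne(df["A"].shift()).cumsum()

# Step 2: running sum of A inside each run. It counts up within a run
# of 1s and stays at 0 within a run of 0s.
running = df["A"].groupby(run_id).cumsum()

# Step 3: move everything down one row, so the final count of a run
# lands on the row that follows it.
shifted = running.shift(fill_value=0)

# Step 4: True only where A has just dropped from 1 to 0.
run_just_ended = df["A"].diff() == -1

# Multiplying by the boolean mask zeroes out every other row.
df["B"] = shifted * run_just_ended
print(df)
```

On this sample, B comes out as 0 everywhere except row 5 (value 4) and row 9 (value 2). Note that a run of 1s at the very end of the column, with no 0 after it, never gets reported by this expression or by the loop version, since both only emit a count when a 0 follows the run.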