Within a data frame, I need to count the consecutive 1s in column A and write each run's total into a new column, column B.
Starting at the top of column A, the script should count each consecutive run of 1s. When a 0 appears, it should write the total count into column B on that row, then reset the count and continue through the remaining data.
Desired outcome:
A | B
0 0
1 0
1 0
1 0
1 0
0 4
0 0
1 0
1 0
0 2
I've tried using .shift() along with various if statements but have been unsuccessful.
CodePudding user response:
Here is one way to do it. I get the feeling that there might be better ways, but you can try this for now:
- The routine function is used to increment the counter variable until it encounters a value of 0 in the A column, at which point it grabs the total count and then resets the counter variable.
- I use a for-loop to iterate through the A column and append the returned B values to a list.
- This list is then inserted into the dataframe.
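import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]})

def routine(row, c):
    # Return (value for B, updated counter) for a single row of A.
    val = 0
    if row:
        c += 1   # inside a run of 1s: keep counting
    else:
        val = c  # reached a 0: emit the length of the run that just ended
        c = 0    # and reset the counter
    return val, c

B_vals = []
counter = 0
for item in df['A'].values:
    b, counter = routine(item, counter)
    B_vals.append(b)
df['B'] = B_vals
print(df)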
OUTPUT:
A B
0 0 0
1 1 0
2 1 0
3 1 0
4 1 0
5 0 4
6 0 0
7 1 0
8 1 0
9 0 2
CodePudding user response:
This could be a way to do it. Probably there exists a more elegant solution.
df['B'] = df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum()).cumsum().shift(fill_value=0) * (df['A'].diff() == -1)
The part df['A'].groupby(df['A'].ne(df['A'].shift()).cumsum())
groups the data by runs of consecutive equal values.
Then we take the cumsum, which gives the running sum within each group. Then we shift the results by 1 row because you want the count on the row after the group. Finally, we multiply by (df['A'].diff() == -1), which is True only on a row where A drops from 1 to 0, so every row except the one right after a run of 1s is zeroed out.
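To make the one-liner easier to follow, here is a sketch that splits it into named steps. The intermediate names (run_id, running, shifted, run_just_ended) are my own and only for illustration. It also uses Series.where for the final masking instead of multiplying by a boolean, which should give the same result on this data.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]})

# Step 1: give every run of equal consecutive values its own id.
# A new run starts wherever A differs from the previous row.
run_id = df["A"].ne(df["A"].shift()).cumsum()

# Step 2: running sum of A inside each run (counts the 1s seen so far).
running = df["A"].groupby(run_id).cumsum()

# Step 3: move everything down one row, so a run's total lands
# on the row immediately after the run.
shifted = running.shift(fill_value=0)

# Step 4: True only on rows where A drops from 1 to 0.
run_just_ended = df["A"].diff().eq(-1)

# Keep the shifted total on those rows, 0 everywhere else.
df["B"] = shifted.where(run_just_ended, 0)
print(df)
```

Note that with either approach a run of 1s at the very end of the column is never reported, because there is no following 0 row to carry the total.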