Iterate over rows and calculate values

I have the following pandas dataframe:

temp   stage   issue_datetime
20      1      2022/11/30 19:20
21      1      2022/11/30 19:21
20      1      None
25      1      2022/11/30 20:10
30      2      None
22      2      2022/12/01 10:00
22      2      2022/12/01 10:01
31      3      2022/12/02 11:00
32      3      2022/12/02 11:01
19      1      None
20      1      None

I want to get the following result:

temp   stage   num_issues
20      1      3
21      1      3
20      1      3
25      1      3
30      2      2
22      2      2
22      2      2
31      3      2
32      3      2
19      1      0
20      1      0

Basically, I need to count the non-None issue_datetime values within each run of consecutive rows that share the same stage, and store that count in a new column called num_issues.

How can I do it?

CodePudding user response:

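You can label the blocks of consecutive equal stage values by flagging every row where stage differs from the previous row and taking the cumulative sum of those flags. Then group by the block labels and use transform to count the non-null issue_datetime values in each block: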

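# A new block starts wherever stage differs from the previous row.
blocks = df['stage'].ne(df['stage'].shift()).cumsum()
# Count the non-null issue_datetime entries per block and broadcast the count to every row.
df['num_issues'] = df['issue_datetime'].notna().groupby(blocks).transform('sum')

# Equivalent, since count ignores null values:
# df['num_issues'] = df['issue_datetime'].groupby(blocks).transform('count')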

Output:

    temp  stage    issue_datetime  num_issues
0     20      1  2022/11/30 19:20           3
1     21      1  2022/11/30 19:21           3
2     20      1              None           3
3     25      1  2022/11/30 20:10           3
4     30      2              None           2
5     22      2  2022/12/01 10:00           2
6     22      2  2022/12/01 10:01           2
7     31      3  2022/12/02 11:00           2
8     32      3  2022/12/02 11:01           2
9     19      1              None           0
10    20      1              None           0
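
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, prints the intermediate block labels, and then computes num_issues. Two things in it are my assumptions and not part of the original answer: the missing timestamps are stored as real None values, and the column is parsed with pd.to_datetime(..., errors="coerce") so that a literal "None" string, if your data has one, would also be treated as missing. The name issued is only an illustrative helper.

```python
import pandas as pd

# Sample data copied from the question; missing timestamps are real None values.
df = pd.DataFrame({
    "temp": [20, 21, 20, 25, 30, 22, 22, 31, 32, 19, 20],
    "stage": [1, 1, 1, 1, 2, 2, 2, 3, 3, 1, 1],
    "issue_datetime": [
        "2022/11/30 19:20", "2022/11/30 19:21", None, "2022/11/30 20:10",
        None, "2022/12/01 10:00", "2022/12/01 10:01",
        "2022/12/02 11:00", "2022/12/02 11:01",
        None, None,
    ],
})

# Assumption: parse the column so that anything unparseable (for example the
# literal string "None") becomes NaT and is counted as missing.
issued = pd.to_datetime(
    df["issue_datetime"], format="%Y/%m/%d %H:%M", errors="coerce"
)

# True wherever stage differs from the previous row; the running sum of
# those flags gives every run of equal stage values its own label.
blocks = df["stage"].ne(df["stage"].shift()).cumsum()
print(blocks.tolist())  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4]

# Sum the "is present" flags per block and broadcast back to each row.
df["num_issues"] = issued.notna().astype(int).groupby(blocks).transform("sum")
print(df["num_issues"].tolist())  # [3, 3, 3, 3, 2, 2, 2, 2, 2, 0, 0]
```

Note that the last two rows get block label 4, not 1, even though their stage is 1 again. That is why a plain groupby('stage') would not give the result asked for here: it would merge the first and last runs of stage 1.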