How, in Python, can I count unique values in a column for gradually increasing numbers of rows within groups?

I am working in Python on a pandas DataFrame and am trying to count the unique values of a column within groups. The count needs to cover a steadily growing number of rows within each group, and NaNs should not be counted.

Simplified, the data looks like this:

ID    occup  
   
1       NaN
1         A
1       NaN
1         B
1         A
2         K
2       NaN
2         L
2         L
2         M

The new column 'occupcount' should count the number of unique values in 'occup' within the groups defined by 'ID', but cumulatively: in the first row of each group, the count should consider only that first row; in the second row, the first two rows; and in the fifth row, all five rows of the group. It should look like this:

ID    occup    occupcount
   
 1      NaN             0
 1        A             1
 1      NaN             1
 1        B             2
 1        A             2
 2        K             1
 2      NaN             1
 2        L             2
 2        L             2
 2        M             3

I tried to solve the task with something like:

df['occupcount'] = df.groupby('ID')['occup'].transform('nunique')

But it only provides the total number of unique values over all rows within each group, with no gradual increase. Thanks in advance!
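
A minimal reproduction of this attempt, assuming sample values like those in the tables above (the DataFrame construction here is an assumption, not the real data), might look like the following. It shows that `transform('nunique')` repeats one total per group on every row:

```python
import numpy as np
import pandas as pd

# Assumed sample data modelled on the tables above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "L", "M"],
})

# nunique ignores NaN by default; transform broadcasts the single
# per-group result back onto every row of that group.
total = df.groupby("ID")["occup"].transform("nunique")
print(total.tolist())
```

Every row in group 1 gets 2 and every row in group 2 gets 3, which is why no gradual increase appears.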

CodePudding user response：

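The idea is to build a boolean mask that is True only where a non-missing value of occup appears for the first time within its ID (that is, the row is not a duplicate of an earlier ID/occup pair and occup is not NaN), and then take the cumulative sum of that mask per group with GroupBy.cumsum: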

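# Flag the first occurrence of each non-missing (ID, occup) pair,
# then take a running sum of those flags within each ID group.
df['occupcount'] = ((~df.duplicated(['ID','occup']) & df['occup'].notna())
                         .groupby(df['ID'])
                         .cumsum())
print(df)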
   ID occup  occupcount
0   1   NaN           0
1   1     A           1
2   1   NaN           1
3   1     B           2
4   1     A           2
5   2     K           1
6   2   NaN           1
7   2     L           2
8   2     L           2
9   2     M           3
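
For anyone who wants to run this end to end, here is a self-contained sketch. The DataFrame is reconstructed from the printed output above, and `first_seen`, `expanding_nunique` and `check` are illustrative names introduced here, not part of the original answer. The slow loop is only a cross-check of the cumulative-sum result:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "L", "M"],
})

# True only where a non-missing (ID, occup) pair appears for the first time.
first_seen = ~df.duplicated(["ID", "occup"]) & df["occup"].notna()

# Running total of first appearances within each ID group.
df["occupcount"] = first_seen.groupby(df["ID"]).cumsum()


def expanding_nunique(values):
    """Count distinct non-missing values in each growing prefix (slow)."""
    return [values.iloc[: i + 1].nunique() for i in range(len(values))]


# Cross-check; relies on df already being sorted by ID, as it is here.
check = [n for _, g in df.groupby("ID")["occup"] for n in expanding_nunique(g)]
print(df["occupcount"].tolist() == check)
```

The mask-and-cumsum version stays vectorized, while the prefix loop does quadratic work per group, so the loop is probably only useful for verifying results on small data.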