How, in python, can I count unique values in a column for gradually increasing numbers of rows withi

Time:11-10

I am working in Python on a pandas DataFrame and am trying to count the unique values of a column within groups. The count needs to be cumulative: for each row it should cover only the rows of the group up to and including that row, and NaNs should not be counted.

Simplified, the data looks like this:

ID    occup  
   
1       NaN
1         A
1       NaN
1       NaN
1         B
2         K
2       NaN
2         L
2         L
2         M 

The new column 'occupcount' should count the number of unique values in 'occup' within the groups defined by 'ID'. In the first row of each group, the count should consider only that first row. In the second row, it should cover the first two rows. In the fifth row, it should cover all five rows of the group. It should look like this:

ID    occup    occupcount
   
 1      NaN             0
 1        A             1
 1      NaN             1
 1        B             2
 1        A             2
 2        K             1
 2      NaN             1
 2        L             2
 2        K             2
 2        M             3 

I tried to solve the task with something like

df['occupcount'] = (df.groupby(["ID"])['occup'].transform('nunique'))

But this only gives the total number of unique values over all rows within each group, not a gradually increasing count. Thanks in advance!

CodePudding user response:

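The idea is to build a boolean mask that is True only where a row holds the first occurrence of an ('ID', 'occup') pair and 'occup' is not missing, and then take the running total of that mask within each 'ID' group with GroupBy.cumsum: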

df['occupcount'] = ((~df.duplicated(['ID','occup']) & df['occup'].notna())
                         .groupby(df['ID'])
                         .cumsum())
print(df)
   ID occup  occupcount
0   1   NaN           0
1   1     A           1
2   1   NaN           1
3   1     B           2
4   1     A           2
5   2     K           1
6   2   NaN           1
7   2     L           2
8   2     L           2
9   2     M           3
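
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the printed output above, and the intermediate names (first_seen, not_missing, is_new_value) are only illustrative and not part of the original answer. The explicit cast to int before the cumulative sum is an extra precaution, not something the answer requires.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed output above (an assumption about the input).
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "L", "M"],
})

# True for the first row in which each (ID, occup) pair appears.
first_seen = ~df.duplicated(["ID", "occup"])

# True where occup holds an actual value, so NaN rows never add to the count.
not_missing = df["occup"].notna()

# 1 where a row introduces a new non-missing value for its ID, else 0.
is_new_value = (first_seen & not_missing).astype(int)

# Running total of new values within each ID group.
df["occupcount"] = is_new_value.groupby(df["ID"]).cumsum()

print(df)
```

Splitting the mask into two named steps makes it easier to check each condition on its own: first_seen alone would count the first NaN of each group as a new value, which is why it is combined with not_missing.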