How to create a rolling unique count by group using pandas

I have a dataframe like the following:

group          value
1              a
1              a
1              b
1              b
1              b
1              b
1              c
2              d
2              d
2              d
2              d
2              e

I want to create a column counting how many unique values have appeared so far within each group, like below:

group          value           group_value_id
1              a               1
1              a               1
1              b               2
1              b               2
1              b               2
1              b               2
1              c               3
2              d               1
2              d               1
2              d               1
2              d               1
2              e               2
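To try the answers, the sample frame can be rebuilt like this. It is a minimal setup sketch; the name expected is only for illustration and simply holds the target column copied from the desired output.

```python
import pandas as pd

# Sample data from the question: rows are already sorted by group
df = pd.DataFrame({
    "group": [1] * 7 + [2] * 5,
    "value": list("aabbbbc") + list("dddde"),
})

# Target column, copied from the desired output (illustrative name)
expected = [1, 1, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2]
print(df.assign(group_value_id=expected))
```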

Answer:

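This can also be solved with categorical codes:

# Per group, map each value to its categorical code (0-based) and shift to 1-based.
# transform keeps the original row index, so the result aligns with df.
df['group_val_id'] = (df.groupby('group')['value']
                        .transform(lambda x: x.astype('category').cat.codes + 1))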

df
 
    group value  group_val_id
0       1     a             1
1       1     a             1
2       1     b             2
3       1     b             2
4       1     b             2
5       1     b             2
6       1     c             3
7       2     d             1
8       2     d             1
9       2     d             1
10      2     d             1
11      2     e             2
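One caveat with cat.codes: categories are sorted by default, so the codes follow alphabetical order rather than order of first appearance. The output above matches only because each group's values happen to appear in sorted order. A small sketch with a hypothetical unsorted group shows the difference compared with pd.factorize:

```python
import pandas as pd

# Hypothetical group where "b" appears before "a"
s = pd.Series(["b", "b", "a", "c"])

# Category codes follow the sorted categories: a=0, b=1, c=2
cat_ids = s.astype("category").cat.codes + 1

# factorize numbers values in order of first appearance: b=0, a=1, c=2
fact_ids = pd.Series(pd.factorize(s)[0] + 1)

print(cat_ids.tolist())   # [2, 2, 1, 3]
print(fact_ids.tolist())  # [1, 1, 2, 3]
```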

Answer:

Use a custom lambda function with GroupBy.transform and factorize:

# factorize numbers values in order of first appearance (0-based), so add 1
df['group_value_id'] = df.groupby('group')['value'].transform(lambda x: pd.factorize(x)[0]) + 1
print(df)
    group value  group_value_id
0       1     a               1
1       1     a               1
2       1     b               2
3       1     b               2
4       1     b               2
5       1     b               2
6       1     c               3
7       2     d               1
8       2     d               1
9       2     d               1
10      2     d               1
11      2     e               2
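Note that factorize gives each value an id in order of first appearance, so a value that reappears later in the group (for example a, b, a) gets its old id back. If what you literally want is the running number of distinct values seen so far, one option (a sketch, not taken from the answers here) is to mark first occurrences with duplicated and take a cumulative sum per group. On the question's data, where each value's rows are contiguous, both approaches give the same result; the small frame below is hypothetical and chosen to show where they differ.

```python
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2],
    "value": ["a", "b", "a", "d", "e"],   # "a" reappears in group 1
})

# Id of each value, in order of first appearance within the group
df["factorize_id"] = (
    df.groupby("group")["value"].transform(lambda x: pd.factorize(x)[0]) + 1
)

# Running count of distinct values seen so far within the group:
# mark the first occurrence of each (group, value) pair, then cumsum per group
first_seen = ~df.duplicated(["group", "value"])
df["running_unique"] = first_seen.astype(int).groupby(df["group"]).cumsum()

print(df)
```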

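A dense rank is not used here because, at least in the pandas version this answer was written against, it fails on a string column: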

df['group_value_id'] = df.groupby('group')['value'].rank('dense')
print (df)

DataError: No numeric types to aggregate