Add incremental counter for repeating feature subsets in pandas

Time:05-18

Suppose I have the following table:

id  name   mail               date
1   Sta    [email protected]  11.11.22
2   Danny  [email protected]  11.11.22
3   Elle   [email protected]  11.11.22
4   Elle   [email protected]  11.11.22
5   Elle   [email protected]  12.11.22

What is the best way to add an incremental counter that numbers repeated observations within each combination of the feature subset [name, date]?

Desired output:

id  name   mail               date      counter
1   Sta    [email protected]  11.11.22  1
2   Danny  [email protected]  11.11.22  1
3   Elle   [email protected]  11.11.22  1
4   Elle   [email protected]  11.11.22  2
5   Elle   [email protected]  12.11.22  1

Edit: The table is already sorted correctly, so duplicates appear directly after each other.

CodePudding user response:

# cumcount() numbers the rows within each group starting at 0, so add 1
df['counter'] = df.groupby(['name', 'date']).cumcount() + 1
df
   id   name               mail      date  counter
0   1    Sta  [email protected]  11.11.22        1
1   2  Danny  [email protected]  11.11.22        1
2   3   Elle  [email protected]  11.11.22        2
3   4   Elle  [email protected]  11.11.22        2
4   5   Elle  [email protected]  12.11.22        1
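
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. It rebuilds the table from the question by hand; the mail column is left out because the addresses are obscured in the post and the counter does not depend on it.

```python
import pandas as pd

# Sample data rebuilt from the question (mail column omitted, see note above)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Sta", "Danny", "Elle", "Elle", "Elle"],
    "date": ["11.11.22", "11.11.22", "11.11.22", "11.11.22", "12.11.22"],
})

# cumcount() numbers the rows inside each (name, date) group from 0,
# in the order they appear, so adding 1 gives a 1-based counter
df["counter"] = df.groupby(["name", "date"]).cumcount() + 1

print(df)
# counter column: 1, 1, 1, 2, 1
```

Note that cumcount() counts within each group in row order, so it should give the same per-group numbering even if the duplicates were not adjacent.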