Suppose I have the following table:
id | name | mail | date |
---|---|---|---|
1 | Sta | [email protected] | 11.11.22 |
2 | Danny | [email protected] | 11.11.22 |
3 | Elle | [email protected] | 11.11.22 |
4 | Elle | [email protected] | 11.11.22 |
5 | Elle | [email protected] | 12.11.22 |
What is the best way to create an incremental counter for repeating observations for the feature subset [name, date]?
Desired output:
id | name | mail | date | counter |
---|---|---|---|---|
1 | Sta | [email protected] | 11.11.22 | 1 |
2 | Danny | [email protected] | 11.11.22 | 1 |
3 | Elle | [email protected] | 11.11.22 | 1 |
4 | Elle | [email protected] | 11.11.22 | 2 |
5 | Elle | [email protected] | 12.11.22 | 1 |
Edit: The table is already sorted, so duplicate rows appear consecutively.
CodePudding user response:
df['counter'] = df.groupby(['name', 'date']).cumcount() + 1
df
id name mail date counter
0 1 Sta [email protected] 11.11.22 1
1 2 Danny [email protected] 11.11.22 1
2 3 Elle [email protected] 11.11.22 1
3 4 Elle [email protected] 11.11.22 2
4 5 Elle [email protected] 12.11.22 1
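For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the table is rebuilt by hand from the question; the mail column is left out because the addresses are redacted in the post and it plays no part in the grouping.

```python
import pandas as pd

# Rebuild the question's table by hand. The mail column is omitted on purpose:
# its values are redacted in the post and it is not used for grouping.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Sta", "Danny", "Elle", "Elle", "Elle"],
    "date": ["11.11.22", "11.11.22", "11.11.22", "11.11.22", "12.11.22"],
})

# cumcount() numbers the rows inside each (name, date) group starting at 0,
# so adding 1 makes the counter start at 1.
df["counter"] = df.groupby(["name", "date"]).cumcount() + 1

print(df)
```

Note that `groupby` counts every occurrence of a (name, date) pair in the whole frame, not only consecutive ones. Since the table is sorted so that duplicates sit next to each other, both readings give the same counter here.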