I would like to ask if it is possible to divide pandas DataFrame into groups where values from two columns are unique within the group? Name column is primary key.
input:
| name | num_1 | num_2 |
|--------|-------|--------|
| name_1 | 5 | 12 |
| name_2 | 5 | 12 |
| name_3 | 5 | 12 |
| name_4 | 7 | 14 |
| name_5 | 7 | 14 |
| name_6 | 8 | 14 |
| name_7 | 8 | 14 |
| name_8 | 9 | 13 |
| name_9 | 9 | 13 |
| name_10| 9 | 13 |
output:
| name | num_1 | num_2 | group_id |
|--------|-------|--------|----------|
| name_1 | 5 | 12 | 1 |
| name_2 | 5 | 12 | 2 |
| name_3 | 5 | 12 | 3 |
| name_4 | 7 | 14 | 1 |
| name_5 | 7 | 14 | 2 |
| name_6 | 8 | 14 | 3 |
| name_7 | 8 | 14 | 4 |
| name_8 | 9 | 13 | 1 |
| name_9 | 9 | 13 | 2 |
| name_10| 9 | 13 | 3 |
Basically the num_1 and num_2 cannot repeat in the group. Is there any way how to do it with pandas?
CodePudding user response:
Use groupby
with Groupby.cumcount
:
In [1450]: df['group_id'] = df.groupby('num_2').cumcount() 1
In [1451]: df
Out[1451]:
name num_1 num_2 group_id
0 name_1 5 12 1
1 name_2 5 12 2
2 name_3 5 12 3
3 name_4 7 14 1
4 name_5 7 14 2
5 name_6 8 14 3
6 name_7 8 14 4
7 name_8 9 13 1
8 name_9 9 13 2
9 name_10 9 13 3