Home > other >  Pandas - Create groups with unique values in it
Pandas - Create groups with unique values in it

Time:12-21

I would like to ask if it is possible to divide pandas DataFrame into groups where values from two columns are unique within the group? Name column is primary key.

input:

| name   | num_1 | num_2  |
|--------|-------|--------|
| name_1 |   5   |   12   |
| name_2 |   5   |   12   |
| name_3 |   5   |   12   |
| name_4 |   7   |   14   |
| name_5 |   7   |   14   |
| name_6 |   8   |   14   |
| name_7 |   8   |   14   |
| name_8 |   9   |   13   |
| name_9 |   9   |   13   |
| name_10|   9   |   13   |

output:

| name   | num_1 | num_2  | group_id |
|--------|-------|--------|----------|
| name_1 |   5   |   12   |    1     |
| name_2 |   5   |   12   |    2     |
| name_3 |   5   |   12   |    3     |
| name_4 |   7   |   14   |    1     |
| name_5 |   7   |   14   |    2     |
| name_6 |   8   |   14   |    3     |
| name_7 |   8   |   14   |    4     |
| name_8 |   9   |   13   |    1     |
| name_9 |   9   |   13   |    2     |
| name_10|   9   |   13   |    3     |

Basically the num_1 and num_2 cannot repeat in the group. Is there any way how to do it with pandas?

CodePudding user response:

Use groupby with Groupby.cumcount:

In [1450]: df['group_id'] = df.groupby('num_2').cumcount()   1

In [1451]: df
Out[1451]: 
      name  num_1  num_2  group_id
0   name_1      5     12         1
1   name_2      5     12         2
2   name_3      5     12         3
3   name_4      7     14         1
4   name_5      7     14         2
5   name_6      8     14         3
6   name_7      8     14         4
7   name_8      9     13         1
8   name_9      9     13         2
9  name_10      9     13         3
  • Related