Pandas - Identify non-unique rows, grouping any pairs-CodePudding

I am trying to figure out a non-looping way to identify (auto-incrementing int would be ideal) the non-unique groups of rows (a group can contain 1 or more rows) within each TDateID, GroupID combination.

Here is an example DataFrame that looks like

Index	Cents	SD_YF	TDateID	GroupID
10	182.5	2.1	0	0
11	182.5	2.1	0	0
12	153.5	1.05	0	1
13	153.5	1.05	0	1
14	43	11	1	2
15	43	11	1	2
4	152	21	1	2
5	152	21	1	2

My ideal output would be:

Index	Cents	SD_YF	TDateID	GroupID	UniID
10	182.5	2.1	0	0	1
11	182.5	2.1	0	0	2
12	153.5	1.05	0	1	3
13	153.5	1.05	0	1	4
14	43	11	1	2	5
15	43	11	1	2	6
4	152	21	1	2	5
5	152	21	1	2	6

I have bolded #5 to draw attention to how index 14, 4 are paired together. Similar with #6. I hope that makes sense!

CodePudding user response：

IIUC you need to add the group number the cumcount per duplicate 1:

df['UniID'] = (df['GroupID']
  df.groupby('GroupID').ngroup().add(1)
  df.groupby(['GroupID', 'Cents', 'SD_YF']).cumcount()
)

output:

   Index  Cents  SD_YF  GroupID  UniID
0     10  182.5   2.10        0      1
1     11  182.5   2.10        0      2
2     12  153.5   1.05        1      3
3     13  153.5   1.05        1      4
4     14   43.0  11.00        2      5
5     15   43.0  11.00        2      6
6      4  152.0  21.00        2      5
7      5  152.0  21.00        2      6