Pandas - Identify non-unique rows, grouping any pairs

Time:07-08

I am trying to find a non-looping way to label the non-unique groups of rows within each TDateID, GroupID combination (a group can contain one or more rows). An auto-incrementing int would be the ideal label.

Here is an example DataFrame:

Index  Cents  SD_YF  TDateID  GroupID
   10  182.5    2.1        0        0
   11  182.5    2.1        0        0
   12  153.5   1.05        0        1
   13  153.5   1.05        0        1
   14     43     11        1        2
   15     43     11        1        2
    4    152     21        1        2
    5    152     21        1        2

My ideal output would be:

Index  Cents  SD_YF  TDateID  GroupID  UniID
   10  182.5    2.1        0        0      1
   11  182.5    2.1        0        0      2
   12  153.5   1.05        0        1      3
   13  153.5   1.05        0        1      4
   14     43     11        1        2      5
   15     43     11        1        2      6
    4    152     21        1        2      5
    5    152     21        1        2      6

Note UniID 5, which pairs index 14 with index 4. Likewise, UniID 6 pairs index 15 with index 5. I hope that makes sense!

Answer:

IIUC, you need to add together the GroupID, the group number plus 1, and the cumcount within each set of duplicate rows:

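df['UniID'] = (
    df['GroupID']
    # 0-based number of each GroupID group, shifted to start at 1
    + df.groupby('GroupID').ngroup().add(1)
    # 0-based position of each row among its identical duplicates
    + df.groupby(['GroupID', 'Cents', 'SD_YF']).cumcount()
)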

output:

   Index  Cents  SD_YF  GroupID  UniID
0     10  182.5   2.10        0      1
1     11  182.5   2.10        0      2
2     12  153.5   1.05        1      3
3     13  153.5   1.05        1      4
4     14   43.0  11.00        2      5
5     15   43.0  11.00        2      6
6      4  152.0  21.00        2      5
7      5  152.0  21.00        2      6
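
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample DataFrame from the question and applies the same sum. The intermediate names `group_number` and `dup_position` are only illustrative and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Index":   [10, 11, 12, 13, 14, 15, 4, 5],
    "Cents":   [182.5, 182.5, 153.5, 153.5, 43.0, 43.0, 152.0, 152.0],
    "SD_YF":   [2.1, 2.1, 1.05, 1.05, 11.0, 11.0, 21.0, 21.0],
    "TDateID": [0, 0, 0, 0, 1, 1, 1, 1],
    "GroupID": [0, 0, 1, 1, 2, 2, 2, 2],
})

# 0-based number of each GroupID group (hypothetical helper name).
group_number = df.groupby("GroupID").ngroup()

# 0-based position of each row among rows sharing the same
# GroupID, Cents and SD_YF values (hypothetical helper name).
dup_position = df.groupby(["GroupID", "Cents", "SD_YF"]).cumcount()

# Same arithmetic as the one-liner: GroupID + (group number + 1) + cumcount.
df["UniID"] = df["GroupID"] + group_number + 1 + dup_position

print(df)
```

On this sample the UniID column comes out as 1, 2, 3, 4, 5, 6, 5, 6, matching the ideal output. Note that the IDs stay distinct across groups here only because the GroupID values are consecutive integers starting at 0 and each set of identical rows holds at most two rows; with other data the same arithmetic may produce overlapping IDs.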