Counting groups in pandas-CodePudding

So I am basically trying to group the groups that share a common element in a Pandas dataframe, my dataframe looks like this (I added the empty lines just for some extra legibility):

ID       REG
15      01497
15      01493

19      01706
19      01706-A

78      05710
78      01738

143     01626
143     01634

144     01626
144     01644

207     05255
207     01638

209     05255
209     03143

I expect to create a new column where I can grop all the IDs that share a same REG number, so it would look like this:

ID     REG          GROUP
15     01497        1
15     01493        1

19     01706        2
19     01706-A      2

78     05710        3
78     01738        3

143    01626        4
143    01634        4

144    01626        4
144    01644        4

207    05255        5
207    01638        5

209    05255        5
209    03143        5

There are some similar questions that address similar problems like this, but they are not quite the same, (Perhaps Pandas DataFrame Groupby two columns and get counts or How do I find common values within groups in pandas? or Numbering Groups In Pandas DataFrame) but I am not aiming to perform a sum or count in the traditional way, any help with this would be appreciated, thanks in advance.

CodePudding user response：

Assuming you have a dataframe df with ID as your index, try:

index_lookup = df.index.unique()
df['GROUP'] = df.apply(lambda x: index_lookup.get_loc(x.index), axis=1)
print(df)

CodePudding user response：

One way is to use pandas.Series.duplicated with pandas.Series.cumcum :

df["GROUP"] = (~df["ID"].duplicated() ^ df["REG"].duplicated()).cumsum()

Output :

print(df)

     ID      REG  GROUP
0    15    01497      1
1    15    01493      1
2    19    01706      2
3    19  01706-A      2
4    78    05710      3
5    78    01738      3
6   143    01626      4
7   143    01634      4
8   144    01626      4
9   144    01644      4
10  207    05255      5
11  207    01638      5
12  209    05255      5
13  209    03143      5