So I am basically trying to group the groups that share a common element in a Pandas dataframe, my dataframe looks like this (I added the empty lines just for some extra legibility):
ID REG
15 01497
15 01493
19 01706
19 01706-A
78 05710
78 01738
143 01626
143 01634
144 01626
144 01644
207 05255
207 01638
209 05255
209 03143
I expect to create a new column where I can grop all the IDs that share a same REG number, so it would look like this:
ID REG GROUP
15 01497 1
15 01493 1
19 01706 2
19 01706-A 2
78 05710 3
78 01738 3
143 01626 4
143 01634 4
144 01626 4
144 01644 4
207 05255 5
207 01638 5
209 05255 5
209 03143 5
There are some similar questions that address similar problems like this, but they are not quite the same, (Perhaps Pandas DataFrame Groupby two columns and get counts or How do I find common values within groups in pandas? or Numbering Groups In Pandas DataFrame) but I am not aiming to perform a sum or count in the traditional way, any help with this would be appreciated, thanks in advance.
CodePudding user response:
Assuming you have a dataframe df with ID as your index, try:
index_lookup = df.index.unique()
df['GROUP'] = df.apply(lambda x: index_lookup.get_loc(x.index), axis=1)
print(df)
CodePudding user response:
One way is to use pandas.Series.duplicated
with pandas.Series.cumcum
:
df["GROUP"] = (~df["ID"].duplicated() ^ df["REG"].duplicated()).cumsum()
Output :
print(df)
ID REG GROUP
0 15 01497 1
1 15 01493 1
2 19 01706 2
3 19 01706-A 2
4 78 05710 3
5 78 01738 3
6 143 01626 4
7 143 01634 4
8 144 01626 4
9 144 01644 4
10 207 05255 5
11 207 01638 5
12 209 05255 5
13 209 03143 5