How to group a sequence based on a group column and assign a group ID

Time:05-17

Below is the dataframe I have:

ColA  ColB  Time          ColC
A     B     01-01-2022    ABC
A     B     02-01-2022    ABC
A     B     07-01-2022    XYZ
A     B     11-01-2022    IJK
A     B     14-01-2022    ABC

Desired result:

ColA  ColB  Time          ColC   groupID
A     B     01-01-2022    ABC     1
A     B     02-01-2022    ABC     1
A     B     07-01-2022    XYZ     2
A     B     11-01-2022    IJK     3 
A     B     14-01-2022    ABC     4

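UPDATED: Below is the code I executed using cumsum, followed by its output:

# Write the result to a new groupID column so ColC is not overwritten
df['groupID'] = (df['ColC'].ne(df['ColC'].shift(1))
                 .groupby([df['ColA'], df['ColB']])
                 .cumsum())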


ColA  ColB  Time          ColC   groupID
A     B     01-01-2022    ABC     1
A     B     02-01-2022    ABC     1
A     B     07-01-2022    XYZ     2
A     B     11-01-2022    XYZ     3 
A     B     14-01-2022    XYZ     4
A     B     14-01-2022    XYZ     4

Thank you in advance

CodePudding user response:

The logic is not fully clear, but it looks like you're trying to group by week number (and ColC):

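import pandas as pd

# Group by ISO week number (dates parsed day-first) and ColC;
# sort=False numbers the groups in order of first appearance
df['groupID'] = (df
 .groupby([pd.to_datetime(df['Time'], dayfirst=True).dt.isocalendar().week,
           'ColC'], sort=False)
 .ngroup().add(1)
)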

output:

  ColA ColB        Time ColC  groupID
0    A    B  01-01-2022  ABC        1
1    A    B  02-01-2022  ABC        1
2    A    B  07-01-2022  XYZ        2
3    A    B  11-01-2022  IJK        3
4    A    B  14-01-2022  ABC        4
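
Since the intended logic is not fully clear, here is a minimal, self-contained sketch that compares the two possible readings on the sample data from the question. It assumes the dates are in DD-MM-YYYY format. The column names id_by_week and id_by_run are hypothetical, introduced only for this illustration. The second variant is a guess at what the cumsum attempt was aiming for (a new ID whenever ColC changes within the same ColA/ColB group), not a confirmed requirement.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ColA": ["A"] * 5,
    "ColB": ["B"] * 5,
    "Time": ["01-01-2022", "02-01-2022", "07-01-2022", "11-01-2022", "14-01-2022"],
    "ColC": ["ABC", "ABC", "XYZ", "IJK", "ABC"],
})

# ISO week number of each row (assumption: dates are DD-MM-YYYY)
week = pd.to_datetime(df["Time"], format="%d-%m-%Y").dt.isocalendar().week

# Reading 1: one ID per (week, ColC) pair, numbered in order of first appearance
df["id_by_week"] = df.groupby([week, "ColC"], sort=False).ngroup().add(1)

# Reading 2: a new ID whenever ColC differs from the previous row
# inside the same ColA/ColB group (the first row of a group always starts a run)
prev = df.groupby(["ColA", "ColB"])["ColC"].shift()
changed = df["ColC"].ne(prev).astype(int)
df["id_by_run"] = changed.groupby([df["ColA"], df["ColB"]]).cumsum()

print(df)
```

Both readings give 1, 1, 2, 3, 4 on this sample, so the sample alone cannot tell them apart. They diverge on other data: for instance, two consecutive rows with the same ColC that fall in different ISO weeks get different IDs under the first reading but share one ID under the second.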