How to assign a postfix for column values based on ID

My DataFrame is a simple event log with 3 columns.

I would like to add a postfix (e.g. _step_1, _step_2, etc.) to the column values, numbered within each applicationnumber group. Please see the example below. Could you please help me solve this?

import pandas as pd

data_example = {'applicationnumber': ['XYZ104183736AA', 'XYZ104183736AA', 'XDASDHGHG54G', 'XDASDHGHG54G', 'XDASDHGHG54G'], 'event_name': ['verification', 'verification', 'verification', 'verification', 'verification'], 'working_time_in_seconds': [1000, 2000, 30000, 10000, 1004]}
df_example = pd.DataFrame(data_example)

Many thanks in advance!

Answer:

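You can build the postfix with groupby().cumcount(), which numbers the rows within each applicationnumber group starting at 0, and then concatenate the pieces as strings:

df['event_name'] = (
    df['event_name'].astype(str)
    + "_step_"
    + df.groupby('applicationnumber').cumcount().add(1).astype(str)  # add(1) makes the steps start at 1
)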

Output:

  applicationnumber           event_name  working_time_in_seconds
0          XYZ104AA  verification_step_1                    54365
1          XYZ104AA  verification_step_2                    35453
2            XDA54G  verification_step_1                      342
3            XDA54G  verification_step_2                       52
4            XDA54G  verification_step_3                      123
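
For reference, here is a minimal self-contained sketch of the same approach applied to the data from the question. The intermediate name `step` is only introduced for illustration; the rest mirrors the snippet above and assumes pandas is installed.

```python
import pandas as pd

# Data taken from the question
df_example = pd.DataFrame({
    'applicationnumber': ['XYZ104183736AA', 'XYZ104183736AA',
                          'XDASDHGHG54G', 'XDASDHGHG54G', 'XDASDHGHG54G'],
    'event_name': ['verification'] * 5,
    'working_time_in_seconds': [1000, 2000, 30000, 10000, 1004],
})

# cumcount() numbers the rows inside each group (0, 1, 2, ...) in row order;
# add(1) shifts the numbering so that it starts at 1
step = df_example.groupby('applicationnumber').cumcount().add(1)

# Concatenate name, separator and step number as strings
df_example['event_name'] = (
    df_example['event_name'].astype(str) + '_step_' + step.astype(str)
)

print(df_example['event_name'].tolist())
# ['verification_step_1', 'verification_step_2',
#  'verification_step_1', 'verification_step_2', 'verification_step_3']
```

Note that the numbering follows the current row order within each group, so if the steps should follow some other order (for example a timestamp), the DataFrame would need to be sorted first.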

I've used this sample DF:

>>> df.to_dict()

{'applicationnumber': {0: 'XYZ104AA',
  1: 'XYZ104AA',
  2: 'XDA54G',
  3: 'XDA54G',
  4: 'XDA54G'},
 'event_name': {0: 'verification',
  1: 'verification',
  2: 'verification',
  3: 'verification',
  4: 'verification'},
 'working_time_in_seconds': {0: 54365, 1: 35453, 2: 342, 3: 52, 4: 123}}

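Updated, to leave values that already contain "_step_" unchanged:

import numpy as np

df['event_name'] = np.where(
    df['event_name'].str.contains('_step_'),  # already has a postfix?
    df['event_name'],                         # yes: keep the value as it is
    df['event_name'].astype(str)              # no: append the step number
    + "_step_"
    + df.groupby('applicationnumber').cumcount().add(1).astype(str),
)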
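
To see how the updated version behaves on mixed input, here is a small sketch. The DataFrame below is made up for illustration (the IDs 'A' and 'B' are not from the question), and `regex=False` is added only to make the substring match explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row already carries a postfix
df = pd.DataFrame({
    'applicationnumber': ['A', 'A', 'A', 'B'],
    'event_name': ['verification_step_1', 'verification',
                   'verification', 'verification'],
})

# Position of each row within its group, starting at 1
step = df.groupby('applicationnumber').cumcount().add(1).astype(str)

df['event_name'] = np.where(
    df['event_name'].str.contains('_step_', regex=False),  # already suffixed
    df['event_name'],                                      # keep unchanged
    df['event_name'].astype(str) + '_step_' + step,        # otherwise append
)

print(df['event_name'].tolist())
# ['verification_step_1', 'verification_step_2',
#  'verification_step_3', 'verification_step_1']
```

The step number is still computed over all rows of a group, including those that already had a postfix, so the remaining rows continue the count rather than restarting at 1.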