My DataFrame is a simple event log with 3 columns.
I would like to group my DataFrame by applicationnumber and add a postfix (e.g. _step_1, _step_2, etc.) that numbers the events within each group. Please see the example below. Could you please help me solve this?
import pandas as pd
data_example = {'applicationnumber': ['XYZ104183736AA', 'XYZ104183736AA', 'XDASDHGHG54G', 'XDASDHGHG54G', 'XDASDHGHG54G'], 'event_name': ['verification', 'verification', 'verification', 'verification', 'verification'], 'working_time_in_seconds': [1000, 2000, 30000, 10000, 1004]}
df_example = pd.DataFrame(data_example)
CodePudding user response:
You could build the new name by string concatenation, using groupby().cumcount() to number the rows
within each applicationnumber group and treating the pieces as strings:
df['event_name'] = df['event_name'].astype(str) + \
    "_step_" + \
    df.groupby('applicationnumber').cumcount().add(1).astype(str)
prints:
  applicationnumber           event_name  working_time_in_seconds
0          XYZ104AA  verification_step_1                    54365
1          XYZ104AA  verification_step_2                    35453
2            XDA54G  verification_step_1                      342
3            XDA54G  verification_step_2                       52
4            XDA54G  verification_step_3                      123
I've used this sample DF:
>>> df.to_dict()
{'applicationnumber': {0: 'XYZ104AA',
1: 'XYZ104AA',
2: 'XDA54G',
3: 'XDA54G',
4: 'XDA54G'},
'event_name': {0: 'verification',
1: 'verification',
2: 'verification',
3: 'verification',
4: 'verification'},
'working_time_in_seconds': {0: 54365, 1: 35453, 2: 342, 3: 52, 4: 123}}
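For reference, here is a minimal self-contained sketch of the same approach applied to the data from the question rather than to my shortened sample. The intermediate name step is only for illustration; it assumes the rows are already in the order in which the steps should be numbered.

```python
import pandas as pd

data_example = {
    'applicationnumber': ['XYZ104183736AA', 'XYZ104183736AA',
                          'XDASDHGHG54G', 'XDASDHGHG54G', 'XDASDHGHG54G'],
    'event_name': ['verification'] * 5,
    'working_time_in_seconds': [1000, 2000, 30000, 10000, 1004],
}
df_example = pd.DataFrame(data_example)

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current
# order; add(1) makes the numbering start at 1.
step = df_example.groupby('applicationnumber').cumcount().add(1).astype(str)

# Plain string concatenation builds the suffixed event name.
df_example['event_name'] = df_example['event_name'].astype(str) + '_step_' + step

print(df_example['event_name'].tolist())
# ['verification_step_1', 'verification_step_2', 'verification_step_1',
#  'verification_step_2', 'verification_step_3']
```

Note that the counter runs per applicationnumber only. If the numbering should restart for each event type as well, grouping by ['applicationnumber', 'event_name'] instead would be one way to do that.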
Updated (rows whose event_name already contains '_step_' are left unchanged):
import numpy as np
df['event_name'] = np.where(
    df.event_name.str.contains('_step_'),
    df.event_name,
    df['event_name'].astype(str)
    + "_step_"
    + df.groupby('applicationnumber').cumcount().add(1).astype(str)
)
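A quick way to check the updated version is to run it twice and confirm that the second pass changes nothing. The sketch below wraps it in a helper called add_step_suffix, which is a hypothetical name used only for this illustration, and reuses my shortened sample data.

```python
import numpy as np
import pandas as pd

def add_step_suffix(df):
    """Hypothetical helper: append _step_N unless the name already has one."""
    out = df.copy()
    names = out['event_name'].astype(str)
    # 1-based position of each row within its applicationnumber group
    step = out.groupby('applicationnumber').cumcount().add(1).astype(str)
    out['event_name'] = np.where(
        names.str.contains('_step_'),  # already suffixed: keep as is
        names,
        names + '_step_' + step,       # otherwise append the step number
    )
    return out

df = pd.DataFrame({
    'applicationnumber': ['XYZ104AA', 'XYZ104AA', 'XDA54G', 'XDA54G', 'XDA54G'],
    'event_name': ['verification'] * 5,
    'working_time_in_seconds': [54365, 35453, 342, 52, 123],
})

once = add_step_suffix(df)
twice = add_step_suffix(once)  # second pass should leave the names untouched
print(once['event_name'].tolist() == twice['event_name'].tolist())  # True
```

One thing to keep in mind: cumcount() still counts every row in the group, including rows that already carry a suffix, so a newly appended row gets its position within the whole group as its step number.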