Setup:
df_input = pd.DataFrame(data={'Session': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
'Action': ['Start', 'Action1', 'Action2', 'Action3', 'Pause', 'Action4',
'Action5', 'Resume', 'Finish', 'Start', 'Action1', 'Action2', 'Pause',
'Action3', 'Action4', 'Resume', 'Finish', 'Start', 'Action1',
'Action2', 'Finish']})
df_output = pd.DataFrame(data={'Session': [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
'Action': ['Start', 'Action1', 'Action2', 'Action3', 'Pause', 'Action4', 'Action5',
'Resume', 'Finish', 'Start', 'Action1', 'Action2', 'Pause', 'Action3',
'Action4', 'Resume', 'Finish', 'Start', 'Action1', 'Action2', 'Finish'],
'Label': [0, 1, 2, 3, 4, 4, 4, 4, 5, 0, 1, 2, 3, 3, 3, 3, 4, 0, 1, 2, 3]})
Input:
Session Action
0 0 Start
1 0 Action1
2 0 Action2
3 0 Action3
4 0 Pause
5 0 Action4
6 0 Action5
7 0 Resume
8 0 Finish
9 1 Start
10 1 Action1
11 1 Action2
12 1 Pause
13 1 Action3
14 1 Action4
15 1 Resume
16 1 Finish
17 2 Start
18 2 Action1
19 2 Action2
20 2 Finish
Output:
Session Action Label
0 0 Start 0
1 0 Action1 1
2 0 Action2 2
3 0 Action3 3
4 0 Pause 4 #Same
5 0 Action4 4 #Same
6 0 Action5 4 #Same
7 0 Resume 4 #Same
8 0 Finish 5
9 1 Start 0
10 1 Action1 1
11 1 Action2 2
12 1 Pause 3 #Same
13 1 Action3 3 #Same
14 1 Action4 3 #Same
15 1 Resume 3 #Same
16 1 Finish 4
17 2 Start 0
18 2 Action1 1
19 2 Action2 2
20 2 Finish 3
Labeling Rule:
In every session, label remains same from pause until resumed.
I had been trying for so long but was unable to make labels same from pause till resumed
CodePudding user response:
This only requires some logic to be written, refer -
label_list, label, paused = [], 0, 0
for index, row in df_input.iterrows():
#print(index, row['Action'])
if paused and not row['Action'] == 'Resume':
label_list.append(label)
elif row['Action'] == 'Resume':
label_list.append(label)
label =1
paused=0
else:
if row['Action'] == 'Pause':
#label =1
label_list.append(label)
paused = 1
elif row['Action'] == 'Finish':
#label =1
label_list.append(label)
label=0
else:
label_list.append(label)
label =1
Output-
Session Action Label
0 0 Start 0
1 0 Action1 1
2 0 Action2 2
3 0 Action3 3
4 0 Pause 4
5 0 Action4 4
6 0 Action5 4
7 0 Resume 4
8 0 Finish 5
9 1 Start 0
10 1 Action1 1
11 1 Action2 2
12 1 Pause 3
13 1 Action3 3
14 1 Action4 3
15 1 Resume 3
16 1 Finish 4
17 2 Start 0
18 2 Action1 1
19 2 Action2 2
20 2 Finish 3
CodePudding user response:
We can try creating a mask to determine exactly where
values are paused, then we can mask
out the Paused rows before taking the total within group:
m = (
df_input['Action']
.where(df_input['Action'].isin(['Pause', 'Resume']))
.groupby(df_input['Session']).shift()
.ffill().eq('Pause')
)
df_input['Label'] = (
df_input['Action'].mask(m).notna()
.groupby(df_input['Session']).cumsum() - 1
)
Session Action Label
0 0 Start 0
1 0 Action1 1
2 0 Action2 2
3 0 Action3 3
4 0 Pause 4
5 0 Action4 4
6 0 Action5 4
7 0 Resume 4
8 0 Finish 5
9 1 Start 0
10 1 Action1 1
11 1 Action2 2
12 1 Pause 3
13 1 Action3 3
14 1 Action4 3
15 1 Resume 3
16 1 Finish 4
17 2 Start 0
18 2 Action1 1
19 2 Action2 2
20 2 Finish 3
Breakdown of steps in a DataFrame for reference:
pd.DataFrame({
1: df_input['Action']
.where(df_input['Action'].isin(['Pause', 'Resume'])),
2: df_input['Action']
.where(df_input['Action'].isin(['Pause', 'Resume']))
.groupby(df_input['Session']).shift(),
3: df_input['Action']
.where(df_input['Action'].isin(['Pause', 'Resume']))
.groupby(df_input['Session']).shift()
.ffill(),
4: df_input['Action']
.where(df_input['Action'].isin(['Pause', 'Resume']))
.groupby(df_input['Session']).shift()
.ffill().eq('Pause'),
5: df_input['Action'].mask(m),
6: df_input['Action'].mask(m).notna(),
7: df_input['Action'].mask(m).notna()
.groupby(df_input['Session']).cumsum() - 1
}).rename_axis(columns='Steps')
Steps 1 2 3 4 5 6 7
0 NaN NaN NaN False Start True 0
1 NaN NaN NaN False Action1 True 1
2 NaN NaN NaN False Action2 True 2
3 NaN NaN NaN False Action3 True 3
4 Pause NaN NaN False Pause True 4
5 NaN Pause Pause True NaN False 4
6 NaN NaN Pause True NaN False 4
7 Resume NaN Pause True NaN False 4
8 NaN Resume Resume False Finish True 5
9 NaN NaN Resume False Start True 0
10 NaN NaN Resume False Action1 True 1
11 NaN NaN Resume False Action2 True 2
12 Pause NaN Resume False Pause True 3
13 NaN Pause Pause True NaN False 3
14 NaN NaN Pause True NaN False 3
15 Resume NaN Pause True NaN False 3
16 NaN Resume Resume False Finish True 4
17 NaN NaN Resume False Start True 0
18 NaN NaN Resume False Action1 True 1
19 NaN NaN Resume False Action2 True 2
20 NaN NaN Resume False Finish True 3
Docs for Methods Used: