I am looking to create a new column 'isRunning' in the data frame by identifying the 'Task' column status. Also, I have a 'Session' column and if a Session only has START value in the Task and no END vaule (for example, Session 3 has only START val), so 'isRunning' should have TRUE flag untill the end of that session.
Session | Task | isRunning | |
---|---|---|---|
0 | 1 | START | TRUE |
1 | 1 | TRUE | |
2 | 1 | TRUE | |
3 | 1 | TRUE | |
4 | 1 | END | TRUE |
5 | 1 | FALSE | |
6 | 2 | FALSE | |
7 | 2 | START | TRUE |
8 | 2 | TRUE | |
9 | 2 | END | TRUE |
10 | 2 | FALSE | |
11 | 2 | FALSE | |
12 | 3 | FALSE | |
13 | 3 | START | TRUE |
14 | 3 | TRUE | |
15 | 3 | TRUE | |
16 | 4 | FALSE | |
17 | 4 | START | TRUE |
18 | 4 | TRUE | |
19 | 4 | TRUE | |
20 | 4 | END | TRUE |
Not sure how to move forward with adding an extra condition and identifying if the Task is running.
import pandas as pd
d = {'Session':[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4],
'Task':['START', '', '', '', 'END', '', '', 'START', '', 'END', '', '', '', 'START', '', '', '', 'START', '', '', 'END']}
df = pd.DataFrame(data=d)
CodePudding user response:
Assuming one start/end per group. You can use a double mask:
m1 = df['Task'].eq('START').groupby(df['Session]').cummax()
m2 = df.loc[::-1, 'Task'].eq('END').groupby(df['Session']).cummax()
df['IsRunning'] = m1&m2
If several start/end per session are possible:
df['IsRunning'] = (df['Task'].map({'START': True, 'END': False})
.groupby(df['Session']).ffill()
.fillna(False)
) | df['Task'].eq('END')
Output:
Session Task IsRunning
0 1 START True
1 1 True
2 1 True
3 1 True
4 1 END True
5 1 False
6 2 False
7 2 START True
8 2 True
9 2 END True
10 2 False
11 2 False
12 3 False
13 3 START True
14 3 True
15 3 True
16 4 False
17 4 START True
18 4 True
19 4 True
20 4 END True