I have a dataframe with randomly repeating sequences. I need to set a flag of sorts so that every time I come across the term maintenance, I store the rows between each instance, i.e. store everything between the two maintenance instances.
name    process
name 1  maintenance
name 2  process 2
...
name    maintenance
I was thinking about doing a set of logical conditions:
for i in np.arange(len(df)-1):
    if df['process'][i] == 'maintenance':
        df['new'] = df[i]
but I am hoping there is a way that pandas can handle this (that I can't find), as I can't seem to figure out the stopping condition.
Thanks!
CodePudding user response:
Since you mention a repeating sequence, I assume maintenance can arise many times, but always an even number of times, with each pair marking the start and end of the sequence of processes that the maintenance window encapsulates.
In that case, this could be a solution:
df_new = df[((df['process']=='maintenance').astype(int).cumsum()%2==1) | (df['process']=='maintenance')]
Note that it includes the rows with maintenance too.
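To make the cumulative-sum trick easier to follow, here is a minimal sketch on made-up data (the frame below and the names is_marker, count, inside, between and blocks are only illustrative, not from the question). It assumes, as above, that maintenance rows come in start/end pairs. The running count of maintenance rows is odd while a window is open, so rows with an odd count that are not markers themselves are the ones in between.

```python
import pandas as pd

# Hypothetical sample data: two maintenance windows (rows 0-3 and rows 5-7).
df = pd.DataFrame({
    "name": ["name1", "name2", "name3", "name4",
             "name5", "name6", "name7", "name8"],
    "process": ["maintenance", "process2", "process3", "maintenance",
                "process5", "maintenance", "process7", "maintenance"],
})

# True on the marker rows themselves.
is_marker = df["process"].eq("maintenance")

# Running count of markers seen so far; it is odd while a window is open.
count = is_marker.cumsum()

# Rows inside a window, excluding the marker rows.
inside = count.mod(2).eq(1) & ~is_marker
between = df[inside]

# One sub-frame per window, keyed by the marker count at the window's start.
blocks = {block_id: group for block_id, group in between.groupby(count[inside])}

print(between)
```

Note that row 4 (process5) is dropped: it sits after one window has closed and before the next one opens. If every maintenance row should instead both close one block and open the next, grouping the non-marker rows by count alone (without the parity check) would be the variant to try.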
CodePudding user response:
df['new'] = np.where(df['process']=='maintenance', df['name'], np.nan)
df
###
    name      process    new
0  name1  maintenance  name1
1  name2     process2    NaN
2  name3     process3    NaN
3  name4     process4    NaN
4  name5  maintenance  name5
df['new2'] = df[['name','process']].astype(str).apply(' '.join, axis=1)
df
###
    name      process    new               new2
0  name1  maintenance  name1  name1 maintenance
1  name2     process2    NaN     name2 process2
2  name3     process3    NaN     name3 process3
3  name4     process4    NaN     name4 process4
4  name5  maintenance  name5  name5 maintenance