Print Out DataFrame Rows Between Two Values

I have a dataframe with randomly repeating sequences. I need to set a flag of sorts so that every time I come across the term maintenance, I store the rows between each instance. In other words: store everything between the two maintenance instances.

name     process
name 1   maintenance 
name 2   process 2
.
.
.
name     maintenance

I was thinking about doing a set of logical conditions:

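import numpy as np

for i in np.arange(len(df)):
  if df['process'].iloc[i] == 'maintenance':
     # flag the maintenance row; unclear how to mark where the block ends
     df.loc[df.index[i], 'new'] = df['name'].iloc[i]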

but I am hoping there is a way pandas can handle this (that I can't find), as I can't seem to figure out the stopping condition.

Thanks!

CodePudding user response：

Since you mention a repeating sequence, I assume maintenance can appear many times but always an even number of times, with each pair marking the start and end of the sequence of processes that a maintenance window encloses.

In that case, this could be a solution:

is_maint = df['process'] == 'maintenance'  # True on the boundary rows
df_new = df[(is_maint.cumsum() % 2 == 1) | is_maint]  # odd running count = inside a window

Note that it includes the rows with maintenance too.
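
To see why the running count works, here is a minimal sketch on made-up data (the sample frame and the names `count`, `with_bounds` and `inner` are assumptions for illustration, not part of the question). It also shows a variant that drops the maintenance rows themselves, in case only the rows strictly between them are wanted.

```python
import pandas as pd

# Hypothetical sample: two maintenance windows with one unrelated row between them
df = pd.DataFrame({
    "name": [f"name{i}" for i in range(1, 9)],
    "process": ["maintenance", "process2", "process3", "maintenance",
                "process5", "maintenance", "process7", "maintenance"],
})

is_maint = df["process"] == "maintenance"
# Running count of maintenance rows seen so far: 1, 1, 1, 2, 2, 3, 3, 4
count = is_maint.cumsum()

# Odd count means a window is open; OR-ing with is_maint also keeps the closing row
with_bounds = df[(count % 2 == 1) | is_maint]

# Variant: keep only the rows strictly between the two boundaries
inner = df[(count % 2 == 1) & ~is_maint]

print(inner["name"].tolist())
```

This only holds under the assumption stated above, that maintenance rows come in start/end pairs. With an odd number of them, everything after the last one would be treated as an open window.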

CodePudding user response：

import numpy as np
# Copy the name into 'new' on maintenance rows; leave NaN everywhere else
df['new'] = np.where(df['process']=='maintenance', df['name'], np.nan)
df
###
    name      process    new
0  name1  maintenance  name1
1  name2     process2    NaN
2  name3     process3    NaN
3  name4     process4    NaN
4  name5  maintenance  name5

# Join name and process into a single space-separated string per row
df['new2'] = df[['name','process']].astype(str).apply(' '.join, axis=1)
df
###
    name      process    new               new2
0  name1  maintenance  name1  name1 maintenance
1  name2     process2    NaN     name2 process2
2  name3     process3    NaN     name3 process3
3  name4     process4    NaN     name4 process4
4  name5  maintenance  name5  name5 maintenance
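
The marker column by itself does not yet collect the rows between two maintenance entries. One possible way to get there is sketched below; it assumes the frame starts with a maintenance row and treats every maintenance row as the start of a new block. The names `block_id` and `blocks` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample frame matching the output shown above
df = pd.DataFrame({
    "name": ["name1", "name2", "name3", "name4", "name5"],
    "process": ["maintenance", "process2", "process3", "process4", "maintenance"],
})
df["new"] = np.where(df["process"] == "maintenance", df["name"], np.nan)

# A non-null marker identifies a maintenance row; each one opens a new block
block_id = df["new"].notna().cumsum().rename("block")

# Store the non-maintenance rows of each block under the name of its opening marker
blocks = {}
for _, group in df.groupby(block_id):
    rows = group[group["new"].isna()]
    if not rows.empty:
        # Assumption: the first row of each group is the maintenance row
        blocks[group["new"].iloc[0]] = rows

print(blocks["name1"]["name"].tolist())
```

Each value in `blocks` is its own DataFrame, so the rows following a given maintenance entry can be looked up by that entry's name.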