I want to start df.cummin() at the row where the cond column is True, and restart (recompute) it every time the cond column is True again. The result should be assigned to the expected column.
Input:
import pandas as pd
import numpy as np
A=[17,18,21,15,18,19,22,16,30,50,]
cond=[False,True,False,False,False,True,False,False,True,False]
df=pd.DataFrame({'A':A,'cond':cond})
df
Expected table:
    A   cond  expected
0  17  FALSE
1  18   TRUE        18
2  21  FALSE        18
3  15  FALSE        15
4  18  FALSE        15
5  19   TRUE        19
6  22  FALSE        19
7  16  FALSE        16
8  30   TRUE        30
9  50  FALSE        30
CodePudding user response:
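You're looking to group by the cumulative sum of the cond column: each True increments the sum, so every row from one True up to (but not including) the next shares the same group label. Since you don't want any values before the first True, you then need to blank out the values for group zero.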
import pandas as pd
import numpy as np
A=[17,18,21,15,18,19,22,16,30,50,]
cond=[False,True,False,False,False,True,False,False,True,False]
df=pd.DataFrame({'A':A,'cond':cond})
df['expected'] = df.groupby(df.cond.cumsum())['A'].cummin()
df.loc[df.cond.cumsum().eq(0), 'expected'] = np.nan
Output:
    A   cond  expected
0  17  False       NaN
1  18   True      18.0
2  21  False      18.0
3  15  False      15.0
4  18  False      15.0
5  19   True      19.0
6  22  False      19.0
7  16  False      16.0
8  30   True      30.0
9  50  False      30.0
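To make the grouping step easier to follow, here is a minimal sketch that stores the group labels in a variable (group_id is a name I made up for illustration) and uses Series.where in place of the separate .loc assignment. It is a variation on the answer above, not a different technique.

```python
import pandas as pd

A = [17, 18, 21, 15, 18, 19, 22, 16, 30, 50]
cond = [False, True, False, False, False, True, False, False, True, False]
df = pd.DataFrame({'A': A, 'cond': cond})

# Each True in cond bumps the running total, so rows share a label
# until the next True: 0, 1, 1, 1, 1, 2, 2, 2, 3, 3
group_id = df['cond'].cumsum()

# cummin() restarts inside every group; where() keeps a value only when
# group_id > 0, leaving NaN for the rows before the first True
df['expected'] = df.groupby(group_id)['A'].cummin().where(group_id > 0)
print(df)
```

Computing the cumulative sum once and reusing it avoids calling df.cond.cumsum() twice.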
CodePudding user response:
I couldn't think of a one-line solution, so I made it work with a loop:
# First get the index values at which cummin() has to restart
my_ix = df[df['cond']].index
return_list = []
# Loop over all chunks of indexes
for i in range(0, len(my_ix)):
    ix_start = my_ix[i]
    try:
        ix_end = my_ix[i + 1]
    except IndexError:
        # This happens on the last record
        ix_end = None
    # Slice one chunk (end is exclusive) and take its running minimum
    cummin_df = df[ix_start:ix_end]['A'].cummin()
    return_list.append(cummin_df)
pd.concat(return_list)
This gives the expected result, except that the first record is missing: the data starts with cond=False, so no chunk covers that row and its expected value is empty. If you like, you can first populate the output with all rows that come before the first row with cond=True.
You could also condense the code, but that would make it a bit less readable.
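As a rough sketch of both suggestions, the version below condenses the loop into a list comprehension and writes the result back as a column. It assumes df has the default 0..n-1 integer index, so index labels and positions coincide; starts, ends and pieces are names chosen here for illustration. Because column assignment aligns on the index, the rows before the first cond=True simply come out as NaN.

```python
import pandas as pd

A = [17, 18, 21, 15, 18, 19, 22, 16, 30, 50]
cond = [False, True, False, False, False, True, False, False, True, False]
df = pd.DataFrame({'A': A, 'cond': cond})

# Positions where cummin() restarts (assumes the default integer index)
starts = df.index[df['cond']]
# Pair each start with the next one; None lets the last chunk run to the end
ends = list(starts[1:]) + [None]
pieces = [df['A'].iloc[s:e].cummin() for s, e in zip(starts, ends)]

# Assignment aligns on the index: row 0 is in no chunk, so it becomes NaN
df['expected'] = pd.concat(pieces)
print(df)
```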