How to calculate df.cummin() piecewise

I want to start a df.cummin() on each row where the cond column is True and keep it running until the next row where cond is True, at which point the cummin() restarts.

The result should be assigned to the expected column.

Input:

import pandas as pd
import numpy as np
A=[17,18,21,15,18,19,22,16,30,50,]
cond=[False,True,False,False,False,True,False,False,True,False]
df=pd.DataFrame({'A':A,'cond':cond})
df

Expected output:

    A   cond    expected
0   17  FALSE   
1   18  TRUE    18
2   21  FALSE   18
3   15  FALSE   15
4   18  FALSE   15
5   19  TRUE    19
6   22  FALSE   19
7   16  FALSE   16
8   30  TRUE    30
9   50  FALSE   30
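Stated without pandas, the rule is: keep a running minimum that resets to the current value on every row where cond is True, and leave the rows before the first True empty. A minimal pure-Python sketch of that rule, which can serve as a reference to check a vectorized answer against (the function name `piecewise_cummin` is hypothetical, not part of pandas):

```python
from typing import List, Optional


def piecewise_cummin(values: List[float], cond: List[bool]) -> List[Optional[float]]:
    """Running minimum that restarts at every True in cond.

    Rows before the first True get None, because no segment has started yet.
    """
    out: List[Optional[float]] = []
    current: Optional[float] = None
    for v, c in zip(values, cond):
        if c:
            # A True row starts a new segment: reset the running minimum.
            current = v
        elif current is not None:
            # Inside a segment: keep the smaller of the two.
            current = min(current, v)
        out.append(current)
    return out


A = [17, 18, 21, 15, 18, 19, 22, 16, 30, 50]
cond = [False, True, False, False, False, True, False, False, True, False]
print(piecewise_cummin(A, cond))
# [None, 18, 18, 15, 15, 19, 19, 16, 30, 30]
```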

Answer:

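You're looking to group by the cumulative sum of the cond column: each True increments the running count, so every row from one True up to (but not including) the next shares the same group label, and cummin() restarts at each True. Since you don't want any values before the first True, you then need to blank out the rows in group 0.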

import pandas as pd
import numpy as np
A=[17,18,21,15,18,19,22,16,30,50,]
cond=[False,True,False,False,False,True,False,False,True,False]
df=pd.DataFrame({'A':A,'cond':cond})


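# Each True in cond starts a new group; cummin() restarts within each group.
df['expected'] = df.groupby(df.cond.cumsum())['A'].cummin()
# Group 0 holds the rows before the first True; blank them out.
df.loc[df.cond.cumsum().eq(0), 'expected'] = np.nan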

Output

    A   cond  expected
0  17  False       NaN
1  18   True      18.0
2  21  False      18.0
3  15  False      15.0
4  18  False      15.0
5  19   True      19.0
6  22  False      19.0
7  16  False      16.0
8  30   True      30.0
9  50  False      30.0
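The grouping key is easier to see if you print it: each True bumps the running count, so rows 0, 1-4, 5-7 and 8-9 get labels 0, 1, 2 and 3. A sketch of the same approach that masks group 0 with `Series.where` in a single expression, instead of a second `.loc` assignment (the variable name `key` is just illustrative):

```python
import pandas as pd

A = [17, 18, 21, 15, 18, 19, 22, 16, 30, 50]
cond = [False, True, False, False, False, True, False, False, True, False]
df = pd.DataFrame({'A': A, 'cond': cond})

# Each True increments the counter, so rows between two Trues share a label.
key = df['cond'].cumsum()
print(key.tolist())  # [0, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Cumulative minimum within each segment; where() turns group 0
# (the rows before the first True) into NaN.
df['expected'] = df.groupby(key)['A'].cummin().where(key.gt(0))
print(df)
```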

Answer:

I couldn't think of a one-line solution, so I made it work with a loop:

# First get the index labels of all rows on which to reset the cummin()
my_ix = df[df['cond']].index
return_list = []
# Loop over all chunks of indexes
for i in range(len(my_ix)):
    ix_start = my_ix[i]
    try:
        ix_end = my_ix[i + 1]
    except IndexError:
        # This happens on the last chunk: run to the end of the frame
        ix_end = None

    # Positional slice; the labels match positions with the default RangeIndex
    cummin_df = df['A'].iloc[ix_start:ix_end].cummin()
    return_list.append(cummin_df)
pd.concat(return_list)

This gives the expected result, except that the first record is missing: it has cond=False and comes before any True, so it belongs to no chunk (its expected value is empty in the question anyway). If you like, you can first populate the output with all rows before the first row with cond=True.
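One way to get the empty first row without handling it separately, sketched under the assumption of the default RangeIndex: assign the concatenated chunks back to the frame. Column assignment aligns on the index, so any row not covered by a chunk (here row 0) becomes NaN. The loop is a compact rewrite of the one above using `zip`, not the answerer's original code:

```python
import pandas as pd

A = [17, 18, 21, 15, 18, 19, 22, 16, 30, 50]
cond = [False, True, False, False, False, True, False, False, True, False]
df = pd.DataFrame({'A': A, 'cond': cond})

# Rows where cond is True; each one starts a new chunk.
# Assumes the default RangeIndex, so these labels double as positions.
starts = df[df['cond']].index.tolist()
# Each chunk ends where the next one starts; the last runs to the end.
ends = starts[1:] + [None]

chunks = [df['A'].iloc[s:e].cummin() for s, e in zip(starts, ends)]

# Assignment aligns on the index, so rows before the first True get NaN.
df['expected'] = pd.concat(chunks)
print(df)
```

If cond is never True, `chunks` is empty and `pd.concat` raises a ValueError, so guard for that case if it can occur in your data.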

If you like, you can condense the code, but that would make it a bit less readable.