How to calculate with cummin() until condition is true

Time:07-16

I want to calculate the minimum of column A up to and including the first row where the cond column is True.

Then the calculation restarts on the next row and runs up to the next row where cond is True, and so on.

The result should be assigned to the expected column.

Input:

import pandas as pd
import numpy as np
A=[16,12,21,15,18,19,13,16,10,50]
cond=[False,False,True,False,False,True,False,False,True,False]
df=pd.DataFrame({'A':A,'cond':cond})
df

Expected output:

    A   cond  expected
0  16  False
1  12  False
2  21   True        12
3  15  False        12
4  18  False        12
5  19   True        15
6  13  False        15
7  16  False        15
8  10   True        10
9  50  False        10

At index 5, the value is the minimum of rows 3 to 5: min(15, 18, 19) = 15.

At index 8, the value is the minimum of rows 6 to 8: min(13, 16, 10) = 10. Rows where cond is False repeat the most recent minimum.
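To make the rule concrete, here is a minimal pure-Python sketch (reusing the A and cond lists from the question) that lists the row range each minimum is taken over. Rows after the last True row (row 9 here) do not close a block of their own; they just repeat the previous minimum.

```python
A = [16, 12, 21, 15, 18, 19, 13, 16, 10, 50]
cond = [False, False, True, False, False, True, False, False, True, False]

blocks = []   # (first_row, last_row, minimum) for each block
start = 0
for idx, flag in enumerate(cond):
    if flag:
        # The block covers rows start..idx inclusive and ends at this True row
        blocks.append((start, idx, min(A[start:idx + 1])))
        start = idx + 1   # the next block starts on the following row

for first, last, m in blocks:
    print(f"rows {first}-{last}: min = {m}")
# rows 0-2: min = 12
# rows 3-5: min = 15
# rows 6-8: min = 10
```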

Answer:

Calculate the reverse cumsum of cond to identify blocks of rows (each block ends at a True row), then group column A by these blocks and transform with min to get the minimum value per block. Finally, mask the values on rows where cond is False and use ffill to propagate the last minimum forward.

b = df['cond'][::-1].cumsum()
df['result'] = df['A'].groupby(b).transform('min').mask(~df['cond']).ffill()

    A   cond  result
0  16  False     NaN
1  12  False     NaN
2  21   True    12.0
3  15  False    12.0
4  18  False    12.0
5  19   True    15.0
6  13  False    15.0
7  16  False    15.0
8  10   True    10.0
9  50  False    10.0

Answer:

I'm not very familiar with pandas, but this builds a plain Python list with the expected values.

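A=[16,12,21,15,18,19,13,16,10,50]
cond=[False,False,True,False,False,True,False,False,True,False]

output = []
low = max(A) + 1    # sentinel larger than any value in A
lowPrint = None     # nothing to report before the first True row

for i, j in zip(A, cond):
    if i < low: low = i          # running minimum of the current block
    if j:
        lowPrint = low           # the block ends here: report its minimum
        low = max(A) + 1         # reset for the next block
    output.append(lowPrint)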

As I said, I don't know much about pandas, but you can use this list however you need, for example by assigning it to a new column of the DataFrame.

Answer:

You can get the desired column by taking the minimum (ms) of each segment that ends at a True row, writing those minima on the True rows, and doing a forward fill.

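import numpy as np
import pandas as pd

arr = np.array([16, 12, 21, 15, 18, 19, 13, 16, 10, 50])
c = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0]).astype(bool)

i = np.flatnonzero(c)                  # positions of the True rows
splits = np.split(arr, i + 1)[:-1]     # segments ending at each True row; drop the trailing remainder
ms = [s.min() for s in splits]         # minimum of each segment

arr = arr.astype(float)                # float so NaN can be stored
arr[~c] = np.nan                       # blank out every non-True row
arr[c] = ms                            # write each segment's minimum on its True row

df = pd.DataFrame(arr).ffill(); df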

Output:

      0
0   NaN
1   NaN
2  12.0
3  12.0
4  12.0
5  15.0
6  15.0
7  15.0
8  10.0
9  10.0