pandas: how to fill a new column with the highest value seen so far in another column

Problem to solve:

Column_A of a dataframe holds values that are not sorted. I would like to create a new column_B in which each row holds the highest value met so far in column_A (the current row included). I tried the rolling() method, but it returns numbers that differ from what I expected.

Reproducible example

A dataframe is filled with unordered numbers:

import pandas as pd

# create the list of unordered values
list_of_original_values = [100, 98, 102, 107, 94, 95, 96, 92, 150]

# create the dataframe
df = (
    pd.DataFrame()
    .assign(original_values = list_of_original_values)
)

# display the dataframe
df

    original_values
0   100
1   98
2   102
3   107
4   94
5   95
6   96
7   92
8   150

Next, going row by row, we look up the highest value of the first column among the current row and all previous rows, and write that running maximum into a new column.

Explanation:

first row: the only value seen so far in the first column is 100, so the highest value is 100. We put 100 in the second column.
second row: the values seen so far in the first column are 100 and 98, so the highest value is 100. We put 100 in the second column.
third row: the values seen so far in the first column are 100, 98 and 102, so the highest value is 102. We put 102 in the second column.
and so on ...
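
The row-by-row rule above can be written as a plain Python loop. This is only a sketch to pin down the expected behaviour, not the pandas solution being asked for; the names running_max and highest_so_far are made up for the illustration.

```python
# Plain-Python sketch of the row-by-row rule (illustration only).
list_of_original_values = [100, 98, 102, 107, 94, 95, 96, 92, 150]

running_max = []        # highest value met so far, one entry per row
highest_so_far = None   # nothing seen before the first row

for value in list_of_original_values:
    # keep the current value only if it beats everything seen before
    if highest_so_far is None or value > highest_so_far:
        highest_so_far = value
    running_max.append(highest_so_far)

print(running_max)
# [100, 100, 102, 107, 107, 107, 107, 107, 150]
```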

This should give:

# the expected values are this list:
list_of_expected_values = [100, 100, 102, 107, 107, 107, 107, 107, 150]

In the end, the expected dataframe should be this one:

# expected dataframe
df = (
    df.assign(expected_values = list_of_expected_values)
)

df
    original_values     expected_values
0   100                 100
1   98                  100
2   102                 102
3   107                 107
4   94                  107
5   95                  107
6   96                  107
7   92                  107
8   150                 150

I tried the pandas rolling() method, but the way I used it gives a result that differs from what was expected:

# trying rolling() with max()
df = df.assign(try_1_with_rolling = lambda df: df['original_values'].rolling(2).max())
# display the dataframe:
df
    original_values     expected_values     try_1_with_rolling
0   100                 100                 NaN
1   98                  100                 100.0
2   102                 102                 102.0
3   107                 107                 107.0
4   94                  107                 107.0
5   95                  107                 95.0
6   96                  107                 96.0
7   92                  107                 96.0
8   150                 150                 150.0
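
A likely reason for these numbers: rolling(2) uses a window of two rows, so each result only compares the current row with the row just before it, and the first row has no full window (hence NaN). The sketch below reproduces the column by hand to check that reading; by_hand is a name invented for the example.

```python
import pandas as pd

values = [100, 98, 102, 107, 94, 95, 96, 92, 150]
s = pd.Series(values)

# window of 2 rows: the current row and the one just before it
rolled = s.rolling(2).max()

# the same comparison spelled out by hand, starting at the second row
by_hand = [max(values[i - 1], values[i]) for i in range(1, len(values))]

print(rolled.tolist())
print(by_hand)
```

From the second row onward the two results agree, which shows that the window never looks further back than one row, so it forgets 107 as soon as two smaller values follow it.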

The context of this problem: we receive data from sensors, but the timestamps are not ordered: rows arrive in batches. We'd like to detect those batches, i.e. the rows whose date is lower than the highest date met before.

CodePudding user response:

I believe what you are looking for is cummax(), which returns the cumulative (running) maximum of a Series:

df['expected_values'] = df['original_values'].cummax()
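
For completeness, here is a self-contained sketch that rebuilds the sample data, applies cummax(), and then flags the rows that fall below the running maximum, which is one possible way to read the "batch" detection described in the question. The column name is_late is hypothetical and only used for the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"original_values": [100, 98, 102, 107, 94, 95, 96, 92, 150]}
)

# running maximum of the column, current row included
df["expected_values"] = df["original_values"].cummax()

# hypothetical flag: True where the row is lower than the highest value met before
df["is_late"] = df["original_values"] < df["expected_values"]

print(df)
```

With the sample data, expected_values comes out as 100, 100, 102, 107, 107, 107, 107, 107, 150, and is_late is True for rows 1, 4, 5, 6 and 7. The same comparison should carry over to a datetime column, since cummax() also works on timestamps.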