pandas: how to fill a new column with the highest value from the previous rows of another column

Time:11-19

Problem to solve:

  • column_A of a dataframe holds values that are not sorted. I would like to create a new column_B that holds, for each row, the highest value seen so far in column_A. I tried the rolling() method, but it returns numbers that differ from what I expected.

Reproducible example

A dataframe is filled with unordered numbers:

import pandas as pd

# create a list of unordered values
list_of_original_values = [100, 98, 102, 107, 94, 95, 96, 92, 150]

# create the dataframe with a single column
df = (
    pd.DataFrame()
    .assign(original_values=list_of_original_values)
)

# display the dataframe
df

    original_values
0   100
1   98
2   102
3   107
4   94
5   95
6   96
7   92
8   150

Next, row by row, we look for the highest value found in the first column up to and including the current row, and we write that running maximum into a new column.

Explanation:

  • first row: the only value so far is 100, so the highest value is 100. We put 100 in the second column.
  • second row: the values so far are 100 and 98, so the highest value is 100. We put 100 in the second column.
  • third row: the values so far are 100, 98, and 102, so the highest value is 102. We put 102 in the second column.
  • and so on ...

This should give:

# expected values should be this list:
list_of_expected_values=[100, 100, 102, 107, 107, 107, 107, 107, 150]

In the end, the expected dataframe should be this one:

# expected dataframe
df = (
    df.assign(expected_values = list_of_expected_values)
)

df
    original_values     expected_values
0   100                 100
1   98                  100
2   102                 102
3   107                 107
4   94                  107
5   95                  107
6   96                  107
7   92                  107
8   150                 150

I've tried to use the pandas rolling() method, but the way I used it gives a result that differs from what was expected:

# trying to use rolling() and max()
df = df.assign(try_1_with_rolling=lambda df: df['original_values'].rolling(2).max())
# display the dataframe:
df
    original_values     expected_values     try_1_with_rolling
0   100                 100                 NaN
1   98                  100                 100.0
2   102                 102                 102.0
3   107                 107                 107.0
4   94                  107                 107.0
5   95                  107                 95.0
6   96                  107                 96.0
7   92                  107                 96.0
8   150                 150                 150.0
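
For reference, the numbers in try_1_with_rolling follow from the window size: rolling(2) only looks at the current row and the one just before it, not at all earlier rows. The sketch below is a minimal illustration of that behaviour using the same values; the names s, rolling_max, and manual are chosen here for illustration and are not part of the original post.

```python
import pandas as pd

s = pd.Series([100, 98, 102, 107, 94, 95, 96, 92, 150], name="original_values")

# rolling(2) only ever sees the current row and the one before it.
# The first row has no previous row, so its window is incomplete and yields NaN.
rolling_max = s.rolling(2).max()

# The same 2-row window maximum computed by hand, starting at the second row.
manual = [max(s.iloc[i - 1], s.iloc[i]) for i in range(1, len(s))]

print(rolling_max.tolist())
print(manual)
```

Row 5, for example, gets 95 because its window holds only 94 and 95; the earlier 107 has already left the window.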

The context of this problem: we receive data from sensors, but the timestamps are not in order because some of the data arrives in batches. We would like to detect those batches, that is, the rows whose date is lower than the highest date seen before.

CodePudding user response:

I believe what you are looking for is cummax():

df['expected_values'] = df['original_values'].cummax()
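
As a minimal, self-contained sketch of this suggestion, the snippet below rebuilds the example dataframe, applies cummax(), and then flags the rows that fall below the running maximum, which is one possible way to spot the late batches described in the question. The is_late column name is an assumption made for illustration; it does not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({"original_values": [100, 98, 102, 107, 94, 95, 96, 92, 150]})

# Running maximum: each row holds the largest value seen so far, itself included.
df["expected_values"] = df["original_values"].cummax()

# Hypothetical flag column: a row is out of order when it is below the running maximum.
df["is_late"] = df["original_values"] < df["expected_values"]

print(df)
```

With real sensor data the same comparison should work on a datetime column, since cummax() also handles datetime values. df['original_values'].expanding().max() computes the same running maximum, but returns floats.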