Compute sequential max value in Pandas using vectorized operations

Time:10-24

I have a dataframe that looks like this:

value
 10
 20
 30
 20
 40
 35
 60
 100
 50
 45
 135

and I want to add a column holding the maximum value seen up to and including each row (a running maximum). The final dataframe should look like this:

value   max_value
 10        10
 20        20
 30        30
 20        30
 40        40
 35        40
 60        60
 100       100
 50        100
 45        100
 135       135

I can achieve this with this iterrows loop:

max_value = df['value'].iloc[0]
for index, row_data in df.iterrows():
    if row_data['value'] > max_value:
        max_value = row_data['value']
    df.at[index, 'max_value'] = max_value

but I'm looking for an efficient way of computing this using just vectorized operations, ideally without having to compute the max for all the previous rows again, since max_value will either be the previous max_value or the value of the respective row.

CodePudding user response:

Use Series.cummax() on the column (DataFrame.cummax() does the same for every column at once):

df['max_value'] = df['value'].cummax()
print(df)

    value  max_value
0      10         10
1      20         20
2      30         30
3      20         30
4      40         40
5      35         40
6      60         60
7     100        100
8      50        100
9      45        100
10    135        135
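
For anyone who wants to check this end to end, here is a minimal, self-contained sketch. It rebuilds the sample data from the question and compares cummax() against a plain Python running maximum. The names `expected` and `running` are only illustrative and are not part of the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"value": [10, 20, 30, 20, 40, 35, 60, 100, 50, 45, 135]})

# Vectorized running maximum: each row gets the max of all values up to and including it
df["max_value"] = df["value"].cummax()

# Reference implementation: carry the running maximum forward one row at a time
expected = []
running = df["value"].iloc[0]
for v in df["value"]:
    running = max(running, v)
    expected.append(running)

# Both approaches should agree on every row
assert df["max_value"].tolist() == expected
print(df)
```

Because cummax() only carries the previous maximum forward, it does not recompute the max over all earlier rows, which is what the question asked for.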