I have a dataframe that looks like this:
value
10
20
30
20
40
35
60
100
50
45
135
and I want to compute a column with the maximum value seen up to and including the respective row. The final dataframe should look like this:
value max_value
10 10
20 20
30 30
20 30
40 40
35 40
60 60
100 100
50 100
45 100
135 135
I can achieve this with this iterrows loop:
max_value = df['value'].iloc[0]
for index, row_data in df.iterrows():
    if row_data['value'] > max_value:
        max_value = row_data['value']
    df.at[index, 'max_value'] = max_value
but I'm looking for an efficient way of computing this using only vectorized operations, ideally without recomputing the max over all the previous rows, since max_value will either be the previous max_value or the value of the respective row.
CodePudding user response:
Use Series.cummax(), which computes the cumulative (running) maximum of the column:
df['max_value'] = df['value'].cummax()
print(df)
value max_value
0 10 10
1 20 20
2 30 30
3 20 30
4 40 40
5 35 40
6 60 60
7 100 100
8 50 100
9 45 100
10 135 135
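For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample dataframe from the question and applies cummax(). As an alternative that is not part of the answer above, it also computes the same running maximum with NumPy's np.maximum.accumulate. The variable name np_running_max is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"value": [10, 20, 30, 20, 40, 35, 60, 100, 50, 45, 135]})

# Running maximum computed in one vectorized pass over the column
df["max_value"] = df["value"].cummax()

# Alternative (assumption, not from the answer): the same running maximum
# computed on the underlying NumPy array
np_running_max = np.maximum.accumulate(df["value"].to_numpy())

print(df)
```

Both approaches carry the maximum forward row by row, so neither rescans the earlier rows for each new row.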