Keep max value until ID and condition change in Pandas

Time:10-21

I have a dataframe that looks like this

id       time        value   approved
  1       0:00        10       false
  1       0:01        20       true
  1       0:02        30       true
  1       0:03        20       true
  1       0:04        40       false
  1       0:05        35       false
  1       0:06        60       false
  2       0:07        20       true
  2       0:08        30       true
  2       0:09        50       false
  2       0:10        45       false
  2       0:11        70       false
  2       0:12        62       false

and I want to create two more columns: one that keeps the running max of the approved values, with a tolerance of 2 seconds, and one that holds the time at which that max occurred. So I want it to look like this

id       time        value   approved    max_approved   max_time
  1       0:00        10       false         NaN          NaN
  1       0:01        20       true          20           0:01
  1       0:02        30       true          30           0:02
  1       0:03        20       true          30           0:02
  1       0:04        40       false         40           0:04
  1       0:05        35       false         40           0:04
  1       0:06        60       false         40           0:04
  2       0:07        20       true          20           0:07
  2       0:08        30       true          30           0:08
  2       0:09        50       false         50           0:09
  2       0:10        45       false         50           0:09
  2       0:11        70       false         50           0:09

How can I do this? Thanks

CodePudding user response:

You could use iterrows to do so:

max_value = 0
for index, row_data in df.iterrows():
  # your logic, e.g. keep the largest approved value seen so far
  if row_data.approved and row_data.value > max_value:
    max_value = row_data.value
  # write the running max back with label-based indexing
  df.loc[index, 'max_approved'] = max_value
  ...

Does this help to get started?

If you want an exact solution, please provide code that builds the DataFrame (so we don't have to parse the data out of your question), or your own code and where you run into problems.
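
As a starting point, here is a minimal runnable sketch of the loop idea. It rebuilds the DataFrame from the table in the question and resets the running max whenever the id changes. The reset and the NaN starting value are assumptions based on the question's title and expected table, and the 2-second tolerance is not handled here.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 6,
    "time": ["0:00", "0:01", "0:02", "0:03", "0:04", "0:05", "0:06",
             "0:07", "0:08", "0:09", "0:10", "0:11", "0:12"],
    "value": [10, 20, 30, 20, 40, 35, 60, 20, 30, 50, 45, 70, 62],
    "approved": [False, True, True, True, False, False, False,
                 True, True, False, False, False, False],
})

max_approved = []
current_id = None
max_value = np.nan
for _, row in df.iterrows():
    # Assumption: the running max starts over for every new id.
    if row["id"] != current_id:
        current_id = row["id"]
        max_value = np.nan
    # Only approved rows may raise the running max.
    if row["approved"] and (pd.isna(max_value) or row["value"] > max_value):
        max_value = row["value"]
    max_approved.append(max_value)

# Assign once at the end instead of writing cell by cell inside the loop.
df["max_approved"] = max_approved
print(df)
```

Collecting the values in a list and assigning the column once avoids repeated single-cell writes, which tend to be slow on larger frames.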

CodePudding user response:

The logic or output is not fully clear, but if I guess correctly, you can try:

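# assumes df['td'] already holds the 'time' column parsed as timedeltas
df[['max_approved', 'max_time']] = (df
   # keep value and elapsed seconds on approved rows only (NaN elsewhere)
   .assign(value=df['value'].where(df['approved']),
           last_time=lambda d: d['td'].dt.total_seconds().where(df['approved']),
          )
   .set_index('td').groupby('id')[['value', 'last_time']]
   # per id: max over a trailing 2-second window, then carry the last result forward
   .apply(lambda s: s.rolling('2s').max().ffill())
   .to_numpy()
)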

output:

    id  time  value  approved              td  max_approved  max_time
0    1  0:00     10     False 0 days 00:00:00           NaN       NaN
1    1  0:01     20      True 0 days 00:00:01          20.0       1.0
2    1  0:02     30      True 0 days 00:00:02          30.0       2.0
3    1  0:03     20      True 0 days 00:00:03          30.0       3.0
4    1  0:04     40     False 0 days 00:00:04          20.0       3.0
5    1  0:05     35     False 0 days 00:00:05          20.0       3.0
6    1  0:06     60     False 0 days 00:00:06          20.0       3.0
7    2  0:07     20      True 0 days 00:00:07          20.0       7.0
8    2  0:08     30      True 0 days 00:00:08          30.0       8.0
9    2  0:09     50     False 0 days 00:00:09          30.0       8.0
10   2  0:10     45     False 0 days 00:00:10          30.0       8.0
11   2  0:11     70     False 0 days 00:00:11          30.0       8.0
12   2  0:12     62     False 0 days 00:00:12          30.0       8.0
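
The output above differs from the table in the question (for example at 0:04). Here is a hedged sketch of another reading of that table. It assumes a row counts toward the max if it is approved, or if it falls at most 2 seconds after the latest approved row of the same id. The inclusive 2-second boundary, the `m:ss` time format, and the helper names (`td`, `last_ok`, `eligible`) are assumptions, not something the question states.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "id": [1] * 7 + [2] * 6,
    "time": ["0:00", "0:01", "0:02", "0:03", "0:04", "0:05", "0:06",
             "0:07", "0:08", "0:09", "0:10", "0:11", "0:12"],
    "value": [10, 20, 30, 20, 40, 35, 60, 20, 30, 50, 45, 70, 62],
    "approved": [False, True, True, True, False, False, False,
                 True, True, False, False, False, False],
})

# Parse "m:ss" strings into timedeltas (assumed format).
parts = df["time"].str.split(":", expand=True).astype(int)
df["td"] = pd.to_timedelta(parts[0] * 60 + parts[1], unit="s")

tolerance = pd.Timedelta(seconds=2)

# Time of the most recent approved row within each id (NaT before the first one).
last_ok = df["td"].where(df["approved"]).groupby(df["id"]).ffill()

# A row is eligible if approved, or within the tolerance after the last approved row.
eligible = df["approved"] | ((df["td"] - last_ok) <= tolerance)

# Running max over eligible values per id; ffill carries it across ineligible rows.
candidates = df["value"].where(eligible)
running = candidates.groupby(df["id"]).cummax()
df["max_approved"] = running.groupby(df["id"]).ffill()

# A row sets a new max when it is eligible and beats the previous running max.
prev = df["max_approved"].groupby(df["id"]).shift()
is_new = eligible & (prev.isna() | (df["value"] > prev))
df["max_time"] = df["time"].where(is_new).groupby(df["id"]).ffill()

print(df.drop(columns="td"))
```

On the sample data this matches the rows shown in the question's expected table. The sample does not show whether a gap of exactly 2 seconds should count, so change `<=` to `<` if the boundary should be exclusive.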