Get row when value is higher than a given row value in Pandas

Sorry for the confusing title; what I'm trying to do is a bit hard to explain.

I have a dataframe that looks like this:

time  value  is_critical
0:00    1      false
0:01    9      true
0:02    2      false
0:03    4      false
0:04    6      true
0:05    5      false
0:06    1      false
0:07    4      false
0:08    8      true
0:09    7      false
0:10    10     false
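To reproduce this without the CSV, the table above can be loaded straight from text. This is a minimal sketch that assumes whitespace-separated columns (the original file isn't available here); `raw` is just an illustrative name. pandas' `read_csv` recognizes lowercase `true`/`false` as booleans, so `is_critical` comes out as a `bool` column, which the `df[df["is_critical"]]` filtering below relies on.

```python
import io

import pandas as pd

# The table from the question, pasted as text. Assumption: columns are
# separated by runs of whitespace rather than commas.
raw = """time  value  is_critical
0:00    1      false
0:01    9      true
0:02    2      false
0:03    4      false
0:04    6      true
0:05    5      false
0:06    1      false
0:07    4      false
0:08    8      true
0:09    7      false
0:10    10     false
"""

# sep=r"\s+" splits on any whitespace; "time" stays a string column.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)
```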

And I want to compute another dataframe containing every critical row together with the time at which the value first returned to or surpassed that critical value. The new dataframe would look something like this:

time  value  return_to_critical
0:01    9          0:10
0:04    6          0:08
0:08    8          0:10

How can I do this? Thanks!

Answer:

It's a bit messy and not very efficient, but here's a solution:

In [3]: df[df["is_critical"]].apply(lambda critical_row: df["time"][(df["time"] > critical_row["time"]) & (df["value"] >= critical_row["value"])].min(), axis=1)
Out[3]:
1    0:10
4    0:08
8    0:10
dtype: object

This works by first filtering out the non-critical rows. Then, for each remaining critical row, it selects the times of all rows that come later (time greater than the current row's time) and whose value is greater than or equal to the current row's value, and takes the earliest of those times with .min(). Here "current" refers to each row of the filtered data.
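The same boolean test can also be evaluated for all critical rows at once with NumPy broadcasting instead of a row-by-row apply. This is a swapped-in technique, not the approach used above, and it assumes the rows are already sorted by time, so "later" is judged by row position rather than by comparing time strings. It builds a (critical rows × all rows) boolean mask, so memory grows with both counts; the names `crit_pos`, `mask` and `has_return` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (times kept as strings; rows assumed sorted).
df = pd.DataFrame({
    "time": ["0:00", "0:01", "0:02", "0:03", "0:04", "0:05",
             "0:06", "0:07", "0:08", "0:09", "0:10"],
    "value": [1, 9, 2, 4, 6, 5, 1, 4, 8, 7, 10],
    "is_critical": [False, True, False, False, True, False,
                    False, False, True, False, False],
})

crit = df[df["is_critical"]]
pos = np.arange(len(df))                                   # positions of all rows
crit_pos = np.flatnonzero(df["is_critical"].to_numpy())    # positions of critical rows

# mask[i, j] is True when row j comes after critical row i
# and its value is >= that critical row's value.
later = pos[None, :] > crit_pos[:, None]
reached = df["value"].to_numpy()[None, :] >= crit["value"].to_numpy()[:, None]
mask = later & reached

# argmax gives the first True column per row; a row with no True at all
# would also give 0, so those rows are replaced with None via has_return.
first = mask.argmax(axis=1)
has_return = mask.any(axis=1)
returns = np.where(has_return, df["time"].to_numpy()[first], None)

result = crit[["time", "value"]].assign(return_to_critical=returns)
print(result)
```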

You can clean this up a little with a helper function:

def time_of_return_to_critical(df, critical_row):
    # Rows later than the critical row whose value reaches or exceeds it.
    mask = (df.time > critical_row.time) & (df.value >= critical_row.value)
    # Earliest such time.
    return df["time"][mask].min()


df[df.is_critical].apply(lambda row: time_of_return_to_critical(df, row), axis=1)
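To get exactly the table asked for in the question (time, value, return_to_critical), the Series returned by apply can be attached as a new column with DataFrame.assign, which aligns it on the index of the critical rows. A self-contained sketch with the sample data typed in:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "time": ["0:00", "0:01", "0:02", "0:03", "0:04", "0:05",
             "0:06", "0:07", "0:08", "0:09", "0:10"],
    "value": [1, 9, 2, 4, 6, 5, 1, 4, 8, 7, 10],
    "is_critical": [False, True, False, False, True, False,
                    False, False, True, False, False],
})


def time_of_return_to_critical(df, critical_row):
    # Later rows whose value reaches or exceeds the critical value.
    mask = (df.time > critical_row.time) & (df.value >= critical_row.value)
    return df["time"][mask].min()


critical = df[df.is_critical]
# apply returns a Series indexed like `critical`, so assign lines it up by index.
result = critical[["time", "value"]].assign(
    return_to_critical=critical.apply(
        lambda row: time_of_return_to_critical(df, row), axis=1
    )
)
print(result)
```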

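Note that this is brittle: both .min() and the > comparison are only "working" because the "time" column is compared as strings, and every time here has the same zero-padded format, so "0:08" happens to sort before "0:10". A value like "10:00" would sort before "9:00". You should convert the "time" column to a proper datetime or timestamp data type.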
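A minimal sketch of that conversion, assuming the "time" column means hours:minutes (H:MM); if it is actually minutes:seconds, the format string would need to change. pd.to_datetime with an explicit format parses the strings, and because no date is given the date part defaults to 1900-01-01. After that, > and .min() compare real points in time.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "time": ["0:00", "0:01", "0:02", "0:03", "0:04", "0:05",
             "0:06", "0:07", "0:08", "0:09", "0:10"],
    "value": [1, 9, 2, 4, 6, 5, 1, 4, 8, 7, 10],
    "is_critical": [False, True, False, False, True, False,
                    False, False, True, False, False],
})

# Assumption: "time" is H:MM. Parse it into real datetimes.
df["time"] = pd.to_datetime(df["time"], format="%H:%M")


def time_of_return_to_critical(df, critical_row):
    # Datetime comparison now, not string comparison.
    mask = (df.time > critical_row.time) & (df.value >= critical_row.value)
    return df["time"][mask].min()


result = df[df.is_critical].apply(
    lambda row: time_of_return_to_critical(df, row), axis=1
)
print(result)
```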