Delete rows above certain value once number is reached

Time:09-16

I have a large dataset and am interested in the part where it shuts down and the period during which it stays shut down. However, the data also includes the subsequent startup, which I want to filter out.

The data goes down to <0.2, stays there for a while and then goes up again >0.2. I want to delete the part where it has been <0.2 before and is going up to >0.2.

I have tried a standard filter, but since I am still interested in the first part, this does not seem to work. Just looking at the derivative is not an option either, since the value can go up and down in the beginning as well; the only difference with the latter part is that it has been <0.2 before.

How can I do this?

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20", "2020-06-07 00:21", "2020-06-07 00:22", "2020-06-07 00:23", "2020-06-07 00:24", "2020-06-07 00:25", "2020-06-07 00:26", "2020-06-07 00:27", "2020-06-07 00:28", "2020-06-07 00:29"],
  "Value": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0.1, 0.05, 0.15, 0.1, 0.18, 0.25, 1, 4, 8, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0],
}

df = pd.DataFrame(data)

Required output:

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19"],
  "Value": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0.1, 0.05, 0.15, 0.1, 0.18],
}

CodePudding user response:

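You can identify the switching points (from above 0.2 to below, and vice versa) by taking diff() of the boolean condition df['Value'] < 0.2, and then count them with cumsum. To remove everything after the value has been below 0.2 and has risen again, drop the rows where the cumulative count is 2 or more. Note that diff() returns NaN for the first row, which would fail the s < 2 comparison and drop that row, so the condition is cast to int and the NaN is filled with 0.

below = (df['Value'] < 0.2).astype(int)
s = below.diff().fillna(0).abs().cumsum()
df.loc[s < 2]

Result:

       Date and Time    Value
0   2020-06-07 00:00    16.20
1   2020-06-07 00:01    15.10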
2   2020-06-07 00:02    13.80
3   2020-06-07 00:03    12.00
4   2020-06-07 00:04    11.90
5   2020-06-07 00:05    12.10
6   2020-06-07 00:06    10.80
7   2020-06-07 00:07     9.80
8   2020-06-07 00:08     8.30
9   2020-06-07 00:09     6.20
10  2020-06-07 00:10     4.30
11  2020-06-07 00:11     4.20
12  2020-06-07 00:12     4.20
13  2020-06-07 00:13     3.30
14  2020-06-07 00:14     1.80
15  2020-06-07 00:15     0.10
16  2020-06-07 00:16     0.05
17  2020-06-07 00:17     0.15
18  2020-06-07 00:18     0.10
19  2020-06-07 00:19     0.18
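
To see how the switching-point count behaves, it can help to print it for a short series. The following is a minimal sketch, not part of the original answer: the sample values and the names below, crossings and kept are made up for illustration, and it assumes the series starts above the threshold.

```python
import pandas as pd

# Hypothetical short series: running, shut down (< 0.2), then restarting.
values = pd.Series([5.0, 1.0, 0.1, 0.05, 0.3, 4.0])

below = values < 0.2
# Cast to int so diff() yields -1/0/1. The first row has no predecessor
# and comes back as NaN, so fill it with 0 before taking absolute values.
crossings = below.astype(int).diff().fillna(0).abs().cumsum()

# 0 = before shutdown, 1 = shut down, 2 or more = after the restart.
kept = values[crossings < 2]

print(crossings.tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
print(kept.tolist())       # [5.0, 1.0, 0.1, 0.05]
```

If the series instead starts below the threshold, the count is shifted by one and the cutoff of 2 would need adjusting.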

CodePudding user response:

You can build boolean masks for the required condition

it has been <0.2 before and is going up to >0.2.

and then filter:

# Mask 1: True from the first row where the value is < 0.2 onward
m1 = df['Value'].lt(0.2).cummax()

# Mask 2: True where the value is > 0.2
m2 = df['Value'].gt(0.2)

# Final mask: negate the rows where both m1 and m2 hold
mask = ~(m1 & m2)

df.loc[mask]

  • The first mask uses cummax() to flag every row from the first value < 0.2 onward.

  • The second mask flags the rows where the value is > 0.2.

  • The final mask carries out the requested action:

to delete the part where... [the conditions met]

Result:

       Date and Time  Value
0   2020-06-07 00:00  16.20
1   2020-06-07 00:01  15.10
2   2020-06-07 00:02  13.80
3   2020-06-07 00:03  12.00
4   2020-06-07 00:04  11.90
5   2020-06-07 00:05  12.10
6   2020-06-07 00:06  10.80
7   2020-06-07 00:07   9.80
8   2020-06-07 00:08   8.30
9   2020-06-07 00:09   6.20
10  2020-06-07 00:10   4.30
11  2020-06-07 00:11   4.20
12  2020-06-07 00:12   4.20
13  2020-06-07 00:13   3.30
14  2020-06-07 00:14   1.80
15  2020-06-07 00:15   0.10
16  2020-06-07 00:16   0.05
17  2020-06-07 00:17   0.15
18  2020-06-07 00:18   0.10
19  2020-06-07 00:19   0.18
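
The mask logic can also be wrapped in a small helper so the column and threshold are not hard-coded. This is only a sketch under assumptions: the function name drop_restart, its parameters, and the sample values are hypothetical and do not come from the answer.

```python
import pandas as pd


def drop_restart(df, column="Value", threshold=0.2):
    """Drop rows above `threshold` that occur after the first dip below it."""
    # True from the first row below the threshold onward.
    has_dipped = df[column].lt(threshold).cummax()
    # True where the value is above the threshold.
    is_above = df[column].gt(threshold)
    return df.loc[~(has_dipped & is_above)]


# Hypothetical sample with a second dip after the restart begins.
sample = pd.DataFrame({"Value": [3.0, 0.1, 0.15, 0.5, 0.1, 2.0]})
result = drop_restart(sample)
print(result["Value"].tolist())  # [3.0, 0.1, 0.15, 0.1]
```

Note the design difference from counting switching points: this mask removes only the rows above the threshold after the first dip, so a later value below the threshold (index 4 here) is kept, whereas a cumulative switch count would drop everything after the first rise.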