There is a dataset containing 3000 rows and 5 columns as -
Id Tag_number Creation_time Code Alert_message
1 GH567 17:19.8 MEI001 0
2 GH567 58:33.4 MEI001 0
3 GH567 15:21.6 MEI001 1
4 GH567 59:33.7 MEI001 1
5 HR864 16:23.5 XUY056 0
6 KI964 34:13.6 UYT345 0
7 KI964 55:24.8 UYT345 1
.... 3000 rows
The dataset contains information on when alerts were created (known as 1) and when not created (as 0). Here the "Creation_time" shows both the date and time when alert was created. The ask is to extract rows based on condition applied to "Alert_message". The condition would be to check whenever there's a "0" occurring is there a "1" occurring consecutive to it in next row. If yes, extract both rows into another dataframe. The end result should look like -
Id Tag_number Creation_time Code Alert_message
2 GH567 58:33.4 MEI001 0
3 GH567 15:21.6 MEI001 1
6 KI964 34:13.6 UYT345 0
7 KI964 55:24.8 UYT345 1
I have used -
df1 = df[df.Alert_message.gt(0) | df.Alert_message.shift(-1).gt(0)]
The result I got -
Id Tag_number Creation_time Code Alert_message
2 GH567 58:33.4 MEI001 0
3 GH567 15:21.6 MEI001 1
4 GH567 59:33.7 MEI001 1
Would need some suggestions on this. Thanks and appreciate your time and effort!!
CodePudding user response:
I would harness .shift
method for task following way, consider simple example
import pandas as pd
df = pd.DataFrame({'Id':[1,2,3,4,5,6,7],'Alert':[0,0,1,1,0,0,1]})
df['PrevAlert'] = df.Alert.shift(1)
df['NextAlert'] = df.Alert.shift(-1)
df2 = df[((df.Alert==0)&(df.NextAlert==1))|((df.Alert==1)&(df.PrevAlert==0))].drop(columns=['PrevAlert','NextAlert'])
print(df2)
gives output
Id Alert
1 2 0
2 3 1
5 6 0
6 7 1
Explanation: I create column PrevAlert nad NextAlert columns which are shifted by 1 in regards to Alert, so that is value of alert from previous and next row or NaN
when there is not previous/next row. I then do select elements where ((Alert equals 0)and(next equals 1))or((Alert equals 1)and(previous equals 0)) then drop columns PrevAlert and NextAlert as these are not longer required.
CodePudding user response:
It works too :)
from itertools import chain
df = pd.DataFrame({'ID': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'], "ALERT": [0, 0, 1, 1, 0, 1, 0, 0, 1]})
idx = df[df['ALERT'] - df['ALERT'].shift().fillna(0) == 1].index
print(df.iloc[list(chain(*[(x-1, x) for x in idx])), :])
ID ALERT
1 B 0
2 C 1
4 E 0
5 F 1
7 H 0
8 I 1