Home > OS >  How to compare one row value with another and extract the complete row into another dataframe in pyt
How to compare one row value with another and extract the complete row into another dataframe in pyt

Time:06-21

There is a dataset containing 3000 rows and 5 columns as -

Id   Tag_number   Creation_time    Code    Alert_message
 1      GH567       17:19.8       MEI001       0
 2      GH567       58:33.4       MEI001       0
 3      GH567       15:21.6       MEI001       1
 4      GH567       59:33.7       MEI001       1
 5      HR864       16:23.5       XUY056       0
 6      KI964       34:13.6       UYT345       0
 7      KI964       55:24.8       UYT345       1 
                                       .... 3000 rows

The dataset contains information on when alerts were created (known as 1) and when not created (as 0). Here the "Creation_time" shows both the date and time when alert was created. The ask is to extract rows based on condition applied to "Alert_message". The condition would be to check whenever there's a "0" occurring is there a "1" occurring consecutive to it in next row. If yes, extract both rows into another dataframe. The end result should look like -

Id   Tag_number   Creation_time    Code    Alert_message
 2      GH567       58:33.4       MEI001       0
 3      GH567       15:21.6       MEI001       1
 6      KI964       34:13.6       UYT345       0
 7      KI964       55:24.8       UYT345       1 

I have used -

df1 = df[df.Alert_message.gt(0) | df.Alert_message.shift(-1).gt(0)]

The result I got -

Id   Tag_number   Creation_time    Code    Alert_message
 2      GH567       58:33.4       MEI001       0
 3      GH567       15:21.6       MEI001       1
 4      GH567       59:33.7       MEI001       1

Would need some suggestions on this. Thanks and appreciate your time and effort!!

CodePudding user response:

I would harness .shift method for task following way, consider simple example

import pandas as pd
df = pd.DataFrame({'Id':[1,2,3,4,5,6,7],'Alert':[0,0,1,1,0,0,1]})
df['PrevAlert'] = df.Alert.shift(1)
df['NextAlert'] = df.Alert.shift(-1)
df2 = df[((df.Alert==0)&(df.NextAlert==1))|((df.Alert==1)&(df.PrevAlert==0))].drop(columns=['PrevAlert','NextAlert'])
print(df2)

gives output

   Id  Alert
1   2      0
2   3      1
5   6      0
6   7      1

Explanation: I create column PrevAlert nad NextAlert columns which are shifted by 1 in regards to Alert, so that is value of alert from previous and next row or NaN when there is not previous/next row. I then do select elements where ((Alert equals 0)and(next equals 1))or((Alert equals 1)and(previous equals 0)) then drop columns PrevAlert and NextAlert as these are not longer required.

CodePudding user response:

It works too :)

from itertools import chain
df = pd.DataFrame({'ID': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'], "ALERT": [0, 0, 1, 1, 0, 1, 0, 0, 1]})
idx = df[df['ALERT'] - df['ALERT'].shift().fillna(0) == 1].index
print(df.iloc[list(chain(*[(x-1, x) for x in idx])), :])
  ID  ALERT
1  B      0
2  C      1
4  E      0
5  F      1
7  H      0
8  I      1
  • Related