I have a pandas DataFrame with a column containing blocks of 1s (see screenshot).
I would like to create another column that is True only at the beginning (first cell) and end (last cell) of each block, that is, where a 0 is followed by a 1 and where a 1 is followed by a 0.
For instance, for a column with values [0,1,1,1,1,0]
I'm looking to get another column like this: [0,1,0,0,1,0]
I figured I could use np.where, but I have no idea how to write the corresponding conditions. I tried this:
df['output'] = np.where(df['signal'].rolling(2).mean() == 0.5, 1, 0)
It works well to signal the first cell, but the signal in the output column for the last cell is shifted compared to the signal column.
Could you please help me?
Thanks in advance
CodePudding user response:
Here is the one-line solution:
import pandas as pd
import numpy as np
df = pd.DataFrame(data=[0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0], columns=['signal'])
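# Solution: True where the signal rises (0 -> 1) at this row, or falls (1 -> 0) at the next row.
# The comparison already yields booleans, so wrapping it in np.where(..., True, False) is not needed.
df['output'] = df['signal'].diff().eq(1) | df['signal'].diff().shift(-1).eq(-1)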
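One caveat with the diff-based line: a block that starts on the very first row or ends on the very last row has no neighbouring 0 inside the column, so that edge is not flagged. Below is a minimal sketch of a variant that handles this, assuming the column only contains 0 and 1 and that the cells outside the column should be treated as 0. The names prev_val, next_val, is_start and is_end are just illustrative.

```python
import pandas as pd

# Example where blocks touch both ends of the column
df = pd.DataFrame({"signal": [1, 1, 0, 0, 1, 1, 1, 0, 1]})

s = df["signal"]
# Neighbouring values; positions outside the column are assumed to be 0
prev_val = s.shift(1, fill_value=0)
next_val = s.shift(-1, fill_value=0)

is_start = s.eq(1) & prev_val.eq(0)  # a 1 preceded by a 0 (or by nothing)
is_end = s.eq(1) & next_val.eq(0)    # a 1 followed by a 0 (or by nothing)

df["output"] = is_start | is_end
print(df)
```

A block of length one (the last row here) is both a start and an end, so it is flagged as well.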
CodePudding user response:
You can use a combination of np.abs and np.diff, and cast the result to a boolean array.
So, for example, if your sequence is x = [0,0,0,1,1,0,0]:
diffs = np.diff(x) # has value [0,0,1,0,-1,0]
abs_diffs = np.abs(diffs) # has value [0,0,1,0,1,0]
boolean_array = abs_diffs == 1 # [False, False, True, False, True, False]
Note that this array is one element shorter than your original array: element i describes the transition between x[i] and x[i+1]. You need to decide how the first and last elements of your sequence should be treated.
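If you want a result with the same length as the input, with the flags sitting on the first and last cell of each block, one possible way is to pad the sequence before differencing. This is only a sketch, assuming the values are signed integers 0 and 1 and that everything outside the array counts as 0. It relies on the prepend and append arguments of np.diff (available in NumPy 1.16 and later).

```python
import numpy as np

x = np.array([0, 0, 0, 1, 1, 0, 0])

# diff against the previous element (with a 0 padded in front):
# +1 at position i means x[i-1] == 0 and x[i] == 1, i.e. the first cell of a block
rising = np.diff(x, prepend=0) == 1

# diff against the next element (with a 0 padded at the end):
# -1 at position i means x[i] == 1 and x[i+1] == 0, i.e. the last cell of a block
falling = np.diff(x, append=0) == -1

edges = rising | falling
print(edges)
```

Here edges has the same length as x, so it can be assigned directly as a new DataFrame column.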