np.where(x followed by y)

I have a pandas DataFrame with a column that contains blocks of consecutive 1s (for instance); see screenshot.

I would like to create another column that is True only at the beginning (first cell) and end (last cell) of each block, that is, on a 1 that follows a 0 and on a 1 that is followed by a 0.

For instance, for a column with values [0,1,1,1,1,0], I'm looking to get another column like this: [0,1,0,0,1,0].

I figured I could use np.where, but I have no idea how to write the corresponding conditions. I tried this:

df['output'] = np.where(df['signal'].rolling(2).mean() == 0.5, 1, 0)

It works well to signal the first cell, but the signal in the output column for the last cell is shifted compared to the signal column.

Could you please help me?

Thanks in advance

CodePudding user response：

Here is the one-line solution:

import pandas as pd
import numpy as np

df = pd.DataFrame(data=[0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0], columns=['signal'])

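# Solution: flag a row when the signal steps 0 -> 1 on that row (diff == 1)
# or when the next row steps 1 -> 0 (next row's diff == -1)
df['output'] = np.where((df['signal'].diff().eq(1)) | (df['signal'].diff().shift(-1) == -1), True, False)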
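To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps and runs it on the example from the question. The intermediate names step, is_start and is_end are only illustrative, and the result is cast to integers to match the 0/1 output the question asks for.

```python
import numpy as np
import pandas as pd

# Example column from the question
df = pd.DataFrame({'signal': [0, 1, 1, 1, 1, 0]})

step = df['signal'].diff()        # NaN, 1, 0, 0, 0, -1
is_start = step.eq(1)             # 0 -> 1 on this row: first cell of a block
is_end = step.shift(-1).eq(-1)    # 1 -> 0 on the next row: last cell of a block

df['output'] = (is_start | is_end).astype(int)
print(df['output'].tolist())      # [0, 1, 0, 0, 1, 0]
```

Because diff() yields NaN on the first row and shift(-1) yields NaN on the last one, a block that touches either edge of the column is not flagged at that edge.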

CodePudding user response：

You can use a combination of np.abs and np.diff, and cast the result to a boolean array.

So, for example, if your sequence is x = [0,0,0,1,1,0,0]:

diffs = np.diff(x) # has value [0,0,1,0,-1,0]
abs_diffs = np.abs(diffs) # has value [0,0,1,0,1,0]
boolean_array = abs_diffs == 1 # [False, False, True, False, True, False]

Note that this array is one element shorter than your original array, so you need to decide how to treat the first and last elements of your list.
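One possible way to line the result up with the original array is sketched below. It relies on diffs[i] being x[i + 1] - x[i], so a +1 step belongs to the element after it and a -1 step to the element before it. Padding with False at the edges is an assumption made here: it means a block touching the first or last element is not flagged at that edge.

```python
import numpy as np

x = np.array([0, 0, 0, 1, 1, 0, 0])
diffs = np.diff(x)                  # diffs[i] = x[i + 1] - x[i]

# A +1 step at diffs[i] means a block starts at x[i + 1]: pad in front.
starts = np.concatenate(([False], diffs == 1))
# A -1 step at diffs[i] means a block ends at x[i]: pad at the back.
ends = np.concatenate((diffs == -1, [False]))

output = starts | ends              # same length as x
print(output.astype(int))           # [0 0 0 1 1 0 0]
```

Here the block is only two cells long, so both of its cells are flagged: x[3] as the start and x[4] as the end.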