python dataframe combine two boolean columns

Time:06-04

I have a dataframe. I compute the backward and forward differences of a column (subtracting the previous row and the next row), then compare each absolute difference against a threshold to get two boolean results. Next, I want to combine these two results with a logical AND to produce a single result.

Code:

import numpy as np
import pandas as pd

xdf = pd.DataFrame({'data': range(0, 6)}, index=pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:25', freq='5s'))

# backward subtraction: current row minus the previous row
bs = xdf['data'].diff(1).abs().le(1)

# forward subtraction: current row minus the next row
fs = xdf['data'].diff(-1).abs().le(1)


bs = 
2022-06-03 00:00:00    False
2022-06-03 00:00:05     True
2022-06-03 00:00:10     True
2022-06-03 00:00:15     True
2022-06-03 00:00:20     True
2022-06-03 00:00:25     True
Freq: 5S, Name: data, dtype: bool

fs = 

2022-06-03 00:00:00     True
2022-06-03 00:00:05     True
2022-06-03 00:00:10     True
2022-06-03 00:00:15     True
2022-06-03 00:00:20     True
2022-06-03 00:00:25    False

Present and expected output:

xdf['validation'] = np.logical_and(bs, fs)
2022-06-03 00:00:00    False
2022-06-03 00:00:05     True
2022-06-03 00:00:10     True
2022-06-03 00:00:15     True
2022-06-03 00:00:20     True
2022-06-03 00:00:25    False
Freq: 5S, Name: data, dtype: bool

The output is correct and it is what I expect. My question: is there a way to compute all of the above (the forward and backward subtraction, the comparisons, and the logical AND) in a single line of code?

CodePudding user response:

You could loop over the shifts [1, -1] and combine the resulting masks with np.logical_and.reduce:

xdf['validation'] = np.logical_and.reduce([xdf['data'].diff(x).abs().le(1) for x in [1,-1]])
print(xdf)

                     data  validation
2022-06-03 00:00:00     0       False
2022-06-03 00:00:05     1        True
2022-06-03 00:00:10     2        True
2022-06-03 00:00:15     3        True
2022-06-03 00:00:20     4        True
2022-06-03 00:00:25     5       False
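
To make the reduce step less opaque, here is a minimal sketch that builds the two masks separately and compares the result with the plain `&` operator. It assumes the same example data as the question; the names `masks`, `reduced` and `combined` are only for illustration.

```python
import numpy as np
import pandas as pd

xdf = pd.DataFrame(
    {'data': range(0, 6)},
    index=pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:25', freq='5s'),
)

# One boolean mask per shift direction: 1 = previous row, -1 = next row.
masks = [xdf['data'].diff(x).abs().le(1) for x in [1, -1]]

# np.logical_and.reduce folds the masks element-wise and returns a NumPy array.
reduced = np.logical_and.reduce(masks)

# With exactly two masks, the & operator gives the same values as a Series.
combined = masks[0] & masks[1]

print(reduced.tolist())
print(combined.tolist())
```

With only two masks, `&` is probably the simpler choice; the reduce form mostly pays off when the list of shifts grows.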

CodePudding user response:

IIUC, you can use a rolling max, then check whether the max is ≤ your target:

xdf['validation'] = xdf['data'].diff(-1).abs().rolling(2).max().le(1)

output:

                     data  validation
2022-06-03 00:00:00     0       False
2022-06-03 00:00:05     1        True
2022-06-03 00:00:10     2        True
2022-06-03 00:00:15     3        True
2022-06-03 00:00:20     4        True
2022-06-03 00:00:25     5       False
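
As a rough sketch of why the rolling max works, the one-liner can be split into its intermediate steps. This assumes the same example data; the names `step` and `worst` are only for illustration. The forward difference of the previous row has the same absolute value as the backward difference of the current row, so a window of 2 over the forward differences covers both neighbours.

```python
import pandas as pd

xdf = pd.DataFrame(
    {'data': range(0, 6)},
    index=pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:25', freq='5s'),
)

# Absolute difference between each row and the next one; the last row is NaN.
step = xdf['data'].diff(-1).abs()

# A window of 2 holds the previous row's step and the current row's step.
# With the default min_periods, a window with fewer than 2 valid values gives NaN.
worst = step.rolling(2).max()

# NaN <= 1 evaluates to False, so the first and last rows fail the check.
validation = worst.le(1)
print(validation.tolist())
```

Note that the edge rows come out False only because of how NaN is handled, which happens to match the output expected in the question.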