Numpy- how to create a boolean array with row and column indices-CodePudding

I am using argrelextrema from scipy to identify peaks in my data. The data looks something like this:

Now argrelextrema finds me the peaks in the data and outputs two arrays (row and column indices).

argrelextrema(df.values, np.greater, axis=0,order=1)

Now, out of my original data, I would like to create a boolean array where the peaks are identified as True, and the rest else False.

Something like this (only illustration).

Usually I would achieve something like the above with an np.where, but since I now have row and column indices, I didn't find a way how to use that with an np.where clause.

Also, I would like to have a vectorized solution, if possible.

CodePudding user response：

As pointed out argrelextrema returns the indices of the relative extrema values.

Use those indices to index directly into a boolean array of the same shape as your original DataFrame:

import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

# for reproducibility 
np.random.seed(42)

# create toy DataFrame
df = pd.DataFrame(data=np.random.random((100, 3)) * 20, columns=["AAPL", "MSFT", "TSLA"])

# extract rows and cols indices using extrema
rows, cols = argrelextrema(df.values, np.greater, axis=0,order=1)

# create boolean numpy array
values = np.zeros_like(df.values, dtype=bool)

# set the values of the extrema to True
values[rows, cols] = True

# convert to DataFrame
result = pd.DataFrame(data=values, columns=df.columns)
print(result)

Output

     AAPL   MSFT   TSLA
0   False  False  False
1   False  False   True
2   False  False  False
3   False   True   True
4   False  False  False
..    ...    ...    ...
95  False  False  False
96   True  False   True
97  False  False  False
98   True   True   True
99  False  False  False

[100 rows x 3 columns]

The important part of the code above is this:

# create boolean numpy array
values = np.zeros_like(df.values, dtype=bool)

# set the values of the extrema to True
values[rows, cols] = True

As an alternative you could directly built the values array, from a sparse.csr_matrix:

from scipy.sparse import csr_matrix
values = csr_matrix((np.ones(rows.shape), (rows, cols)), shape=df.shape, dtype=bool).toarray()