Find consecutive values beside peaks that are within a threshold of X% distance

Goal: I have a 1D series of numbers and have identified its peaks and valleys. I want to identify values adjacent to these peaks/valleys that are within a threshold distance (e.g. 5%) of the peak/valley.

Condition: a nearby value can be more than one data point away from the peak/valley, provided the values between it and the peak/valley are also all deemed nearby (i.e. within X% of the peak/valley).

For example, I have a series (A) of time series values, [2,4,3,6,5,4,2,6,5], and a pandas Series (B) marking the peaks (1) and valleys (-1) of this time series, [-1,0,0,1,0,0,-1,1,0]. I want to identify the values beside each peak and valley that are within X% of it, and update (B) to label those nearby values with the same 1 or -1, e.g. B = [-1,0,0,1,1,0,-1,1,1].
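To pin down the rule, here is a minimal plain-Python sketch of the behaviour described above. It assumes the distance is measured relative to the extreme, |value - extreme| / |extreme|, that the outward walk stops at the first value outside the threshold or at the next labelled extreme, and that a value already claimed by an earlier extreme keeps its label. The name `expand_extremes` is hypothetical. Note that this example only reproduces the expected B with a threshold above 1/6 (about 17%), since |5 - 6| / 6 ≈ 0.167; with 5% nothing gets labelled.

```python
def expand_extremes(values, labels, threshold):
    """Hypothetical helper: copy each extreme's label (1 or -1) onto the
    unbroken run of neighbours whose relative distance to it is below
    `threshold`. Assumes extremes are non-zero (we divide by them)."""
    out = list(labels)
    n = len(values)
    for i, label in enumerate(labels):
        if label == 0:
            continue  # only start walks from peaks/valleys
        ref = values[i]
        for step in (-1, 1):  # walk left, then right
            j = i + step
            # stop at the array edge or at the next labelled extreme
            while 0 <= j < n and labels[j] == 0:
                if abs(values[j] - ref) / abs(ref) >= threshold:
                    break  # chain broken: nothing further out counts
                if out[j] == 0:  # keep a label set by an earlier extreme
                    out[j] = label
                j += step
    return out


A = [2, 4, 3, 6, 5, 4, 2, 6, 5]
B = [-1, 0, 0, 1, 0, 0, -1, 1, 0]
print(expand_extremes(A, B, threshold=0.2))   # [-1, 0, 0, 1, 1, 0, -1, 1, 1]
print(expand_extremes(A, B, threshold=0.05))  # unchanged
```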

My progress so far:

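import numpy as np
from scipy.signal import argrelextrema
from scipy.spatial.distance import pdist, squareform

# Sequence of 1D data, shaped as a column vector so pdist treats each value as a 1D point
x = np.random.random(10).reshape(-1,1)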

# Identify peaks and valleys
p, _ = argrelextrema(x, np.greater)
v, _ = argrelextrema(x, np.less)

# Label peaks / valleys as 1 / -1
peaks = np.zeros_like(x)
peaks[p] = 1
peaks[v] = -1

# Square matrix of pairwise absolute distances, sd[i, j] = |x[i] - x[j]|
sd = squareform(pdist(x))

# Relative distance: broadcasting divides row i by x[i], so rd[i, j] = |x[i] - x[j]| / x[i]
rd = sd / x

# Identify distances that are within a threshold (5%)
gt_distances = (rd < 0.05) * rd

This gives me a square matrix highlighting the pairs of points that are within the defined threshold:

array([[0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.00538613, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ],
       [0.        , 0.00541529, 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ]])
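One thing worth checking in the code above: because `x` is a column vector, `sd / x` divides row `i` by `x[i]`, so `rd[i, j]` is the distance of `x[j]` relative to `x[i]` and the matrix is not symmetric. For this question the relevant entries are therefore the rows belonging to the peaks and valleys. A small deterministic check, with made-up values in place of the random data:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Made-up column vector in place of the random data
x = np.array([6.0, 5.0, 4.0, 2.0]).reshape(-1, 1)

sd = squareform(pdist(x))  # sd[i, j] = |x[i] - x[j]| (Euclidean distance in 1D)
rd = sd / x                # broadcasting divides row i by x[i]

print(rd[0, 1])  # |6 - 5| / 6
print(rd[1, 0])  # |5 - 6| / 5, a different value

# Row of the (assumed) peak at index 0: which points are within 20% of x[0]?
within_peak0 = rd[0] < 0.2
print(within_peak0)
```

A row on its own does not enforce the "consecutive" condition: a point further out can be within threshold even when a point in between is not, so you still have to walk outward from the extreme and stop at the first `False`.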

How do I identify all consecutive points adjacent to a peak or valley that are within this threshold of it?

Answer:

setup

import pandas as pd

df = pd.DataFrame({
    "vals":[2,4,3,6,5,4,2,6,5],
    "extremes":[-1,0,0,1,0,0,-1,1,0],
})

solution

threshold = 0.2

def update_extremes(df):
    return (
        pd.concat(
            [
                df,
                # value and label of the most recent extreme at or before each row
                df.mask(df["extremes"] == 0).ffill().rename(columns=lambda n: "prev_" + n),
            ],
            axis=1
        )
        .eval("within_threshold = abs(vals-prev_vals)/prev_vals < @threshold")
        .eval("mask = within_threshold and extremes == 0")
        .eval("is_new_extreme = within_threshold.mask(mask).ffill().astype('bool')")
        .eval("new_extremes = prev_extremes.where(is_new_extreme).fillna(0)")
        [["vals", "extremes", "new_extremes"]]  # comment this line out to see all the intermediate columns (which may help in understanding what is going on)
        .astype(int)
    )

running

update_extremes(df)

gives you a DataFrame with updated extremes for values following each peak or valley, e.g.

   vals  extremes  new_extremes
0     2        -1            -1
1     4         0             0
2     3         0             0
3     6         1             1
4     5         0             1
5     4         0             0
6     2        -1            -1
7     6         1             1
8     5         0             1
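The labels added in rows 4 and 8 come from the `mask`/`ffill` step: a non-extreme row that is within threshold is blanked out and then inherits the flag of the row before it, so the flag only survives along an unbroken run back to the extreme. A stripped-down illustration on made-up flags (cast to float first, an adjustment not in the answer, so that blanking gives `NaN` in a float column rather than an object column):

```python
import pandas as pd

# Made-up flags: row 0 is an extreme; rows 1 and 3 are within threshold of it,
# row 2 is not, so row 3 should NOT be labelled (the chain breaks at row 2).
within = pd.Series([True, True, False, True])
is_extreme = pd.Series([True, False, False, False])

# Blank out non-extreme rows that are within threshold ...
blanked = within.astype(float).mask(within & ~is_extreme)
# ... and let each one inherit the flag of the row before it.
chained = blanked.ffill().astype(bool)

print(chained.tolist())  # [True, True, False, False]
```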

Feeding in the reversed DataFrame:

update_extremes(df.sort_index(ascending=False)).sort_index()

gives you a DataFrame with updated extremes for values preceding each peak or valley.

You will need to decide how to reconcile any value that falls within the threshold of both a peak and a trough.
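One possible way to merge the forward and backward passes, sketched with NumPy: keep the original extremes, take whichever pass labelled a value, and leave a value at 0 when the two passes disagree (i.e. it is within threshold of both a peak and a valley). The name `combine_passes` and this tie-breaking rule are assumptions; picking the closer extreme instead would be just as reasonable.

```python
import numpy as np

def combine_passes(extremes, forward, backward):
    """Hypothetical helper: merge labels from the forward and backward passes."""
    extremes, forward, backward = map(np.asarray, (extremes, forward, backward))
    # both passes labelled the value, but with opposite signs
    conflict = (forward != 0) & (backward != 0) & (forward != backward)
    # np.select uses the first condition that holds for each element
    return np.select(
        [extremes != 0, conflict, forward != 0, backward != 0],
        [extremes, 0, forward, backward],
        default=0,
    )


# Made-up labels: row 2 is claimed by the peak (forward) and the valley (backward)
extremes = [1, 0, 0, 0, -1]
forward  = [1, 1, 1, 0, -1]
backward = [1, 0, -1, -1, -1]
print(combine_passes(extremes, forward, backward))  # [ 1  1  0 -1 -1]
```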

Lastly, I'm not suggesting you implement the idea exactly like this, with intermediate columns added to the DataFrame; I just thought it might help you understand the thought process.
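For comparison, a sketch of the same forward pass written with ordinary pandas Series operations instead of `DataFrame.eval`, so no intermediate columns are added. The name `label_following` and its `threshold` parameter are assumptions; passing reversed Series (`.iloc[::-1]`) and sorting the index afterwards handles values preceding an extreme, as in the answer.

```python
import pandas as pd

def label_following(vals, extremes, threshold=0.2):
    """Hypothetical helper: label values that follow a peak/valley and stay
    within `threshold` (relative distance) of it without a break."""
    is_extreme = extremes != 0
    # value and label of the most recent extreme at or before each row
    prev_vals = vals.where(is_extreme).ffill()
    prev_labels = extremes.where(is_extreme).ffill()
    within = (vals - prev_vals).abs() / prev_vals < threshold
    # blank out within-threshold non-extremes, then inherit the previous row's flag
    chained = within.astype(float).mask(within & ~is_extreme).ffill().astype(bool)
    return prev_labels.where(chained, 0).astype(int)


vals = pd.Series([2, 4, 3, 6, 5, 4, 2, 6, 5])
extremes = pd.Series([-1, 0, 0, 1, 0, 0, -1, 1, 0])

following = label_following(vals, extremes)
preceding = label_following(vals.iloc[::-1], extremes.iloc[::-1]).sort_index()
print(following.tolist())  # [-1, 0, 0, 1, 1, 0, -1, 1, 1]
print(preceding.tolist())  # [-1, 0, 0, 1, 0, 0, -1, 1, 0]
```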