Goal: I have a 1D series of numbers, and I have identified the peaks and valleys of this series. I want to identify values adjacent to these peaks/valleys that are within a threshold distance (e.g. 5%) of the peak/valley.
Some conditions: a nearby value can be more than one datapoint away from the peak/valley, provided every value between it and the peak/valley is also deemed nearby (i.e. within x% of the peak/valley).
For example, I have a series (A) of time series values: [2,4,3,6,5,4,2,6,5]
And I have a pandas series (B) representing the peaks and valleys of this time series: [-1,0,0,1,0,0,-1,1,0]
I want to identify values beside the peaks and valleys that are within x% of that peak/valley, and to update (B) so those nearby values are also labelled 1 or -1, e.g. B = [-1,0,0,1,1,0,-1,1,1]
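To make the rule concrete, here is a naive brute-force loop that produces the labelling above (using x = 20%, since 5% would not capture the example values: |5-6|/6 is about 17%). This is just to state the rule; I'm after a vectorised way to do it:

import numpy as np

A = np.array([2, 4, 3, 6, 5, 4, 2, 6, 5], dtype=float)
B = np.array([-1, 0, 0, 1, 0, 0, -1, 1, 0])

def label_nearby(A, B, x=0.2):
    out = B.copy()
    for i in np.flatnonzero(B != 0):          # each labelled peak/valley
        for step in (1, -1):                  # walk right, then left
            j = i + step
            while 0 <= j < len(A) and out[j] == 0 and abs(A[j] - A[i]) / A[i] < x:
                out[j] = B[i]
                j += step
    return out

label_nearby(A, B)  # array([-1,  0,  0,  1,  1,  0, -1,  1,  1])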
My progress so far:
import numpy as np
from scipy.signal import argrelextrema
from scipy.spatial.distance import pdist, squareform

# Sequence of 1D data
x = np.random.random(10).reshape(-1, 1)
# Identify peaks and valleys (argrelextrema returns a tuple of index arrays)
p, _ = argrelextrema(x, np.greater)
v, _ = argrelextrema(x, np.less)
# Label peaks / valleys as 1 / -1
peaks = np.zeros_like(x)
peaks[p] = 1
peaks[v] = -1
# Compute the square matrix of pairwise distances between points of x
sd = squareform(pdist(x))
# Compute the distance of each point to every other point, relative to that point
rd = sd / x
# Identify distances that are within a threshold (5%)
gt_distances = (rd < 0.05) * rd
This provides me with a square matrix highlighting the pairwise distances that are within the defined threshold.
array([[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.00538613, 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.00541529, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ]])
How do I identify all consecutive points adjacent to a peak or valley that are within this threshold?
CodePudding user response:
setup
import pandas as pd

df = pd.DataFrame({
    "vals": [2, 4, 3, 6, 5, 4, 2, 6, 5],
    "extremes": [-1, 0, 0, 1, 0, 0, -1, 1, 0],
})
solution
threshold = 0.2

def update_extremes(df):
    return (
        pd.concat(
            [
                df,
                # carry the most recent extreme's value/label forward to every row
                df.mask(df["extremes"] == 0).ffill().rename(columns=lambda n: "prev_" + n),
            ],
            axis=1,
        )
        .eval("within_threshold = abs(vals - prev_vals) / prev_vals < @threshold")
        .eval("mask = within_threshold and extremes == 0")
        .eval("is_new_extreme = within_threshold.mask(mask).ffill()")
        .eval("new_extremes = prev_extremes.where(is_new_extreme).fillna(0)")
        [["vals", "extremes", "new_extremes"]]  # comment this line out to see all the intermediate columns (which may help in understanding what is going on)
        .astype(int)
    )
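The step that enforces the "consecutive" condition is within_threshold.mask(mask).ffill(): each within-threshold non-extreme row is blanked to NaN and then inherits the flag of the row above it, so the chain of nearby values breaks as soon as one row falls outside the threshold. A tiny illustration of that mask/ffill behaviour (assuming pandas is imported as above):

s = pd.Series([True, True, False, True])   # within_threshold: an extreme, then 3 followers
m = pd.Series([False, True, False, True])  # mask = within_threshold and not an extreme
print(s.mask(m).ffill().tolist())          # [True, True, False, False]
# row 3 is within the threshold itself, but inherits False from row 2,
# because the chain of nearby values was broken there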
running
update_extremes(df)
gives you a dataframe with updated extremes for values following peaks, e.g.
vals extremes new_extremes
0 2 -1 -1
1 4 0 0
2 3 0 0
3 6 1 1
4 5 0 1
5 4 0 0
6 2 -1 -1
7 6 1 1
8 5 0 1
so feeding in the reversed dataframe:
update_extremes(df.sort_index(ascending=False)).sort_index()
will give you a dataframe with updated extremes for values preceding peaks.
You will need to decide how to reconcile any value that falls within the threshold of both a peak and a trough; one possible tie-break is sketched below.
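For example, one simple (and entirely arbitrary) policy is to run both passes and let the forward pass win wherever both set a label:

fwd = update_extremes(df)["new_extremes"]
bwd = update_extremes(df.sort_index(ascending=False)).sort_index()["new_extremes"]
# arbitrary tie-break: keep the forward (post-extreme) label, fall back to the backward one
combined = fwd.where(fwd != 0, bwd)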
Lastly, I'm not suggesting you implement the idea exactly like this, with intermediate columns added to a dataframe; I just thought it might help you understand the thought process.