How can I identify the start and end of the lower period of noisy data?

How can I identify the start and end index values of the less noisy, lower-valued period marked in yellow?

Here is the test data:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    arr = np.array([8,9,7,3,6,3,2,1,2,3,1,2,3,2,2,3,2,2,5,7,8,9,15,20,21])
    plt.plot(arr)
    plt.show()

CodePudding user response：

If you just want that low region, you need a way to find the points that fall within certain bounds. One approach is to find the minimum of the array and then collect every value in that array that lies within a specified deviation of it:

    def lows(arr, dev=0):
        # upper bound: the array minimum plus the allowed deviation
        lim = min(arr) + dev
        pts = []
        for i, e in enumerate(arr):
            if e <= lim:
                pts.append((i, e))
        return pts

The function above returns a list of (index, value) points that fall within the specified bounds. The lower bound is the minimum of the input array, and the upper bound is that minimum plus the deviation you supply. For example, if you want all points within 1 of the lowest value:

    plt.plot(arr)
    for pt in lows(arr, 1):
        circle = plt.Circle(pt, 0.2, color='g')
        plt.gca().add_patch(circle)
    plt.show()
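Since the question asks for start and end index values rather than the points themselves, here is a minimal sketch of one way to get them from the same bound. The helper name `low_span` is hypothetical, and it assumes there is a single low period, so the first and last qualifying indices delimit it:

```python
import numpy as np

arr = np.array([8,9,7,3,6,3,2,1,2,3,1,2,3,2,2,3,2,2,5,7,8,9,15,20,21])

def low_span(arr, dev=0):
    """Return (first, last) index of the values within `dev` of the minimum.

    Hypothetical helper: assumes one low period, so any higher points
    between the first and last hit are treated as part of that period.
    """
    arr = np.asarray(arr)
    idx = np.flatnonzero(arr <= arr.min() + dev)
    return int(idx[0]), int(idx[-1])

start, end = low_span(arr, dev=1)
print(start, end)  # 6 17
```

Note that the span can contain points above the bound (index 9 has value 3 here), which may or may not be what you want on noisier data.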

CodePudding user response：

For a given point, we can decide to keep/mask it based on certain criteria:

Are its neighbors within some delta?

Is it within some threshold of the minimum?

Is it in a contiguous block?

Note: Since you tagged and imported pandas, I'll use pandas for convenience, but the same ideas can be implemented with pure numpy/matplotlib.

If all lower periods are around the same level

Then a simple approach is to combine a neighbor delta with a minimum threshold (though be careful of outliers in the real data):

    s = pd.Series(np.hstack([arr, arr]))
    delta = 2
    threshold = s.std()

    # check if each point's neighbors are within `delta`
    mask_delta = s.diff().abs().le(delta) & s.diff(-1).abs().le(delta)

    # check if each point is within `threshold` of the minimum
    mask_threshold = s < s.min() + threshold

    s.plot(label='raw')
    s.where(mask_threshold & mask_delta).plot(marker='*', label='delta & threshold')
    plt.legend()
    plt.show()
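The plot shows the masked points, but the question asks for start and end index values. As a rough sketch, a boolean mask can be turned into inclusive (start, end) pairs by looking for the positions where it flips. The helper name `mask_to_spans` is an assumption for illustration, not part of pandas or numpy:

```python
import numpy as np
import pandas as pd

arr = np.array([8,9,7,3,6,3,2,1,2,3,1,2,3,2,2,3,2,2,5,7,8,9,15,20,21])
s = pd.Series(np.hstack([arr, arr]))
delta = 2
threshold = s.std()

# same masks as in the answer: calm neighbors and close to the minimum
mask_delta = s.diff().abs().le(delta) & s.diff(-1).abs().le(delta)
mask_threshold = s < s.min() + threshold

def mask_to_spans(mask):
    """Hypothetical helper: inclusive (start, end) positions of each True run."""
    m = np.asarray(mask, dtype=bool)
    # pad with False so runs touching either end still produce two edges
    padded = np.concatenate(([False], m, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2] - 1
    return list(zip(starts.tolist(), ends.tolist()))

spans = mask_to_spans(mask_threshold & mask_delta)
print(spans)  # [(6, 16), (31, 41)]
```

The last point of each low period (index 17 in the first copy) is dropped because its right-hand neighbor jumps by more than `delta`; widen `delta` or extend the end by one if that point should count.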

If the lower periods are at different levels

Then a global minimum threshold won't work since some periods will be too high. In this case try a neighbor delta with contiguous blocks:

    # shift the second period by 5
    s = pd.Series(np.hstack([arr, arr + 5]))
    delta = 2
    blocksize = 10

    # check if each point's neighbors are within `delta`
    mask_delta = s.diff().abs().le(delta) & s.diff(-1).abs().le(delta)

    # check if each point is in a contiguous block of at least `blocksize`
    masked = s.where(mask_delta)
    groups = masked.isnull().cumsum()
    blocksizes = masked.groupby(groups).transform('count').mask(masked.isnull())
    mask_contiguous = blocksizes >= blocksize

    s.plot(label='raw')
    s.where(mask_contiguous).plot(marker='*', label='delta & contiguous')
    plt.legend()
    plt.show()
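To report where each block starts and ends, one option is to stay in pandas and group the kept positions by block. This is only a sketch under the same `delta` and `blocksize` settings; the names `block_id`, `positions`, and `spans` are introduced here for illustration:

```python
import numpy as np
import pandas as pd

arr = np.array([8,9,7,3,6,3,2,1,2,3,1,2,3,2,2,3,2,2,5,7,8,9,15,20,21])
s = pd.Series(np.hstack([arr, arr + 5]))
delta = 2
blocksize = 10

# same masks as in the answer above
mask_delta = s.diff().abs().le(delta) & s.diff(-1).abs().le(delta)
masked = s.where(mask_delta)
groups = masked.isnull().cumsum()
blocksizes = masked.groupby(groups).transform('count').mask(masked.isnull())
mask_contiguous = blocksizes >= blocksize

# the running count of dropped points is constant inside a kept run,
# so it works as a label for each contiguous block
block_id = (~mask_contiguous).cumsum()[mask_contiguous]
positions = s.index.to_series()[mask_contiguous]

# first and last index of every block
spans = positions.groupby(block_id).agg(['min', 'max'])
print(spans)
```

For the test data this should report two blocks, indices 6 to 16 and 31 to 41, even though the second period sits 5 units higher than the first.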

Page link：https//www.codepudding.com/other/141514.html
