How can I detect sequential values in a NumPy array and process them?

I have a NumPy array that consists of groups of sequential values, and I would like to find the median value (or the closest integer) of each group. After that, I want to create new arrays by subtracting and adding a fixed value.

Example: data=[100,101,102,103,170,171,172,252,253,254,255,256,333,334,335]

Desired result:

the median value of the first group (closest to the median): 103,

the median value of the second: 171,

the median value of the third: 254,

the median value of the fourth: 334

I want to subtract and add the same value to each of those numbers, let's say 20, then:

final_array =[(83,123), (151,191), (234,274), (314, 354)]

It does not have to be the exact median value, but it should be a number in the sublist. How can I do this in Python?

Thanks in advance...

CodePudding user response：

You can do something like this:

First, let's split the main array into sequential sub-arrays:

import numpy as np

splitted_data = np.array(np.split(data, np.where(np.diff(data) != 1)[0] + 1), dtype=object)

Essentially, we search the array for positions where the difference between two neighbouring numbers is not 1, and split the array at each of those positions.

The 1 in the != 1 comparison can of course be changed if you are looking for sequences with a different step. The trailing + 1 only shifts each index so that it points at the first element of the next group.

Now, since splitted_data is an np.array holding sub-arrays of different lengths, np.median won't work "as-is", so let's np.vectorize that function:
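To make the splitting step easier to follow, here is a small sketch that prints nothing new but exposes each intermediate value for the example data. The names step, diffs, breaks and split_points are only illustrative and do not appear in the answer.

```python
import numpy as np

data = np.array([100, 101, 102, 103, 170, 171, 172,
                 252, 253, 254, 255, 256, 333, 334, 335])

step = 1  # expected difference inside a group (illustrative name)

# Differences between neighbours: 1 inside a group, larger at a jump
diffs = np.diff(data)

# Indices of the last element of each group except the final one
breaks = np.where(diffs != step)[0]

# np.split expects the index where the NEXT group starts, hence + 1
split_points = breaks + 1

groups = np.split(data, split_points)
for g in groups:
    print(g)
```

For this data the jumps sit at diff indices 3, 6 and 11, so the array is cut at positions 4, 7 and 12.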

vectorized_med = np.vectorize(np.median)

Then just extract the medians with the vectorized function and round them to meet the closest-integer requirement:

medians = np.round(vectorized_med(splitted_data))

Now you can construct your final array with a list comprehension:

num = 20
final_array = np.array([(i - num, i + num) for i in medians])

final output:

array([[ 82., 122.],
       [151., 191.],
       [234., 274.],
       [314., 354.]])

*Just as a side note, the median of [100, 101, 102, 103] is 101.5, which np.round turns into 102. That is why the first pair is (82, 122) rather than (83, 123).
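Since the question asks for a number that is actually in the sublist, it may help to see what happens with an even-length group. The sketch below is only one possible reading of "closest to the median": np.round sends ties to the nearest even value, while the two alternatives shown always return a member of the group. The variable names are illustrative.

```python
import numpy as np

group = np.array([100, 101, 102, 103])

median = np.median(group)      # 101.5, not a member of the group
rounded = np.round(median)     # 102.0, ties are rounded to the nearest even value

# Alternative 1: the member closest to the median (the first one wins on a tie)
closest = group[np.argmin(np.abs(group - median))]

# Alternative 2: the upper-middle element by position
upper_middle = group[len(group) // 2]

print(median, rounded, closest, upper_middle)
```

With consecutive integers the rounded median always lands inside the group, but picking by index avoids float results and makes the tie-breaking rule explicit.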

CodePudding user response：

As an alternative solution (avoiding np.vectorize):

import numpy as np

data=np.array([100,101,102,103,170,171,172,252,253,254,255,256,333,334,335])
ddiff = np.diff(data)

# split data wherever the difference between neighbours is not 1
subArrays = np.split(data, np.where(ddiff != 1)[0] + 1)

c_val = 20
medians = []
extremes = []
for subArray in subArrays:
    medians.append(np.round(np.median(subArray)).astype(int))
    extremes.append((medians[-1] - c_val, medians[-1] + c_val))

print(extremes)

#outputs
# [(82, 122), (151, 191), (234, 274), (314, 354)]
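If this is needed in more than one place, the loop above could be wrapped in a small helper. This is only a sketch: the function name group_windows and its offset and step parameters are made up for illustration, and it assumes a non-empty, sorted 1-D input.

```python
import numpy as np

def group_windows(values, offset, step=1):
    """Return (center - offset, center + offset) for each run of values
    whose neighbours differ by exactly `step` (hypothetical helper)."""
    values = np.asarray(values)
    # Start index of every group after the first one
    split_points = np.where(np.diff(values) != step)[0] + 1
    windows = []
    for group in np.split(values, split_points):
        # Rounded median of the group, as in the answers above
        center = int(np.round(np.median(group)))
        windows.append((center - offset, center + offset))
    return windows

data = [100, 101, 102, 103, 170, 171, 172,
        252, 253, 254, 255, 256, 333, 334, 335]
result = group_windows(data, 20)
print(result)
```

Passing a different step, for example group_windows([0, 2, 4, 10, 12], 1, step=2), groups values that increase by 2 instead of 1.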