I have a NumPy array that consists of groups of sequential values, and I would like to find the median value (or the closest integer) of each group. After that, I want to create new arrays by subtracting and adding a fixed value.
Example: data=[100,101,102,103,170,171,172,252,253,254,255,256,333,334,335]
Desired result:
the median value of the first group (closest to the median): 103,
the median value of the second: 171,
the median value of the third: 254,
the median value of the fourth: 334
I want to subtract and add the same value to each of those numbers, let's say 20, so that:
final_array =[(83,123), (151,191), (234,274), (314, 354)]
It does not have to be the exact median, but it should be a number from the sublist. How can I do this in Python?
Thanks in advance...
CodePudding user response:
You can do something like this:
First, let's split the main array into sub-arrays of sequential values:
splitted_data = np.array(np.split(data, np.where(np.diff(data) != 1)[0] + 1), dtype=object)
Essentially, we search the array for positions where the difference between two neighbouring numbers is not 1, and split the array there. np.where returns the index of the element just before each gap, so the + 1 moves the split point to the first element of the next group.
The 1 after the != can of course be changed if you are looking for sequences with a different step.
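To make the split step easier to follow, here is a small sketch that prints the intermediate values for the example data from the question. The variable names (steps, gap_idx, split_points, groups) are only illustrative and are not part of the answer above.

```python
import numpy as np

data = np.array([100, 101, 102, 103, 170, 171, 172,
                 252, 253, 254, 255, 256, 333, 334, 335])

steps = np.diff(data)                   # difference between neighbouring values
gap_idx = np.where(steps != 1)[0]       # index of the element just before each gap
split_points = gap_idx + 1              # first index of each following group
groups = np.split(data, split_points)   # list of 1-D arrays, one per run

print(split_points.tolist())            # [4, 7, 12]
for g in groups:
    print(g.tolist())
```

Each printed list is one run of consecutive values, which is what the later median step operates on.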
Now, since splitted_data is an np.array holding differently shaped objects, np.median won't work "as-is", so let's np.vectorize that function:
vectorized_med = np.vectorize(np.median)
Then just extract the medians with the vectorized function and round them to match the closest-integer requirement:
medians = np.round(vectorized_med(splitted_data))
Now you can construct your final array with a list comprehension:
num = 20
final_array = np.array([(i - num, i + num) for i in medians])
final output:
array([[ 82., 122.],
       [151., 191.],
       [234., 274.],
       [314., 354.]])
*Just as a side note, the median of [100, 101, 102, 103] is 101.5, which np.round turns into 102.
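Since even-length groups produce medians ending in .5, the rounding rule decides which neighbour you get. The sketch below compares np.round, which rounds halves to the nearest even value, with a round-half-up variant built from np.floor. The sample values are made up for illustration.

```python
import numpy as np

# Illustrative medians of even-length groups (values chosen for this sketch)
halves = np.array([100.5, 101.5, 102.5])

rounded_even = np.round(halves)        # halves go to the nearest even integer
rounded_up = np.floor(halves + 0.5)    # halves always go up

print(rounded_even)   # [100. 102. 102.]
print(rounded_up)     # [101. 102. 103.]
```

For the data in the question both rules give 102 for the first group, but they can differ for other inputs.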
CodePudding user response:
As an alternative solution (avoiding np.vectorize):
import numpy as np
data = np.array([100, 101, 102, 103, 170, 171, 172, 252, 253, 254, 255, 256, 333, 334, 335])
ddiff = np.diff(data)
# split data wherever the step between neighbours is not 1
subArrays = np.split(data, np.where(ddiff != 1)[0] + 1)
c_val = 20
medians = []
extremes = []
for subArray in subArrays:
    # round the median to the nearest integer and store it as a plain int
    medians.append(int(np.round(np.median(subArray))))
    extremes.append((medians[-1] - c_val, medians[-1] + c_val))
print(extremes)
# outputs
# [(82, 122), (151, 191), (234, 274), (314, 354)]
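If the centre value must be an actual member of each group, as the question asks, one possible variation is to pick the middle element by index instead of rounding a float median. This is only a sketch: middle_windows is a hypothetical helper name, the input is assumed to be non-empty and sorted, and for even-length groups it arbitrarily takes the upper of the two middle elements.

```python
import numpy as np

def middle_windows(data, half_width, step=1):
    """Hypothetical helper: (member - half_width, member + half_width) per run."""
    data = np.asarray(data)
    # Split wherever neighbouring values do not differ by exactly `step`
    groups = np.split(data, np.where(np.diff(data) != step)[0] + 1)
    # len(g) // 2 is the exact middle for odd lengths and the upper of the
    # two middle elements for even lengths (an assumption of this sketch)
    middles = [int(g[len(g) // 2]) for g in groups]
    return [(m - half_width, m + half_width) for m in middles]

data = [100, 101, 102, 103, 170, 171, 172,
        252, 253, 254, 255, 256, 333, 334, 335]
windows = middle_windows(data, 20)
print(windows)  # [(82, 122), (151, 191), (234, 274), (314, 354)]
```

Using (len(g) - 1) // 2 instead would select the lower middle element, giving 101 for the first group.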