I have a numpy 2D array of 50 patients and 100 score data points.
scores = array([[7.0, 10.0, 12.0, ..., 0.0],
[0.0, 11.0, 34.0, ..., 1.0],
.
.
.
[0.0, 33.0, 34.0, ..., 50.0]])
Each score is a non-negative float that is mapped to a category {A, B, C} (which stand for mild, moderate, severe) according to the range conditions {v < 20: 'A', 20 <= v < 50: 'B', 50 <= v: 'C'}.
The number of points that fall in a range can be counted using ((25 < a) & (a < 100)).sum() as in this thread.
Now I need to assign each patient a category based on the most severe scores they received, provided that the count of data points in that category is >= a certain threshold (say 20%).
For example (taking 20% out of 100 data points as threshold):
- if patient i scored 25 data points of severity 'C' -> he is categorized as C (severe)
- if patient i scored 15 data points of severity 'C' and 15 data points of severity 'B' -> he is categorized as B (moderate).
Is there a way to do that automatically in numpy?
Thank you in advance.
Update:
The expected output is a 1D array with one entry per patient, i.e. of shape (50,), in the form
categories = ['A', 'C', 'A', .... 'B']
, where each value is the overall category of the patient.
CodePudding user response:
mapping the values
You can use numpy.select:
scores = np.array([[7.0, 10.0, 12.0, 0.0],
[0.0, 11.0, 34.0, 55],
[55,55,0,44],
])
out = np.select([scores<20, (20<=scores)&(scores<50), 50<=scores],
['A', 'B', 'C'])
output:
array([['A', 'A', 'A', 'A'],
['A', 'A', 'B', 'C'],
['C', 'C', 'A', 'B']], dtype='<U3')
getting the most frequent
Here use numpy.unique:
categories = np.unique(out, axis=1)[:,0]
output:
array(['A', 'A', 'C'], dtype='<U3')
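Note that np.unique(out, axis=1) returns the unique columns of out, so in general it is not a per-patient frequency count. Below is a minimal sketch of counting per patient and applying the threshold from the question. It assumes the intended rule is "the most severe category that at least 20% of the points reach or exceed", which matches the second example in the question. The small data set and the names min_count, n_c and n_b_or_worse are only illustrative.

```python
import numpy as np

# Illustrative data: 3 patients, 10 scores each (not the asker's data)
scores = np.array([
    [55, 60, 70, 5, 5, 5, 5, 5, 5, 5],   # three scores in 'C'
    [55, 25, 30, 5, 5, 5, 5, 5, 5, 5],   # one score in 'C', two in 'B'
    [55, 5, 5, 5, 5, 5, 5, 5, 5, 5],     # one score in 'C' only
], dtype=float)

min_count = 2  # 20% of the 10 data points per patient

# Per-patient counts along axis=1 (one count per row)
n_c = (scores >= 50).sum(axis=1)           # points in 'C'
n_b_or_worse = (scores >= 20).sum(axis=1)  # points in 'B' or 'C'

# Check the most severe category first, then fall back to the milder ones
categories = np.where(n_c >= min_count, 'C',
                      np.where(n_b_or_worse >= min_count, 'B', 'A'))
print(categories)
```

For the real (50, 100) array, min_count would be 20 and categories would have shape (50,).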
CodePudding user response:
I made it in one step
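import numpy as np

data = get_the_data()  # placeholder: array of shape (n_patients, 100)
data = np.sort(data, axis=-1)[:, ::-1]  # sort each patient's scores in descending order
data_categorized = data[:, 19]  # 20th-highest score (index 19): at least 20 of the 100 scores are >= it
# Now I can categorize each patient directly from that single score
out = np.select([data_categorized<20, (20<=data_categorized)&(data_categorized<50), 50<=data_categorized],
                ['A', 'B', 'C'], default='')  # string default keeps the result dtype a string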
Instead of categorizing each data point and then categorizing the patient as a whole based on the 20% severity threshold, I sorted each patient's scores in descending order and then took the 20th item (out of 100).
Because the scores are sorted in descending order, I can be sure that all the items before the 20th one are of equal or higher severity.
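As a sanity check on that reasoning, here is a small sketch comparing the sort-based shortcut with an explicit per-category count on synthetic data. The random data, the variable names and the use of np.digitize for the range mapping are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the real data: 50 patients, 100 non-negative scores each
scale = rng.uniform(5, 60, size=(50, 1))
scores = rng.exponential(scale, size=(50, 100))

k = 20  # 20% of the 100 data points
labels = np.array(['A', 'B', 'C'])

# Sort-based: the k-th highest score of each patient decides the category.
# After an ascending sort, column -k holds the k-th largest value.
kth_highest = np.sort(scores, axis=1)[:, -k]
by_sort = labels[np.digitize(kth_highest, [20, 50])]  # <20 -> A, [20, 50) -> B, >=50 -> C

# Count-based: most severe category reached by at least k points
n_c = (scores >= 50).sum(axis=1)
n_b_or_worse = (scores >= 20).sum(axis=1)
by_count = np.where(n_c >= k, 'C', np.where(n_b_or_worse >= k, 'B', 'A'))

# Both approaches should agree for every patient
assert (by_sort == by_count).all()
```

The two agree because the k-th highest score is >= a bound exactly when at least k scores are >= that bound.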