I have a numpy.array with several integer values.
For every value from 0 to the maximal value in the array, I want to count how many elements are equal to or greater than that value.
This is my current code:
import numpy as np
from random import randint
arr = np.array([randint(0, 10) for _ in range(20)])
val_range = np.arange(arr.max() + 1)
count_array = np.array([(arr >= v).sum() for v in val_range])
Is there a better way of implementing this with numpy?
I want to implement this with numpy and later integrate the code in a function compiled with numba.
CodePudding user response:
You can use an ECDF (empirical cumulative distribution function).
If you use the first definition of vals you'll get exactly what you ask for. I also give another option, which is a bit different from what you ask: it only computes the "necessary" numbers, while your code computes the count for every number between 0 and the maximal value, many of which are unnecessary (especially if your array has fewer elements than its largest value).
Anyway, if you need to sample at values other than the ones in your array, you can easily use the ecdf function as it is.
import numpy as np
from random import randint
from statsmodels.distributions.empirical_distribution import ECDF
# Generate array
arr = np.array([randint(0, 10) for _ in range(20)])
# Compute ECDF
ecdf = ECDF(arr, side='left')
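# Option 1, what you do in your code: every value from 0 to the maximum
vals = np.arange(np.max(arr) + 1)
# Option 2, a more efficient way (if relevant): sample the ECDF only at values present in arr.
# np.unique already returns the values sorted, so no extra sort is needed.
# Skip this line if you want the result of option 1.
vals = np.unique(arr)
# Number of elements equal to or greater than each value in vals
(1 - ecdf(vals)) * arr.shape[0]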
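If you would rather not depend on statsmodels, the same idea can be sketched with plain numpy. This is only a minimal sketch: the helper name `count_greater_equal` and the example data are made up for illustration. It sorts the array once and uses `np.searchsorted` with `side='left'`, which returns for each value the number of elements strictly less than it. Subtracting that from the array size leaves the number of elements equal to or greater than the value.

```python
import numpy as np

def count_greater_equal(arr, vals):
    """For each v in vals, count the elements of arr that are >= v.

    Hypothetical helper name, used here only for illustration.
    """
    sorted_arr = np.sort(arr)
    # With side='left', the insertion index of v equals the number of
    # elements strictly less than v.
    n_less = np.searchsorted(sorted_arr, vals, side='left')
    return arr.size - n_less

# Illustrative data, not taken from the question
arr = np.array([3, 0, 5, 3, 1, 5, 2])
vals = np.arange(arr.max() + 1)
counts = count_greater_equal(arr, vals)
```

Because `vals` can be any array, this also works for the `np.unique(arr)` variant or for values that do not occur in `arr`.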
CodePudding user response:
I guess this is what you are looking for? This is compatible with numba's njit:
[np.count_nonzero(arr >= v) for v in val_range]
Out[24]: [20, 17, 15, 15, 14, 12, 8, 6, 4, 2, 2] ## As in my random list
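Another option that avoids the comparison against every value, offered as a sketch rather than a definitive answer: build a histogram with `np.bincount` and take a reversed cumulative sum. This assumes the array holds non-negative integers, as in the question. The function name `counts_at_least` and the example data are made up for illustration, and whether each call is supported inside an njit function should be checked against the numba version in use.

```python
import numpy as np

def counts_at_least(arr):
    """Return c where c[v] is the number of elements of arr that are >= v,
    for v in 0..arr.max(). Assumes non-negative integers.

    Hypothetical helper name, used here only for illustration.
    """
    hist = np.bincount(arr)            # hist[v] = number of elements equal to v
    # Summing the histogram from the top value downwards gives, at position v,
    # the number of elements with value v or larger.
    return hist[::-1].cumsum()[::-1]

# Illustrative data, not taken from the question
arr = np.array([3, 0, 5, 3, 1, 5, 2])
result = counts_at_least(arr)
```

This makes one pass to build the histogram and one pass over the value range, instead of one full pass over the array per value.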