I have some dynamically created arrays of varying lengths, and I would like to resize them all to the same length of 5000 elements by popping every nth element.
Here is what I got so far:
import numpy as np
random_array = np.random.rand(26975, 3)   # example array; the real arrays have varying lengths
n_to_pop = int(len(random_array) / 5000)  # integer step between kept elements
print(n_to_pop)                           # 5
If I downsample with a step of n_to_pop = 5 I get 5395 elements.
I can compute 5395 / 5000 = 1.079, but I don't know how to calculate how often I should pop an element to remove the remaining 395 elements (that leftover factor of 0.079).
If I can get within a length of 5000-5050 that would also be acceptable; the remainder can then be sacrificed with a simple .resize.
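For reference, that fallback would look something like this (truncating with a slice rather than an in-place .resize):
downsampled = random_array[::n_to_pop][:5000]  # 5395 rows -> exactly 5000 rows
print(downsampled.shape)  # (5000, 3)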
This is probably just a simple math question, but I couldn't seem to find an answer anywhere.
Any help is much appreciated.
Best regards
Martin
CodePudding user response:
You can use a one-step random-sampling solution with np.random.choice or np.random.permutation:
random_array[np.random.permutation(random_array.shape[0])[:5000]]
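The np.random.choice form mentioned above is the equivalent one-liner, sampling 5000 row indices without replacement (reusing random_array from the question):
random_array[np.random.choice(random_array.shape[0], size=5000, replace=False)]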
If you want to remove the rows near-uniformly instead, one way is:
indices = np.linspace(0, random_array.shape[0], endpoint=False, num=5000, dtype=int)
# [ 0 5 10 16 ... 26958 26964 26969] --> shape = (5000,)
result = random_array[indices]
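With the (26975, 3) array from the question this keeps exactly 5000 near-evenly spaced rows, so result.shape is (5000, 3).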
CodePudding user response:
You can use something like np.linspace to make your solution as uniform as possible:
subset = random_array[np.round(np.linspace(0, len(random_array), 5000, endpoint=False)).astype(int)]
You don't always want to drop elements at a uniform stride: consider reducing a 5003-element array to 5000 elements vs. a 50003-element one. The trick is to build a set of elements to keep or drop that is as linear as possible in the index, which is exactly what np.linspace does.
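As a quick illustration of that 5003 -> 5000 case (on a throwaway np.arange array rather than your data), the rounded linspace indices end up skipping three roughly evenly spaced positions:
import numpy as np

arr = np.arange(5003)
keep = np.round(np.linspace(0, len(arr), 5000, endpoint=False)).astype(int)
dropped = np.setdiff1d(np.arange(len(arr)), keep)
print(len(keep), dropped)  # 5000 kept, 3 roughly evenly spaced indices dropped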
You could also do something like
np.delete(random_array, np.round(np.linspace(0, len(random_array), len(random_array) - 5000, endpoint=False)).astype(int), axis=0)
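Put together for the (26975, 3) array from the question (note axis=0, so whole rows are removed instead of np.delete flattening the array), a runnable sketch would be:
import numpy as np

random_array = np.random.rand(26975, 3)
# indices of the rows to throw away, spread as evenly as possible over the array
drop = np.round(np.linspace(0, len(random_array), len(random_array) - 5000, endpoint=False)).astype(int)
trimmed = np.delete(random_array, drop, axis=0)
print(trimmed.shape)  # (5000, 3)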