First of all, thank you for the time you took to answer me.
To give a small example, I have a huge dataset (n instances, 3 features) like this:
data = np.array([[7.0, 2.5, 3.1], [4.3, 8.8, 6.2], [1.1, 5.5, 9.9]])
It's labeled in another array:
label = np.array([0, 1, 0])
Questions:
I know I can solve my problem with a Python for loop, but I'm looking for a NumPy way (without a for loop) that runs as fast as possible.
If there isn't a way to avoid the for loop, which approach would be best (M1, M2, or some other wizardry)?
My solution:
clusters = []
for lab in range(label.max() + 1):
    # M1: creating new object
    c = data[label == lab]
    clusters.append([c.min(axis=0), c.max(axis=0)])
    # M2: comparing multiple times (called views?)
    # clusters.append([data[label == lab].min(axis=0), data[label == lab].max(axis=0)])
print(clusters)
# [[array([1.1, 2.5, 3.1]), array([7. , 5.5, 9.9])], [array([4.3, 8.8, 6.2]), array([4.3, 8.8, 6.2])]]
CodePudding user response:
You could start from an easier variant of this problem: given arr and its label, can you find the minimum and maximum values of the arr items in each label group?
For instance:
arr = np.array([55, 7, 49, 65, 46, 75, 4, 54, 43, 54])
label = np.array([1, 3, 2, 0, 0, 2, 1, 1, 1, 2])
Then you would expect the minimum and maximum values of arr in each label group to be:
min_values = np.array([46, 4, 49, 7])
max_values = np.array([65, 55, 75, 7])
Here is a numpy approach to this kind of problem:
def groupby_minmax(arr, label, return_groups=False):
    # Sort the items so that equal labels become contiguous
    arg_idx = np.argsort(label)
    arr_sort = arr[arg_idx]
    label_sort = label[arg_idx]
    # Start index of each group in the sorted arrays
    div_points = np.r_[0, np.flatnonzero(np.diff(label_sort)) + 1]
    # Reduce each slice [div_points[i], div_points[i + 1])
    min_values = np.minimum.reduceat(arr_sort, div_points)
    max_values = np.maximum.reduceat(arr_sort, div_points)
    if return_groups:
        return min_values, max_values, label_sort[div_points]
    else:
        return min_values, max_values
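To see what each step produces, here is a minimal walk-through on the example arrays above. The name starts is only illustrative, and kind="stable" is an optional choice made here, not something the approach requires.

```python
import numpy as np

arr = np.array([55, 7, 49, 65, 46, 75, 4, 54, 43, 54])
label = np.array([1, 3, 2, 0, 0, 2, 1, 1, 1, 2])

# Sort by label so that every group occupies one contiguous slice.
order = np.argsort(label, kind="stable")
arr_sort = arr[order]      # [65 46 55  4 54 43 49 75 54  7]
label_sort = label[order]  # [0 0 1 1 1 1 2 2 2 3]

# A group starts at index 0 and right after every position where the label changes.
starts = np.r_[0, np.flatnonzero(np.diff(label_sort)) + 1]  # [0 2 6 9]

# reduceat reduces each slice arr_sort[starts[i]:starts[i + 1]];
# the last slice runs to the end of the array.
min_values = np.minimum.reduceat(arr_sort, starts)  # [46  4 49  7]
max_values = np.maximum.reduceat(arr_sort, starts)  # [65 55 75  7]

print(label_sort[starts], min_values, max_values)
```

Note that only labels that actually occur get a row in the result, so label_sort[starts] tells you which group each row belongs to.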
Now there's not much to change in order to adapt it to your use case:
def groupby_minmax_OP(arr, label, return_groups=False):
    # Sort the rows so that equal labels become contiguous
    arg_idx = np.argsort(label)
    arr_sort = arr[arg_idx]
    label_sort = label[arg_idx]
    # Start index of each group in the sorted arrays
    div_points = np.r_[0, np.flatnonzero(np.diff(label_sort)) + 1]
    # axis=0 reduces over rows, keeping one value per feature
    min_values = np.minimum.reduceat(arr_sort, div_points, axis=0)
    max_values = np.maximum.reduceat(arr_sort, div_points, axis=0)
    if return_groups:
        return min_values, max_values, label_sort[div_points]
    else:
        # Shape (n_groups, 2, n_features): [min, max] per group
        return np.array([min_values, max_values]).swapaxes(0, 1)
groupby_minmax_OP(data, label)
Output:
array([[[1.1, 2.5, 3.1],
        [7. , 5.5, 9.9]],

       [[4.3, 8.8, 6.2],
        [4.3, 8.8, 6.2]]])
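If you want to convince yourself that this matches the loop from the question, a quick cross-check on random data could look like the sketch below. The helper name minmax_by_label, the random test data, and the use of np.stack are assumptions made for this illustration; the logic is the same sort-then-reduceat idea.

```python
import numpy as np

def minmax_by_label(data, label):
    """Per-label [min, max] rows, shape (n_groups, 2, n_features)."""
    order = np.argsort(label, kind="stable")
    data_sort, label_sort = data[order], label[order]
    # First row index of each label group in the sorted data
    starts = np.r_[0, np.flatnonzero(np.diff(label_sort)) + 1]
    mins = np.minimum.reduceat(data_sort, starts, axis=0)
    maxs = np.maximum.reduceat(data_sort, starts, axis=0)
    return np.stack([mins, maxs], axis=1)

# Made-up test data: 1000 instances, 3 features, labels 0..4
rng = np.random.default_rng(0)
data = rng.random((1000, 3))
label = rng.integers(0, 5, size=1000)

vectorized = minmax_by_label(data, label)

# Reference result using a loop over the labels that actually occur
looped = np.array([
    [data[label == lab].min(axis=0), data[label == lab].max(axis=0)]
    for lab in np.unique(label)
])

print(np.array_equal(vectorized, looped))
```

The reference loop iterates over np.unique(label) rather than range(label.max() + 1), because calling min on an empty selection raises a ValueError when some label value never occurs.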
CodePudding user response:
This has already been answered; see the question "python numpy access list of arrays without for loop".