How to transform a variable into a bucketed variable which tells us which bucket/range it lies in, in pyt

I have a variable a = [0.129, 0.369, 0.758, 0.012, 0.925]. I want to transform this variable into a bucketed variable. What I mean by this is explained below.

min_bucket_value, max_bucket_value = 0, 1 (Can be anything, for example, 0 to 800, but the min value is always going to be 0)

num_divisions = 10 (For this example I've taken 10, but it can be higher as well, like 80 divisions instead of 10)

Bucket/division ranges are as shown below.

0 - 0.1 -> 0
0.1 - 0.2 -> 1
0.2 - 0.3 -> 2
0.3 - 0.4 -> 3
0.4 - 0.5 -> 4
0.5 - 0.6 -> 5
0.6 - 0.7 -> 6
0.7 - 0.8 -> 7
0.8 - 0.9 -> 8
0.9 - 1.0 -> 9

so, transformed_a = [1, 3, 7, 0, 9]

In other words, I divide the range from min_bucket_value to max_bucket_value into num_divisions equal ranges/buckets, and then transform the original a so that each element tells which bucket it lies in.

I've tried creating torch.linspace(min_bucket_value, max_bucket_value, num_divisions), but I'm not sure how to move forward and map each value to a range so that I can get the index of the bucket it belongs to.

Can you guys please help?

EDIT

There's an extension to this problem.

Let's say that we've got a = [127, 362, 799] and I want to create two levels of buckets. The first is a coarse bucket, so a_coarse_transform = [12, 36, 79]. But what if I want a fine bucket as well, so that my second transformation becomes a_fine_transform = [7, 2, 9]?

The fine bucket is the sub-range index within the coarse range. Basically, the coarse division has 80 buckets (putting 127 in the 12th bucket), and each coarse bucket is split into 10 fine divisions, which tells us that 127 lies in the 12th coarse bucket and the 7th fine bucket.

a can contain floats as well, e.g. a = [127.36, 362.456, 789.646],

so a_coarse_transform = [12, 36, 78] & a_fine_transform = [7, 2, 9]

where min_bucket_value, max_bucket_value, num_coarse_divisions, num_fine_divisions = 0, 800, 80, 10

CodePudding user response：

For equally spaced buckets you can do it in constant time per element, regardless of the number of buckets, their width, or their location.

Use an affine map to move the values from the range min_bucket_value <= a < max_bucket_value to the range 0 <= a1 < 1

a1 = (a - min_bucket_value) / (max_bucket_value - min_bucket_value)

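Get the index of the division each value falls in (a1 is assumed to be a torch tensor here; tensors have no astype method, so use to)

b = torch.floor(a1 * num_divisions).to(torch.long)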

With this method, if a contains elements outside the bucket range, you will get indices outside the range 0:num_divisions.
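The two steps above can be put together as a minimal PyTorch sketch. The function name bucket_indices and its clamp option are hypothetical and only for illustration; clamping out-of-range values into the first or last bucket is an assumption about the desired behaviour, not something the question asks for.

```python
import torch

def bucket_indices(a, min_bucket_value, max_bucket_value, num_divisions, clamp=True):
    """Return the index of the equal-width bucket each element of `a` falls in."""
    a = torch.as_tensor(a, dtype=torch.float64)
    # Affine map from [min_bucket_value, max_bucket_value) to [0, 1).
    a1 = (a - min_bucket_value) / (max_bucket_value - min_bucket_value)
    # Scale by the number of buckets and round down to get the index.
    b = torch.floor(a1 * num_divisions).to(torch.long)
    if clamp:
        # Assumption: out-of-range values go to the nearest valid bucket.
        b = b.clamp(0, num_divisions - 1)
    return b

a = [0.129, 0.369, 0.758, 0.012, 0.925]
print(bucket_indices(a, 0.0, 1.0, 10).tolist())  # [1, 3, 7, 0, 9]
```

With clamp=False the raw indices are returned, so values below min_bucket_value give negative indices and values at or above max_bucket_value give indices of num_divisions or more.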

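If you want to apply multiple levels of partitioning (with a1 as a NumPy array; the results are floats, so cast them to an integer type if you need indices)

a_coarse_transform, r = numpy.divmod(a1 * num_coarse_divisions, 1)
a_fine_transform, r = numpy.divmod(r * num_fine_divisions, 1)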

If you wanted one more level, you could simply apply divmod to r again, multiplied by the number of divisions for that level.
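As a rough sketch of the two-level case, here is the divmod approach applied to the float example from the question. It assumes the bucket range is 0 to 800 (so that each of the 80 coarse buckets is 10 wide), which is an interpretation of the question rather than something it states outright.

```python
import numpy as np

# Assumed parameters: range 0..800, 80 coarse buckets, 10 fine buckets each.
min_bucket_value, max_bucket_value = 0.0, 800.0
num_coarse_divisions, num_fine_divisions = 80, 10

a = np.array([127.36, 362.456, 789.646])

# Affine map to [0, 1).
a1 = (a - min_bucket_value) / (max_bucket_value - min_bucket_value)

# divmod(x, 1) splits x into its integer part (the bucket index)
# and its fractional part (the position inside that bucket).
coarse, r = np.divmod(a1 * num_coarse_divisions, 1)
fine, r = np.divmod(r * num_fine_divisions, 1)

a_coarse_transform = coarse.astype(np.int64)
a_fine_transform = fine.astype(np.int64)

print(a_coarse_transform.tolist())  # [12, 36, 78]
print(a_fine_transform.tolist())    # [7, 2, 9]
```

Values that sit exactly on a bucket boundary can land on either side of it because of floating-point rounding, so for integer inputs it may be safer to compute the indices with integer arithmetic instead.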