Split arrays by bins and save the value

My data is a 2D array that I want to split the way a histogram does, according to the first element of each pair.

The problem is that I want to split the array into bins based on those first elements and save the results as new arrays. I need this because I have to do some calculations on the new arrays.

I can do this the hard way (just go over everything and divide it up), but I would prefer a faster way. The only functions I found are ones like np.histogram, which only gives me the size of each bin.

An example of the given data and the ideal return:

I don't care whether the bin (0,2) includes 2 or not, so (3,50) can also be in the first bin.

a = [(2,50),(4,60),(3,50),(6,0),(7,1),(4,10),(2,80)]
bin = 2

Should return:

a1 = [(2,50),(2,80)]
a2 = [(4,60),(3,50),(4,10)]
a3 = [(6,0) ,(7,1)]

CodePudding user response：

The key part is to create a dictionary and track the maximum value of the key. After that, you can achieve the desired answer in any number of ways.

a = [(2,50),(4,60),(3,50),(6,0),(7,1),(4,10),(2,80)]
bin_ = 2

d = dict()
mx = a[0][0]
for k, v in a:
    d.setdefault(k, []).append(v)
    if k > mx:
        mx = k

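# Walk the key range in steps of bin_; mx + 1 keeps the largest key when it is a multiple of bin_.
ans = [[(i + j, d.get(i + j, [])) for j in range(bin_)] for i in range(0, mx + 1, bin_)]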
ans = [[(k, el) for k, v in x for el in v] for x in ans]
[x for x in ans if x]
# [[(2, 50), (2, 80), (3, 50)], [(4, 60), (4, 10)], [(6, 0), (7, 1)]]
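
One of the other ways hinted at above is to compute each tuple's bin index directly with floor division, so the maximum key never has to be tracked. This is only a sketch: the name `bin_by_key` and the `start` parameter are assumptions made for illustration, and the bins are taken to be half-open, [start + i*width, start + (i+1)*width).

```python
from collections import defaultdict


def bin_by_key(pairs, width, start=0):
    """Group (key, value) tuples into half-open bins of the given width."""
    groups = defaultdict(list)
    for key, value in pairs:
        # Floor division maps each key straight to its bin index.
        groups[(key - start) // width].append((key, value))
    # Return only the non-empty bins, in ascending order of bin index.
    return [groups[i] for i in sorted(groups)]


a = [(2, 50), (4, 60), (3, 50), (6, 0), (7, 1), (4, 10), (2, 80)]
print(bin_by_key(a, 2))
# [[(2, 50), (3, 50), (2, 80)], [(4, 60), (4, 10)], [(6, 0), (7, 1)]]
```

The bins hold the same tuples as the result above, but inside each bin the tuples keep their input order instead of being grouped by key.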

CodePudding user response：

Here's an alternative solution which I think works:

from operator import itemgetter as get


def get_bins(arr, n):
    (min_arr, _), (max_arr, _) = min(arr, key=get(0)), max(arr, key=get(0))
    bins = [(i, i + n) for i in range(min_arr, max_arr + 1, n)]
    result = [[] for _ in bins]
    for t in arr:
        for i, (a, b) in enumerate(bins):
            if a <= t[0] < b:
                result[i].append(t)
    return result

The process is to first generate the intervals which dictate each bin. Then iterate over each tuple in the array and figure out which bin it should be placed into.

Demo:

In [3]: get_bins(a, 1)
Out[3]: [[(2, 50), (2, 80)], [(3, 50)], [(4, 60), (4, 10)], [], [(6, 0)], [(7, 1)]]

In [4]: get_bins(a, 2)
Out[4]: [[(2, 50), (3, 50), (2, 80)], [(4, 60), (4, 10)], [(6, 0), (7, 1)]]

In [5]: get_bins(a, 3)
Out[5]: [[(2, 50), (4, 60), (3, 50), (4, 10), (2, 80)], [(6, 0), (7, 1)]]

Note that when the bin size is 1, an empty list shows up for bin 5, which I'd expect to be desired behavior.
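
Since the question mentions NumPy, the same half-open binning could also be sketched with array operations, which may be faster on large inputs. The name `get_bins_np` is hypothetical, and the sketch assumes the data fits in a two-column numeric array; it sorts the rows by bin index and cuts the sorted array at the bin boundaries, so empty bins are kept as empty arrays.

```python
import numpy as np


def get_bins_np(arr, n):
    """Split the rows of a two-column array into bins of width n on column 0."""
    arr = np.asarray(arr)
    keys = arr[:, 0]
    # Bin index of each row, counted from the smallest key.
    idx = ((keys - keys.min()) // n).astype(np.intp)
    n_bins = int(idx.max()) + 1
    # A stable sort keeps the input order of the rows inside each bin.
    order = np.argsort(idx, kind="stable")
    # Rows per bin; the running totals are the cut points in the sorted rows.
    counts = np.bincount(idx, minlength=n_bins)
    return np.split(arr[order], np.cumsum(counts)[:-1])


a = [(2, 50), (4, 60), (3, 50), (6, 0), (7, 1), (4, 10), (2, 80)]
for part in get_bins_np(a, 2):
    print(part.tolist())
# [[2, 50], [3, 50], [2, 80]]
# [[4, 60], [4, 10]]
# [[6, 0], [7, 1]]
```

Each returned element is a NumPy array, so per-bin calculations such as `part[:, 1].mean()` can be applied directly to the non-empty bins.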

There are some details here that are somewhat unclear, and I've made some assumptions about your desired behavior. For one thing, you may want to inspect closely the first line of the function, which gets the min and max value from the tuples which establishes the range of values within which we're binning items. If you instead want to explicitly pass a min and max value between which to find your bin boundaries, you'll need to make those modifications. Alternatively, if you know you'll always start at 0, you could instead simply find the max of the tuples and bin everything in steps of n from 0 to the max:

def get_bins(arr, n):
    max_arr, _ = max(arr, key=lambda t: t[0])
    bins = [(i, i + n) for i in range(0, max_arr + n + 1, n)]
    result = [[] for _ in bins]
    for t in arr:
        for i, (a, b) in enumerate(bins):
            if a < t[0] <= b:
                result[i].append(t)
    return result

Again, some assumptions are being made here about which side of the interval is inclusive. For example, with n=2 and starting the binning from 0, the tuples (6, 0) and (7, 1) are not put into the same bin.
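
The variant with explicitly passed bounds mentioned earlier could look roughly like the sketch below. Everything in it is an assumption rather than part of the answer's code: the name `get_bins_between`, the parameters `lo`, `hi` and `right_closed`, integer keys, and the decision to drop tuples that fall outside the edges. It uses `bisect` to look up the bin instead of scanning every interval, and the flag selects which side of each interval is inclusive.

```python
from bisect import bisect_left, bisect_right


def get_bins_between(arr, n, lo, hi, right_closed=False):
    """Bin tuples by their first element between explicit integer bounds.

    right_closed=False uses [a, b) intervals; right_closed=True uses (a, b].
    Tuples that fall outside the bin edges are dropped.
    """
    # Edges start at lo and run in steps of n until they pass hi.
    edges = list(range(lo, hi + n + 1, n))
    result = [[] for _ in edges[:-1]]
    # bisect_right puts a key equal to an edge in the bin to its right,
    # bisect_left puts it in the bin to its left.
    find = bisect_left if right_closed else bisect_right
    for t in arr:
        i = find(edges, t[0]) - 1
        if 0 <= i < len(result):
            result[i].append(t)
    return result


a = [(2, 50), (4, 60), (3, 50), (6, 0), (7, 1), (4, 10), (2, 80)]
print(get_bins_between(a, 2, 2, 7))
# [[(2, 50), (3, 50), (2, 80)], [(4, 60), (4, 10)], [(6, 0), (7, 1)]]
print(get_bins_between(a, 2, 0, 7, right_closed=True))
# [[(2, 50), (2, 80)], [(4, 60), (3, 50), (4, 10)], [(6, 0)], [(7, 1)]]
```

The second call shows the effect described above: with right-closed bins starting from 0, (6, 0) and (7, 1) end up in different bins.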