Create histogram from two arrays

I have two numpy arrays with the same dimensions: weights and percents. percents holds the 'real' data, and weights holds how many times each 'real' value occurs in the histogram.

For example:

weights = [[0, 1, 1, 4, 2],
           [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5],
            [1, 2, 3, 4, 5]]

(every row of percents is the same)

I would like to "multiply" these together in such a way that I produce weights[x] * [percents[x]]:

results = [0 * [1] + 1 * [2] + 1 * [3] + 4 * [4] + 2 * [5],
           0 * [1] + 1 * [2] + 0 * [3] + 3 * [4] + 5 * [5]]
        = [[2, 3, 4, 4, 4, 4, 5, 5],
           [2, 4, 4, 4, 5, 5, 5, 5, 5]]

Notice that the rows can have different lengths. Ideally this can be done in numpy, but because the rows are ragged it may end up being a list of lists.

Edit: I've been able to cobble together these nested for loops but obviously it's not ideal:

list_of_hists = []
for index in df.index:
    hist = []
    # Create a list of lists, later to be flattened to 'results'
    for i, percent in enumerate(percents):
        # For each percent, create a list of [percent] * weight
        hist.append([percent] * int(df.iloc[index].values[i]))
    # flatten the list of lists in hist
    results = [val for list_ in hist for val in list_]
    list_of_hists.append(results)

CodePudding user response：

np.repeat is designed for this kind of operation, but it does not handle the 2D case directly, because the resulting rows would have different lengths. So you need to work with flattened views of the arrays instead.

>>> import numpy as np
>>> weights = np.array([[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]])
>>> percents = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
>>> np.repeat(percents.ravel(), weights.ravel())
array([2, 3, 4, 4, 4, 4, 5, 5, 2, 4, 4, 4, 5, 5, 5, 5, 5])

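After that, you need the index locations at which to split the flat result. These are the cumulative row totals of weights, without the last one. With only two rows a plain row sum gives the same single split point, but with more rows the offsets have to accumulate:

>>> np.split(np.repeat(percents.ravel(), weights.ravel()), np.cumsum(np.sum(weights, axis=1))[:-1])
[array([2, 3, 4, 4, 4, 4, 5, 5]), array([2, 4, 4, 4, 5, 5, 5, 5, 5])]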

Note that np.split is a fairly inefficient operation, and so is trying to build an array out of rows of unequal lengths.
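
As a rough sketch of the flatten, repeat, and split steps for more than two rows: the helper name expand_rows and the third row of weights are made up for illustration and are not from the question. The split points are taken as the cumulative row totals of weights.

```python
import numpy as np

def expand_rows(percents, weights):
    """Repeat each value in percents by the matching count in weights, row by row."""
    percents = np.asarray(percents)
    weights = np.asarray(weights)
    # Repeat over the flattened arrays; the result is one flat 1D array.
    flat = np.repeat(percents.ravel(), weights.ravel())
    # Row boundaries are the running totals of each row's weights,
    # dropping the last total (it equals the length of flat).
    boundaries = np.cumsum(weights.sum(axis=1))[:-1]
    return np.split(flat, boundaries)

# Hypothetical data: the two rows from the question plus one extra row.
weights = np.array([[0, 1, 1, 4, 2],
                    [0, 1, 0, 3, 5],
                    [2, 0, 0, 0, 1]])
percents = np.tile(np.arange(1, 6), (3, 1))

rows = expand_rows(percents, weights)
for row in rows:
    print(row.tolist())
# [2, 3, 4, 4, 4, 4, 5, 5]
# [2, 4, 4, 4, 5, 5, 5, 5, 5]
# [1, 1, 5]
```

The result is a list of 1D arrays rather than a 2D array, since the rows have different lengths.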

CodePudding user response：

You can use a list comprehension together with reduce from functools:

import functools
res = [functools.reduce(lambda x, y: x + y,
                        [x * [y] for x, y in zip(w, p)])
       for w, p in zip(weights, percents)]

OUTPUT:

[[2, 3, 4, 4, 4, 4, 5, 5],
 [2, 4, 4, 4, 5, 5, 5, 5, 5]]

Or, a solution that uses only list comprehensions:

res = [[j for i in [x * [y] for x, y in zip(w, p)] for j in i]
       for w, p in zip(weights, percents)]

OUTPUT:

[[2, 3, 4, 4, 4, 4, 5, 5],
 [2, 4, 4, 4, 5, 5, 5, 5, 5]]
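
Another possible variant of the same idea, shown here only as a sketch: itertools.chain.from_iterable can do the flattening, which avoids the repeated list concatenation that reduce performs. This version assumes weights and percents are plain Python lists of ints; the loop names ws and ps are just illustrative.

```python
from itertools import chain

weights = [[0, 1, 1, 4, 2], [0, 1, 0, 3, 5]]
percents = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

# For each row, build [p] * w for every (weight, percent) pair
# and chain the pieces into one flat list.
res = [list(chain.from_iterable([p] * w for w, p in zip(ws, ps)))
       for ws, ps in zip(weights, percents)]

print(res)
# [[2, 3, 4, 4, 4, 4, 5, 5], [2, 4, 4, 4, 5, 5, 5, 5, 5]]
```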