Numpy Sum Rows of 2D Array uniquely (no sequence duplicates)

Time:10-10

I have the following array

import numpy as np

single_array =
[[ 1 80 80 80]
 [ 2 80 80 89]
 [ 3 52 50 90]
 [ 4 39 34 54]
 [ 5 37 47 32]
 [ 6 42 42 27]
 [ 7 42 52 27]
 [ 8 38 33 28]
 [ 9 42 37 42]]

and want to create another array containing the sum of every unique pair of rows of single_array, so that the pairs (1, 2) and (2, 1) are treated as duplicates and only included once.

First I would like to multiply the 0th column of the first row in each pair by 10 (so I can tell which two rows were combined), then I want to add the two rows together and append the result to the new array.

Output should look like this:

double_array=
[[12 160 160 169]
 [13 132 130 170]
 [14 119 114 134]
...
 [89 80 70 70]]

Can I use itertools.combinations to get a 3D array with two unique combinations and then add the rows on the corresponding 3rd axis?

Answer:

This

import numpy as np
from itertools import combinations

single_array = np.array(
[[ 1, 80, 80, 80],
 [ 2, 80, 80, 89],
 [ 3, 52, 50, 90],
 [ 4, 39, 34, 54],
 [ 5, 37, 47, 32],
 [ 6, 42, 42, 27],
 [ 7, 42, 52, 27],
 [ 8, 38, 33, 28],
 [ 9, 42, 37, 42]]
)

np.vstack([single_array[i] * np.array([10, 1, 1, 1]) + single_array[j]
           for i, j in combinations(range(single_array.shape[0]), 2)])

does what you ask for in terms of specified input and output; I'm not sure if it's what you actually need. I don't think it will scale to big inputs.
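
As a quick sanity check, here is a minimal sketch that runs the expression above on the sample data and inspects the shape and the first and last rows. The names `scale` and `double_array` are only illustrative; the values in the comments follow from the 9 input rows (9*8/2 = 36 pairs).

```python
import numpy as np
from itertools import combinations

single_array = np.array(
[[ 1, 80, 80, 80],
 [ 2, 80, 80, 89],
 [ 3, 52, 50, 90],
 [ 4, 39, 34, 54],
 [ 5, 37, 47, 32],
 [ 6, 42, 42, 27],
 [ 7, 42, 52, 27],
 [ 8, 38, 33, 28],
 [ 9, 42, 37, 42]]
)

# Multiply column 0 of the first row in each pair by 10, leave the rest as is.
scale = np.array([10, 1, 1, 1])

double_array = np.vstack([single_array[i] * scale + single_array[j]
                          for i, j in combinations(range(single_array.shape[0]), 2)])

print(double_array.shape)   # (36, 4): one row per unordered pair
print(double_array[0])      # rows 1 and 2 -> 12, 160, 160, 169
print(double_array[-1])     # rows 8 and 9 -> 89, 80, 70, 70
```

Because combinations always yields i < j, the label in column 0 reads "lower row id, then higher row id", so the last pair comes out as 89.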

A 3D array holding these sums, with one "layer" per first row, would be ragged: for your 9 rows the first layer would be 8 deep, the next one 7, and so on. You could maybe get around this with NaNs or masking, but it wouldn't scale well for big inputs either: you'd be allocating roughly twice as much memory as you need, and then have to index out the ragged layers to get your final output.
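
One way to sidestep the ragged layout without a Python-level loop is to build the index pairs directly. This is a different technique from the one above, sketched here with np.triu_indices; the function name `pairwise_row_sums` and the `scale` parameter are hypothetical, and I have not benchmarked it.

```python
import numpy as np

def pairwise_row_sums(a, scale):
    """Sum every unordered pair of rows; the first row of each pair is scaled."""
    # Indices of the strict upper triangle: all (i, j) with i < j,
    # in the same row-major order that itertools.combinations produces.
    i, j = np.triu_indices(a.shape[0], k=1)
    # Fancy indexing gathers the paired rows; scale broadcasts over columns.
    return a[i] * scale + a[j]

single_array = np.array(
[[ 1, 80, 80, 80],
 [ 2, 80, 80, 89],
 [ 3, 52, 50, 90],
 [ 4, 39, 34, 54],
 [ 5, 37, 47, 32],
 [ 6, 42, 42, 27],
 [ 7, 42, 52, 27],
 [ 8, 38, 33, 28],
 [ 9, 42, 37, 42]]
)

result = pairwise_row_sums(single_array, np.array([10, 1, 1, 1]))
print(result.shape)  # (36, 4)
```

Note that a[i] and a[j] are each temporary arrays as large as the output, so this trades some extra memory for avoiding the loop.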

If you have to do this fast for big arrays, I suggest a pre-allocated output array and a for-loop with Numba:

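import numpy as np
from numba import jit

@jit(nopython=True)
def unique_row_sums(a):
    n = a.shape[0]
    # One output row per unordered pair (i, j) with i < j.
    # np.empty defaults to float64, so the result is a float array.
    b = np.empty((n*(n-1)//2, a.shape[1]))
    s = np.array([10, 1, 1, 1])  # scales column 0 of the first row by 10
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            b[k] = s * a[i] + a[j]
            k += 1
    return b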

In my not-too-careful testing with IPython's %timeit, this took about 4µs versus 152µs for the itertools-based version with your data, and should scale better.
