Find most frequent value per coordinate in multiple 2d Arrays

Time:06-12

I have multiple 2D arrays, for example:

A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

I need to find the most frequent value at each coordinate across the arrays, so the output would be:

E = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

I can loop through each of these arrays, but I was looking for a vectorised approach. There can be around 10-11 such arrays, each roughly 900×900.

Is it possible to solve this using list comprehension?

Answer:

Using a list comprehension gets a little hacky and took some work, but it can be done.

Basically, you have to nest list comprehensions, and all the arrays must have the same shape for this to work.

For a single matrix, one level of nesting would be enough, but since we are working with a list of matrices, the data is three-dimensional, so two levels of nesting are needed.

I imported mode from the statistics module to get the most common value.

from statistics import mode


A = [[-1, -1, 0, 1, -1], [1, 1, 0, -1, -1], [-1, -1, -1, -1, -1], [-1, 1, -1, -1, 0]]
B = [[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
C = [[0, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
D = [[-1, -1, 0, 1, 0], [0, 0, -1, 0, 1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]

matrixes = [A, B, C, D]

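# Outer comprehension walks the rows (k), the middle one walks the columns (j),
# and the innermost collects the value at (k, j) from every matrix.
result = [[mode([x[k][j] for x in matrixes]) for j in range(len(matrixes[0][0]))] for k in range(len(matrixes[0]))]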


print(result)

result:

[[-1, -1, 0, 1, -1], [1, -1, 0, -1, -1], [0, 1, -1, 1, -1], [-1, 1, -1, -1, -1]]
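One caveat: with an even number of arrays (four here), a cell can end up with a tie. The sample data happens to have none, but in general statistics.mode raised StatisticsError on ties before Python 3.8; from 3.8 on it returns the first mode encountered. The small check below, using a made-up cell, shows that behaviour and statistics.multimode, which returns every tied value:

```python
from statistics import mode, multimode

# Hypothetical values seen at one cell across four arrays: -1 and 0 each appear twice.
cell = [0, -1, 0, -1]

print(mode(cell))       # Python 3.8+: first mode encountered
print(multimode(cell))  # all tied values, in first-seen order
```

So on ties, the result depends on the order in which the matrices are listed.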

Answer:

You can zip all the arrays together and, for each (row, column) cell, count the values across the arrays and keep the most common one, like below:

import numpy as np
from collections import Counter
def cell_wise_cnt(arrs):
    n_row = 0
    res = np.empty((len(arrs[0]),len(arrs[0][0])))
    for row in zip(*arrs):
        arr = np.array(row)
        num_col = len(arr[0])
        for col in range(num_col):
            res[n_row][col] = Counter(arr[:, col]).most_common()[0][0]
        n_row += 1
    return res

Output:

>>> cell_wise_cnt(arrs = (A,B,C,D))

array([[-1., -1.,  0.,  1., -1.],
       [ 1., -1.,  0., -1., -1.],
       [ 0.,  1., -1.,  1., -1.],
       [-1.,  1., -1., -1., -1.]])
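The result above is float because np.empty defaults to float64. If integer output is wanted, here is a minimal sketch of the same Counter-based idea that keeps the input dtype; the name cell_wise_cnt_int is hypothetical, and it assumes all arrays share one shape:

```python
import numpy as np
from collections import Counter

def cell_wise_cnt_int(arrs):
    # Hypothetical variant of cell_wise_cnt: same per-cell Counter logic, but the
    # result keeps the input's integer dtype instead of np.empty's float64 default.
    stack = np.asarray(arrs)                          # shape (n_arrays, rows, cols)
    _, rows, cols = stack.shape
    res = np.empty((rows, cols), dtype=stack.dtype)
    for r in range(rows):
        for c in range(cols):
            # most_common(1) returns [(value, count)] for the most frequent value
            res[r, c] = Counter(stack[:, r, c].tolist()).most_common(1)[0][0]
    return res
```

Indexing the stacked array as stack[:, r, c] pulls one cell from every array at once, which replaces the manual n_row counter.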

Benchmark on Colab:

%timeit cell_wise_cnt(arrs = (A,B,C,D))
# 136 µs per loop

%timeit scipy.stats.mode([A,B,C,D]).mode
# 585 µs per loop

%timeit stats.mode((A,B,C,D)*100_000).mode
# 1.73 s per loop

%timeit cell_wise_cnt(arrs = (A,B,C,D)*100_000)
# 2.38 s per loop

With Python 3.9 and Julio_Lopes's answer, we get a better run time:

import statistics
def Julio_Lopes(arrs):
    return [[statistics.mode(j)  for j in zip(*i)] for i in zip(*arrs)]

%timeit Julio_Lopes(arrs = (A,B,C,D))
# 106 µs per loop

%timeit Julio_Lopes(arrs = (A,B,C,D)*100_000)
# 653 ms per loop
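To see why the nested zip in Julio_Lopes works, here is a tiny illustration on two made-up 2×2 lists (X and Y are hypothetical). The outer zip pairs up matching rows, and the inner zip(*...) turns each group of rows into per-cell tuples, so statistics.mode receives exactly one cell's values from every array:

```python
# Two tiny 2x2 "arrays" to show what each zip level yields.
X = [[1, 2], [3, 4]]
Y = [[5, 6], [7, 8]]

# Outer level: one tuple of matching rows per row index.
row_groups = list(zip(X, Y))
# Inner level: transpose each group of rows into per-cell tuples.
cell_groups = [list(zip(*rows)) for rows in row_groups]

print(row_groups)   # [([1, 2], [5, 6]), ([3, 4], [7, 8])]
print(cell_groups)  # [[(1, 5), (2, 6)], [(3, 7), (4, 8)]]
```

Because the shape comes from zip itself, no explicit indexing or length calculations are needed.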

Answer:

As suggested by @Michael Szczesny, you can simply use scipy.stats.mode:

from scipy import stats

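arr = [A, B, C, D]
# axis=0 takes the mode across the stacked arrays; keepdims=False (SciPy >= 1.9)
# drops the leading length-1 axis so the result has the shape of one input array
stats.mode(arr, axis=0, keepdims=False).mode.tolist()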

#output
 [[-1, -1, 0, 1, -1],
 [1, -1, 0, -1, -1],
 [0, 1, -1, 1, -1],
 [-1, 1, -1, -1, -1]]
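If SciPy is not available, the same vectorised idea can be sketched with NumPy alone. This assumes all arrays share one shape and contain only a few distinct values (here -1, 0 and 1): it counts, for every distinct value, how many arrays hold it at each cell, then picks the value with the highest count. The function name most_frequent_per_cell is hypothetical. The boolean comparison temporarily uses about (distinct values) × (arrays) × rows × cols bytes, roughly 27 MB for 3 values, 11 arrays and 900×900:

```python
import numpy as np

def most_frequent_per_cell(arrays):
    # Hypothetical helper: per-cell mode across equally shaped 2D arrays.
    # Ties resolve to the smallest value, because np.unique returns sorted
    # values and argmax returns the first maximum.
    stack = np.stack([np.asarray(a) for a in arrays])   # shape (n, rows, cols)
    values = np.unique(stack)                           # sorted distinct values, shape (k,)
    # counts[v, r, c] = number of arrays holding values[v] at (r, c); shape (k, rows, cols)
    counts = (stack[None] == values[:, None, None, None]).sum(axis=1)
    return values[counts.argmax(axis=0)]                # shape (rows, cols)
```

Calling most_frequent_per_cell([A, B, C, D]).tolist() should reproduce the output above.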