Pythonic method of collecting numpy array elements that satisfy given conditions

So I'm working with the discrete correlation function between two time series with data (xi, ti) and (xj, tj), where i, j = 1, 2, 3, and so on. A lag time is computed for each (i, j) pair of points, and these lags are stored in a numpy array dst, where element dst[i, j] is the lag time for that pair.
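For concreteness, here is one way such a lag matrix could be built with NumPy broadcasting. This is only a sketch: it assumes the lag for pair (i, j) is simply tj[j] - ti[i] (the sign convention is an assumption), and the time values are made up for illustration.

```python
import numpy as np

# Hypothetical observation times for the two series (not from the question)
ti = np.array([0.0, 1.5, 4.0])
tj = np.array([0.5, 2.0, 3.0, 6.0])

# A (1, 4) row minus a (3, 1) column broadcasts to a (3, 4) matrix
# with dst[i, j] = tj[j] - ti[i]
dst = tj[np.newaxis, :] - ti[:, np.newaxis]
print(dst.shape)  # (3, 4)
```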

I now want to collect the first n lags (in increasing order) greater than some threshold, together with their (i, j) indices, but I want them to be independent pairs: no two pairs may share the same i or the same j (so (1, 2) and (3, 2) wouldn't work).
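The independence condition can be stated as a small check. `are_independent` is a hypothetical helper, shown only to make the rule precise: a list of pairs qualifies when every row index and every column index appears at most once.

```python
def are_independent(pairs):
    """Return True if no two (i, j) pairs share a row index i or a column index j."""
    rows = [i for i, _ in pairs]
    cols = [j for _, j in pairs]
    # Duplicates shrink a set, so equal lengths mean every index is unique
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)


print(are_independent([(1, 2), (2, 0)]))  # True
print(are_independent([(1, 2), (3, 2)]))  # False: both pairs use j = 2
```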

As a simple example, let's say I have:

import numpy as np

dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])

And I want the first two pairs with a lag greater than 3. I began by creating a dictionary of the form {(i, j) : lag} containing all elements with a lag > 3 and then ordered it by the lag value.

idxi, idxj = np.where(dst>3)
mydict = {}
for i, j in zip(idxi, idxj):
    mydict[(i,j)] = dst[i,j]
mydict = {k: v for k, v in sorted(mydict.items(), key=lambda item: item[1])}

# so now mydict = {(1, 2): 4.0, (1, 3): 5.0, (2, 0): 7.0, (2, 1): 8.0, (2, 2): 12.0, (2, 3): 13.0}
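The same sorted list of candidates can be built with NumPy alone, without an intermediate dictionary. A minimal sketch: np.nonzero gives the indices of the elements above the threshold, and a stable argsort orders them by lag, keeping row-major order among ties just as sorted() does.

```python
import numpy as np

dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])

# Row and column indices of every element above the threshold (row-major order)
rows, cols = np.nonzero(dst > 3)
lags = dst[rows, cols]

# Stable sort by lag, then reorder all three arrays the same way
order = np.argsort(lags, kind="stable")
rows, cols, lags = rows[order], cols[order], lags[order]

candidates = list(zip(rows.tolist(), cols.tolist(), lags.tolist()))
print(candidates)
# [(1, 2, 4.0), (1, 3, 5.0), (2, 0, 7.0), (2, 1, 8.0), (2, 2, 12.0), (2, 3, 13.0)]
```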

So the first two independent pairs would be (1, 2) and (2, 0): (1, 3) is skipped because it shares i = 1 with (1, 2). But I am unsure of the best way to get these first two pairs while also making sure no two pairs share the same i or the same j. I'm sure I could come up with a complicated way of doing this, but I am looking for a more Pythonic and efficient method. I'm fairly new to manipulating numpy arrays and would like to know the best way of achieving this. So how can I get the first two independent pairs here, and is there a way to do the whole process without creating a sorted dictionary?
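A minimal sketch of the whole process without a sorted dictionary, assuming "first" means smallest lag first: sort the eligible elements by lag, then walk through them once, skipping any pair whose row or column has already been taken, and stop after n picks. `first_independent_pairs` is a hypothetical name, and this is a greedy selection, which is what the example above describes.

```python
import numpy as np


def first_independent_pairs(dst, threshold, n):
    """Greedily pick up to n (i, j) pairs with dst[i, j] > threshold,
    smallest lag first, skipping pairs whose row or column is already used."""
    rows, cols = np.nonzero(dst > threshold)
    order = np.argsort(dst[rows, cols], kind="stable")

    used_rows, used_cols = set(), set()
    picked = []
    for i, j in zip(rows[order].tolist(), cols[order].tolist()):
        if i in used_rows or j in used_cols:
            continue  # shares an index with an earlier pick
        picked.append((i, j))
        used_rows.add(i)
        used_cols.add(j)
        if len(picked) == n:
            break
    return picked


dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])
print(first_independent_pairs(dst, 3, 2))  # [(1, 2), (2, 0)]
```

If fewer than n independent pairs exist, the function simply returns the ones it found.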

Answer:

I'm not sure if this counts as too complicated, but I think it works at least.

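dst = [
    [0.2, 0.5, 0.9, 1.0],
    [2.0, 3.0, 4.0, 5.0],
    [7.0, 8.0, 12.0, 13.0]
]

THRESHOLD = 3


def find_minimum_value(row, used_columns, threshold=THRESHOLD):
    # smallest value above the threshold whose column has not been taken yet
    candidates = [(value, column_index) for column_index, value in enumerate(row)
                  if value > threshold and column_index not in used_columns]
    if not candidates:
        return None
    value, column_index = min(candidates)
    return column_index, value


answer = {}
used_columns = set()
for row_index, row in enumerate(dst):
    values = find_minimum_value(row, used_columns)
    if values:
        column_index, value = values
        answer[(row_index, column_index)] = value
        # each column may appear in at most one pair
        used_columns.add(column_index)

print(answer)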
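One thing worth noting: any strategy that settles the rows one at a time, in order, can disagree with taking the globally smallest lags first. The made-up 2x2 matrix below shows this. `row_by_row` and `smallest_first` are hypothetical helpers written only for the comparison; both assume a NumPy array input.

```python
import numpy as np


def row_by_row(dst, threshold):
    """Walk rows in order; in each, take the smallest lag above threshold in an unused column."""
    picked, used_cols = [], set()
    for i, row in enumerate(dst):
        candidates = [(v, j) for j, v in enumerate(row) if v > threshold and j not in used_cols]
        if candidates:
            _, j = min(candidates)
            picked.append((i, j))
            used_cols.add(j)
    return picked


def smallest_first(dst, threshold):
    """Visit all eligible lags in ascending order, skipping used rows and columns."""
    rows, cols = np.nonzero(dst > threshold)
    order = np.argsort(dst[rows, cols], kind="stable")
    picked, used_rows, used_cols = [], set(), set()
    for i, j in zip(rows[order].tolist(), cols[order].tolist()):
        if i not in used_rows and j not in used_cols:
            picked.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return picked


# Made-up matrix: the smallest lag (4.0) is in row 1, but row 0 claims column 0 first
dst2 = np.array([[5.0, 9.0],
                 [4.0, 10.0]])
print(row_by_row(dst2, 3))      # [(0, 0), (1, 1)]
print(smallest_first(dst2, 3))  # [(1, 0), (0, 1)]
```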

Answer:

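One approach is to use a masked array: mask every lag less than or equal to 3, then repeatedly take the smallest unmasked value and mask its whole row and column, so that neither index can be picked again: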

import numpy as np

dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])


def find_minimums(initial, k=2):
    """Yield up to k (row, column) indices of the smallest unmasked values,
    masking each chosen row and column so later picks stay independent."""
    for _ in range(k):
        # stop early if every element is already masked
        if initial.mask.all():
            return
        # index of the smallest unmasked value, as a (row, column) pair
        row, col = np.unravel_index(initial.argmin(), initial.shape)
        # mask the whole row and the whole column to avoid reusing either index
        initial.mask[row, :] = True
        initial.mask[:, col] = True
        yield int(row), int(col)


res = list(find_minimums(np.ma.masked_less_equal(dst, 3)))
print(res)

Output

[(1, 2), (2, 0)]
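The same row-and-column elimination also works on a plain float array instead of a masked array. In this sketch, ineligible entries are replaced by +inf so that argmin never selects them, and each chosen row and column is overwritten with +inf. `smallest_independent_pairs` is a hypothetical name; like the masked version, it is a greedy smallest-first selection, and it stops early if fewer than k independent pairs are available.

```python
import numpy as np


def smallest_independent_pairs(dst, threshold, k=2):
    """Return up to k (i, j) pairs, smallest lag first, with no shared row or column."""
    # np.where builds a new array, so the caller's dst is left untouched
    work = np.where(dst > threshold, dst, np.inf)
    pairs = []
    for _ in range(k):
        flat_index = work.argmin()
        if np.isinf(work.flat[flat_index]):
            break  # nothing eligible is left
        i, j = np.unravel_index(flat_index, work.shape)
        pairs.append((int(i), int(j)))
        # rule out the chosen row and column for later picks
        work[i, :] = np.inf
        work[:, j] = np.inf
    return pairs


dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])
print(smallest_independent_pairs(dst, 3))  # [(1, 2), (2, 0)]
```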