I'm working with the discrete correlation function between two time series with data (xi, ti) and (xj, tj), where i and j = 1, 2, 3, ... A lag time is computed for each (i, j) pair of points. These lags are stored in a numpy array dst, where element dst[i, j] is the lag time for that pair.
I now want to collect the first n lags greater than some value, together with their (i, j) indices, but I want them to be independent pairs: no two pairs may share the same i or the same j (so (1,2) and (3,2) wouldn't work together).
As a simple example, let's say I have:
dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])
And I want the first two pairs with a lag greater than 3. I began by creating a dictionary of the form {(i, j) : lag} containing all elements with a lag > 3 and then ordered it by the lag value.
idxi, idxj = np.where(dst > 3)
mydict = {}
for i, j in zip(idxi, idxj):
    mydict[(i, j)] = dst[i, j]
mydict = {k: v for k, v in sorted(mydict.items(), key=lambda item: item[1])}
# so now mydict = {(1, 2): 4.0, (1, 3): 5.0, (2, 0): 7.0, (2, 1): 8.0, (2, 2): 12.0, (2, 3): 13.0}
So the first two independent terms would be (1,2) and (2,0). But I am unsure of the best way of getting these first two pairs while also making sure no two pairs share the same i or the same j. I'm sure I could think of a complicated way of doing this, but I am looking for a more pythonic and quick method. I'm a bit new to manipulating numpy arrays and would like to know the best way of achieving my goal. So how can I get the first two independent pairs here, and is there a way to do this whole process without creating a sorted dictionary?
CodePudding user response:
I'm not sure if this counts as too complicated, but I think it works at least.
dst = [
    [0.2, 0.5, 0.9, 1.0],
    [2.0, 3.0, 4.0, 5.0],
    [7.0, 8.0, 12.0, 13.0]
]

def find_minimum_value(row):
    # Return (column_index, value) for the first value in the row above 3.
    # Falls through and returns None if the row has no such value.
    for column_index, value in enumerate(row):
        if value > 3:
            return column_index, value

answer = {}
for row_index, row in enumerate(dst):
    values = find_minimum_value(row)
    if values:
        column_index, value = values
        answer[(row_index, column_index)] = value
print(answer)
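Note that the loop above takes the first value above the threshold in each row, so it never checks whether a column has already been used, and it only picks the smallest value in a row if the row happens to be sorted. A minimal sketch of a greedy alternative follows: sort the candidate lags once, then walk through them and skip any pair whose row or column is already taken. The function name first_independent_pairs and its parameters are made up for illustration, and "first" is assumed to mean "smallest lag first".

```python
import numpy as np

def first_independent_pairs(dst, threshold, n):
    """Greedily pick up to n (i, j, lag) triples with lag > threshold,
    in ascending lag order, never reusing a row or a column."""
    dst = np.asarray(dst)
    # Indices of all candidates above the threshold
    rows, cols = np.nonzero(dst > threshold)
    # Order the candidates by lag (stable sort keeps ties in row-major order)
    order = np.argsort(dst[rows, cols], kind="stable")

    used_rows, used_cols = set(), set()
    pairs = []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used_rows or j in used_cols:
            continue  # shares an index with an already chosen pair
        pairs.append((i, j, float(dst[i, j])))
        used_rows.add(i)
        used_cols.add(j)
        if len(pairs) == n:
            break
    return pairs

dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])
print(first_independent_pairs(dst, 3, 2))
```

This avoids the sorted dictionary entirely. If fewer than n independent pairs exist, it simply returns the ones it found.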
CodePudding user response:
One approach is to use masked arrays, masking out each chosen row and column so that no index is reused:
import numpy as np

dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])

def find_maximums(initial, k=2):
    # Despite its name, this yields the indices of the k smallest unmasked values.
    for _ in range(k):
        # find the flat index of the smallest unmasked value and convert it to an (i, j) index
        arg_min = np.unravel_index(initial.argmin(), initial.shape)
        # mask the whole row and the whole column so neither index is picked again
        initial.mask[arg_min[0], :] = initial.mask[:, arg_min[1]] = True
        yield arg_min

# values <= 3 are masked up front, so only lags greater than 3 are considered
res = list(find_maximums(np.ma.masked_less_equal(dst, 3)))
print(res)
Output
[(1, 2), (2, 0)]
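One thing to watch with this kind of loop is what happens when k is larger than the number of independent pairs available: once everything is excluded there is nothing valid left to pick. A hedged sketch of the same idea with an explicit stopping condition is shown below. It uses a plain boolean array in place of a masked array, which is a substitution for illustration rather than the method above, and the name independent_minima is hypothetical.

```python
import numpy as np

def independent_minima(dst, threshold, k=2):
    """Return up to k (i, j) index pairs of the smallest values above
    threshold, excluding the row and column of each pair once chosen."""
    dst = np.asarray(dst, dtype=float)
    # True where an entry is still eligible; a new array, so dst is untouched
    available = dst > threshold
    pairs = []
    for _ in range(k):
        if not available.any():
            break  # no eligible entries left, stop early
        # Ineligible entries become +inf so argmin can never select them
        candidates = np.where(available, dst, np.inf)
        i, j = np.unravel_index(np.argmin(candidates), dst.shape)
        pairs.append((int(i), int(j)))
        # Exclude the chosen row and column from later iterations
        available[i, :] = False
        available[:, j] = False
    return pairs

dst = np.array([[0.2, 0.5, 0.9, 1.0],
                [2.0, 3.0, 4.0, 5.0],
                [7.0, 8.0, 12.0, 13.0]])
print(independent_minima(dst, 3, k=2))
```

Asking for more pairs than exist, for example k=5 on this array, returns only the two pairs that can be formed.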