Manipulating a distance matrix for intersection over time intervals

I've created distance matrices at 0.1-second time steps over 60-second intervals. Each time step has a matrix like the ones below, populated with distance values:

time = 0.1

    a1    b2     c3     d4
a1   0   5.4    9.1    10.1
b2  5.4    0    5.0     3.2
c3  9.1  5.0      0     6.6
d4  10.1 3.2    6.6       0

time = 0.2

    a1    b2     c3     d4
a1  0    2.4    9.1    12.1
b2  2.4   0     6.7     3.6
c3  9.1  6.7      0     9.6
d4  12.1 3.6    9.6       0

The goal is to generate an adjacency matrix or neighbor list at the end of each 60-second interval (examining 600 dataframes), containing the neighbors that stay within a distance threshold for the entire minute (that is, in every distance matrix examined).

For example, if the distance limit is d=10, then for this 0.2-second sample it would return the list [a1, b2, c3], since they all stayed less than 10 apart over that 0.2-second interval.

I was wondering if there is a semi-efficient or clever way to do this with pandas and Python.

CodePudding user response：

IIUC, since you have a symmetric matrix, you can use numpy to create boolean masks and filter the index (or columns) with them. Because the matrix is symmetric, it suffices to analyze either the upper triangle or the lower triangle (I chose the lower triangle). Then, among the numbers in the lower triangle, build a mask that is False for the rows that contain a value that is not less than d.

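import numpy as np

def get_neighbor_indices(df, d):
    # True where a distance in the lower triangle (diagonal included) is below d
    less_than_d = np.tril(df.lt(d).to_numpy())
    # Fill the upper triangle with True so it never fails the row-wise check
    upper_triangle_dummy = np.triu(np.ones(df.shape, dtype=bool))
    # Keep a row only if all of its lower-triangle distances are below d
    msk = (less_than_d | upper_triangle_dummy).all(axis=1)
    return df.index[msk].tolist()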

>>> get_neighbor_indices(df1, 10)
['a1', 'b2', 'c3']
>>> get_neighbor_indices(df2, 10)
['a1', 'b2', 'c3']
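
To extend this per-matrix check to a whole interval, one option is to intersect the label lists returned for each time step. The following is only a sketch under assumptions: df1 and df2 are rebuilt from the two matrices in the question, and neighbors_over_interval is a hypothetical helper name, not something from the answer above.

```python
import numpy as np
import pandas as pd

labels = ["a1", "b2", "c3", "d4"]

# Distance matrices copied from the question (time = 0.1 and time = 0.2).
df1 = pd.DataFrame(
    [[0, 5.4, 9.1, 10.1],
     [5.4, 0, 5.0, 3.2],
     [9.1, 5.0, 0, 6.6],
     [10.1, 3.2, 6.6, 0]],
    index=labels, columns=labels,
)
df2 = pd.DataFrame(
    [[0, 2.4, 9.1, 12.1],
     [2.4, 0, 6.7, 3.6],
     [9.1, 6.7, 0, 9.6],
     [12.1, 3.6, 9.6, 0]],
    index=labels, columns=labels,
)


def get_neighbor_indices(df, d):
    # Same logic as the function above, repeated so this block runs on its own.
    less_than_d = np.tril(df.lt(d).to_numpy())
    upper_triangle_dummy = np.triu(np.ones(df.shape, dtype=bool))
    msk = (less_than_d | upper_triangle_dummy).all(axis=1)
    return df.index[msk].tolist()


def neighbors_over_interval(dfs, d):
    # Hypothetical helper: keep only the labels returned for every time step.
    common = set(dfs[0].index)
    for df in dfs:
        common &= set(get_neighbor_indices(df, d))
    # Preserve the label order of the first dataframe.
    return [label for label in dfs[0].index if label in common]


interval_neighbors = neighbors_over_interval([df1, df2], 10)
print(interval_neighbors)
```

For a real 60-second interval, the list passed in would hold all 600 dataframes instead of two.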

CodePudding user response：

Stack your dataframes along a third dimension, apply your threshold to get boolean values, then use numpy.logical_and.reduce to apply the "and" along that third dimension.

For example, if dfs is a list of your dataframes, then do:

import numpy as np

threshold = 10
stacked = np.stack(dfs, axis=2)  # shape: (n, n, number of time steps)
result = np.logical_and.reduce(stacked < threshold, axis=2)

You can then put result inside a dataframe with index and column names if you wish.
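
As a rough sketch of that last step, the boolean array can be wrapped in a labeled dataframe and turned into a neighbor list. The assumptions here are mine: dfs holds only the two sample matrices from the question, the diagonal is cleared so a point is not listed as its own neighbor, and the names adjacency and neighbor_list are illustrative. The dataframes are converted with to_numpy() before stacking purely to be explicit.

```python
import numpy as np
import pandas as pd

labels = ["a1", "b2", "c3", "d4"]

# The two sample matrices from the question stand in for the 600 per interval.
dfs = [
    pd.DataFrame(
        [[0, 5.4, 9.1, 10.1],
         [5.4, 0, 5.0, 3.2],
         [9.1, 5.0, 0, 6.6],
         [10.1, 3.2, 6.6, 0]],
        index=labels, columns=labels,
    ),
    pd.DataFrame(
        [[0, 2.4, 9.1, 12.1],
         [2.4, 0, 6.7, 3.6],
         [9.1, 6.7, 0, 9.6],
         [12.1, 3.6, 9.6, 0]],
        index=labels, columns=labels,
    ),
]

threshold = 10
stacked = np.stack([df.to_numpy() for df in dfs], axis=2)
result = np.logical_and.reduce(stacked < threshold, axis=2)

# Assumption: a point should not count as its own neighbor.
np.fill_diagonal(result, False)

# Labeled adjacency matrix: True where a pair stayed below the threshold
# in every time step.
adjacency = pd.DataFrame(result, index=labels, columns=labels)

# Neighbor list: for each label, the labels it stayed close to throughout.
neighbor_list = {
    label: adjacency.columns[adjacency.loc[label].to_numpy()].tolist()
    for label in adjacency.index
}
print(adjacency)
print(neighbor_list)
```

With the sample data, a1 and d4 are not neighbors because their distance (10.1, then 12.1) is never below 10.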