python pandas search for specific blocks of data inside a dataframe

Time:11-21

Hello, I want to look for a specific block of data inside a DataFrame with Python and pandas. Let's assume I have a DataFrame like this:

A  B  C  D  E
1  3  5  7  9
5  6  7  8  9 
2  4  6  8  8
5  4  3  2  1

I want to search the DataFrame for a specific block of data and return the location of that block. Let's say this one:

7  8  9
6  8  8

How can I achieve this in a reasonable runtime?

My solution takes too much time because I loop over the DataFrame again and again, and I'm sure there is a much better solution for this kind of problem.

CodePudding user response:

Assuming this DataFrame and array as input:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 5, 2, 5], 'B': [3, 6, 4, 4], 'C': [5, 7, 6, 3], 'D': [7, 8, 8, 2], 'E': [9, 9, 8, 1]})

a = np.array([[7, 8, 9], [6, 8,  8]])

You can use NumPy's sliding_window_view:

from numpy.lib.stride_tricks import sliding_window_view as swv

idx, col = np.where((swv(df, a.shape) == a).all(axis=(-1, -2)))

out = list(zip(df.index[idx], df.columns[col]))

Output:

[(1, 'C')]
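
For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name find_block is made up for illustration, it assumes NumPy 1.20 or newer (where sliding_window_view was added), and it assumes the block is no larger than the DataFrame in either dimension. Each returned pair is the row label and column label of the top-left corner of a match.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def find_block(df, block):
    """Return (row label, column label) for the top-left corner of each match.

    find_block is a hypothetical helper name, not a pandas or NumPy API.
    """
    block = np.asarray(block)
    # windows has shape (n_rows - h + 1, n_cols - w + 1, h, w)
    windows = sliding_window_view(df.to_numpy(), block.shape)
    # A window matches only if every one of its cells equals the block
    hits = (windows == block).all(axis=(-2, -1))
    rows, cols = np.nonzero(hits)
    return list(zip(df.index[rows], df.columns[cols]))


df = pd.DataFrame({'A': [1, 5, 2, 5], 'B': [3, 6, 4, 4], 'C': [5, 7, 6, 3],
                   'D': [7, 8, 8, 2], 'E': [9, 9, 8, 1]})
matches = find_block(df, [[7, 8, 9], [6, 8, 8]])
print(matches)
```

The windows are views on the original data, so no copies of the DataFrame are made. The boolean comparison does allocate one value per window cell, which is worth keeping in mind for very large frames or blocks.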

CodePudding user response:

Another option is cross-correlation with scipy.signal.correlate:

b = np.array([[7, 8, 9], [6, 8, 8]])

from scipy import signal
c = signal.correlate(df.values, b, 'valid')
matched = np.where(c == signal.correlate(b,b,'valid'))
print(df.index[matched[0]], df.columns[matched[1]])

Output:

Int64Index([1], dtype='int64') Index(['C'], dtype='object')
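
One caveat with the correlation approach: a window whose correlation score equals the block's sum of squares is not necessarily identical to the block. For example, the window [2, 0] scores 2 against the block [1, 1], the same as [1, 1] does against itself. Below is a minimal sketch that treats the correlation hits as candidates and confirms each one by direct comparison. The function name find_block_correlate is hypothetical, and integer data is assumed, since exact equality on floating-point correlation scores is fragile.

```python
import numpy as np
import pandas as pd
from scipy import signal


def find_block_correlate(df, block):
    """Correlation-based search with a verification pass (hypothetical helper)."""
    values = df.to_numpy()
    block = np.asarray(block)
    h, w = block.shape
    # Cross-correlation score of every h-by-w window with the block
    c = signal.correlate(values, block, mode='valid')
    # A true match must score exactly the block's sum of squares
    target = (block * block).sum()
    candidates = np.argwhere(c == target)
    # Discard coincidental scores by comparing each candidate window directly
    return [(df.index[r], df.columns[k]) for r, k in candidates
            if np.array_equal(values[r:r + h, k:k + w], block)]


df = pd.DataFrame({'A': [1, 5, 2, 5], 'B': [3, 6, 4, 4], 'C': [5, 7, 6, 3],
                   'D': [7, 8, 8, 2], 'E': [9, 9, 8, 1]})
matches = find_block_correlate(df, [[7, 8, 9], [6, 8, 8]])
print(matches)
```

The verification step only touches the candidate windows, so it adds little cost when matches are rare.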