Hello, I want to find a specific block of data inside a DataFrame using Python and pandas. Let's assume I have a DataFrame like this:
A B C D E
1 3 5 7 9
5 6 7 8 9
2 4 6 8 8
5 4 3 2 1
I want to search the DataFrame for a specific block of data and return the location of that block. Let's say this one:
7 8 9
6 8 8
How can I achieve this in a reasonable runtime?
My solution takes too much time because I'm looping over the DataFrame again and again, and I'm sure there is a much better solution for this kind of problem.
CodePudding user response:
Assuming this DataFrame and array as input:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 5, 2, 5], 'B': [3, 6, 4, 4], 'C': [5, 7, 6, 3], 'D': [7, 8, 8, 2], 'E': [9, 9, 8, 1]})
a = np.array([[7, 8, 9], [6, 8, 8]])
You can use NumPy's sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view as swv
idx, col = np.where((swv(df, a.shape) == a).all(axis=(-1, -2)))
out = list(zip(df.index[idx], df.columns[col]))
Output:
[(1, 'C')]
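For reuse, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name find_block is hypothetical (it is not part of pandas or NumPy), and it assumes the DataFrame holds a single comparable dtype and that the block is 2-D. It returns the row and column labels of the top-left corner of every match.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def find_block(df, block):
    """Hypothetical helper: labels of the top-left corner of every exact match."""
    values = df.to_numpy()
    block = np.asarray(block)
    # A block larger than the frame cannot match anywhere.
    if block.shape[0] > values.shape[0] or block.shape[1] > values.shape[1]:
        return []
    # windows has shape (n_rows - h + 1, n_cols - w + 1, h, w)
    windows = sliding_window_view(values, block.shape)
    # A window matches only if every one of its cells equals the block.
    rows, cols = np.where((windows == block).all(axis=(-2, -1)))
    return list(zip(df.index[rows], df.columns[cols]))


df = pd.DataFrame({'A': [1, 5, 2, 5], 'B': [3, 6, 4, 4], 'C': [5, 7, 6, 3],
                   'D': [7, 8, 8, 2], 'E': [9, 9, 8, 1]})
print(find_block(df, [[7, 8, 9], [6, 8, 8]]))  # [(1, 'C')]
```

sliding_window_view creates a view rather than copying each window, although the comparison itself still materializes a boolean array with one entry per window cell.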
CodePudding user response:
You can also use scipy.signal.correlate:
b = np.array([[7,8,9],[6,8,8]])
from scipy import signal
c = signal.correlate(df.values, b, 'valid')
matched = np.where(c == signal.correlate(b,b,'valid'))
print(df.index[matched[0]],df.columns[matched[1]])
Output:
Int64Index([1], dtype='int64') Index(['C'], dtype='object')
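Note that an equal correlation value is a necessary condition for a match but not a sufficient one: a different window can produce the same sum of products. A minimal sketch that treats the correlation hits as candidates and confirms each one with an exact comparison is shown here; the helper name find_block_corr is hypothetical, and integer data is assumed so that the equality test on the correlation is exact.

```python
import numpy as np
import pandas as pd
from scipy import signal


def find_block_corr(df, block):
    """Hypothetical helper: correlation pre-filter followed by an exact check."""
    values = df.to_numpy()
    block = np.asarray(block)
    h, w = block.shape
    # Sum of products of the block with every window of the same shape.
    c = signal.correlate(values, block, mode='valid')
    # A true match must reproduce the block's correlation with itself.
    target = (block * block).sum()
    candidates = np.argwhere(c == target)
    # Drop false positives by comparing the actual window contents.
    return [(df.index[r], df.columns[col]) for r, col in candidates
            if np.array_equal(values[r:r + h, col:col + w], block)]


# The window [3, 1] also gives 3*1 + 1*2 = 5, the same value as [1, 2] with itself,
# so the correlation alone would report two hits here.
df2 = pd.DataFrame({'A': [3, 1], 'B': [1, 2]})
print(find_block_corr(df2, [[1, 2]]))
```

With the exact check in place, only the row that really contains 1, 2 is reported.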