Finding specific cell inside Pandas Dataframe based on most similar column and index labels (when co-CodePudding

I have dataframe with around 500 columns and 300 rows and it looks like the example below. I need to select specific dataframe cell based on most similar column label and index label when compared to a reference.

Let me explain my problem: Let's say that I need to find a cell which has column label most similar to refence x=0.561697 and index label most similar to reference y=-0.12849. Most probably, there are not the references' labels among my dataframe's columns and index, however there are very similar labels and that's what I want to find. Another problem is that sometimes is the similarity up to three decimal number, sometimes only up to two decimal numbers. An ideal output would be a cell which column and index labels are most similar to the references (i.e. they share the same number on most decimal numbers).

Can I somehow modify methods "loc" and "iloc" to search for similarity and not exact label/values? Or are there other methods right for this purpose (even outside pandas)? Thank you for suggestions.

#example of my dataframe
my_index=[[-0.176931, -0.15578987, -0.134648739]]
my_columns=[[0.447852, 0.568911395, 0.31997079, 0.451030185, 0.45208958]]
data=[[-6.027819824, -7.581473207, -9.277630354, -10.967289156, -12.490250252], [-13.749975397, -14.709719522, -15.317946078, -15.45502317, -14.990571819], [-13.922128986, -12.463674538, -10.987597885, -9.843527599, -9.179409063]]
df=pd.DataFrame(data)
df.columns=my_columns
df1=df.set_index(my_index)
df1

Using this example, the desired output would be only the cell with value "-12.463675" with column label "0.568911395" and index label "-0.134648739"

DataFrame df1:

            0.447852   0.568911   0.319971   0.451030   0.452090
-0.176931  -6.027820  -7.581473  -9.277630 -10.967289 -12.490250
-0.155790 -13.749975 -14.709720 -15.317946 -15.455023 -14.990572
-0.134649 -13.922129 -12.463675 -10.987598  -9.843528  -9.179409

CodePudding user response：

Assuming you fix you columns' Index to be 1D:

df1.columns = my_columns[0]
# Float64Index([0.447852, 0.568911395, 0.31997079, 0.451030185, 0.45208958], dtype='float64')

You can use the minimal absolute difference to your target:

import numpy as np

out = df1.iloc[np.argmin(abs(df1.index-y)), np.argmin(abs(df1.columns-x))]

output: -12.463674538

Intermediate:

np.argmin(abs(df1.index-y)), np.argmin(abs(df1.columns-x))

output: (2, 1)

Coordinates:

y_pos = np.argmin(abs(df1.index-y))
x_pos = np.argmin(abs(df1.columns-x))
df1.index[y_pos], df1.columns[x_pos]

output: (-0.134648739, 0.568911395)