Home > Back-end >  Finding specific cell inside Pandas Dataframe based on most similar column and index labels (when co
Finding specific cell inside Pandas Dataframe based on most similar column and index labels (when co

Time:11-22

I have dataframe with around 500 columns and 300 rows and it looks like the example below. I need to select specific dataframe cell based on most similar column label and index label when compared to a reference.

Let me explain my problem: Let's say that I need to find a cell which has column label most similar to refence x=0.561697 and index label most similar to reference y=-0.12849. Most probably, there are not the references' labels among my dataframe's columns and index, however there are very similar labels and that's what I want to find. Another problem is that sometimes is the similarity up to three decimal number, sometimes only up to two decimal numbers. An ideal output would be a cell which column and index labels are most similar to the references (i.e. they share the same number on most decimal numbers).

Can I somehow modify methods "loc" and "iloc" to search for similarity and not exact label/values? Or are there other methods right for this purpose (even outside pandas)? Thank you for suggestions.

#example of my dataframe
my_index=[[-0.176931, -0.15578987, -0.134648739]]
my_columns=[[0.447852, 0.568911395, 0.31997079, 0.451030185, 0.45208958]]
data=[[-6.027819824, -7.581473207, -9.277630354, -10.967289156, -12.490250252], [-13.749975397, -14.709719522, -15.317946078, -15.45502317, -14.990571819], [-13.922128986, -12.463674538, -10.987597885, -9.843527599, -9.179409063]]
df=pd.DataFrame(data)
df.columns=my_columns
df1=df.set_index(my_index)
df1

Using this example, the desired output would be only the cell with value "-12.463675" with column label "0.568911395" and index label "-0.134648739"

DataFrame df1:

            0.447852   0.568911   0.319971   0.451030   0.452090
-0.176931  -6.027820  -7.581473  -9.277630 -10.967289 -12.490250
-0.155790 -13.749975 -14.709720 -15.317946 -15.455023 -14.990572
-0.134649 -13.922129 -12.463675 -10.987598  -9.843528  -9.179409

CodePudding user response:

Assuming you fix you columns' Index to be 1D:

df1.columns = my_columns[0]
# Float64Index([0.447852, 0.568911395, 0.31997079, 0.451030185, 0.45208958], dtype='float64')

You can use the minimal absolute difference to your target:

import numpy as np

out = df1.iloc[np.argmin(abs(df1.index-y)), np.argmin(abs(df1.columns-x))]

output: -12.463674538

Intermediate:

np.argmin(abs(df1.index-y)), np.argmin(abs(df1.columns-x))

output: (2, 1)

Coordinates:

y_pos = np.argmin(abs(df1.index-y))
x_pos = np.argmin(abs(df1.columns-x))
df1.index[y_pos], df1.columns[x_pos]

output: (-0.134648739, 0.568911395)

  • Related