Pandas dataframe - trying to look for nearest values in arrays

Time:04-25

I have a Pandas dataframe that looks something like this:

Values              Check
[0.01, -0.5, 0.07]  0.1
[0.03, 0.04, 0.08]  0.2

I would like to add a column to this dataframe containing, for each row, the index of the element of "Values" that is closest to the value in "Check". In the example above that would be 2 for both rows, since in both the last element of the array is the nearest to Check (0.07 for 0.1, and 0.08 for 0.2).

Here is what I tried. First, I defined a function to find the nearest value:

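import numpy as np

def find_nearest(array, value):
    array = np.array(array)
    # absolute distance of every element from the target value
    diff = np.abs(array - value)
    # boolean mask of the element(s) at minimum distance (ties are all kept)
    mask = diff == diff.min()
    nearest_index = np.where(mask)[0]
    nearest_value = array[mask]
    return nearest_index, nearest_value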

Then I use it in my dataframe:

data['nearest'] = find_nearest(data['Values'], data['Check'][0])

However, I get this error:

The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Does anybody know how to solve this? I tried using the function with just an array and a value and it works, so I think that the problem is that the function doesn't work with dataframes, but I don't really know how to do it otherwise.

Thank you in advance.

CodePudding user response:

IIUC use:

data['nearest'] = [find_nearest(a, b) for a, b in zip(data['Values'], data['Check'])]

Or:

data['nearest'] = data.apply(lambda x: find_nearest(x['Values'], x['Check']), axis=1)
print(data)
               Values  Check        nearest
0  [0.01, -0.5, 0.07]    0.1  ([2], [0.07])
1  [0.03, 0.04, 0.08]    0.2  ([2], [0.08])
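
Note that find_nearest returns a tuple of arrays, which is why each cell above holds something like ([2], [0.07]). If only a single integer index per row is wanted, as the question describes, one possible variant is sketched below. The helper name nearest_index is hypothetical (it is not part of the original code), and the sketch assumes every entry in "Values" is a non-empty list of numbers. It uses np.argmin, which returns the first position of the minimum, so ties go to the lowest index.

```python
import numpy as np
import pandas as pd

def nearest_index(values, target):
    """Return the position of the element of `values` closest to `target`."""
    # hypothetical helper: distance of each element from the target
    distances = np.abs(np.asarray(values, dtype=float) - target)
    # argmin gives the first position of the smallest distance
    return int(np.argmin(distances))

# sample data taken from the question
data = pd.DataFrame({
    "Values": [[0.01, -0.5, 0.07], [0.03, 0.04, 0.08]],
    "Check": [0.1, 0.2],
})

# pair each row's array with its own Check value
data["nearest"] = [nearest_index(a, b) for a, b in zip(data["Values"], data["Check"])]
print(data)
```

With the sample data this puts 2 in both rows of the nearest column, matching the indices in the tuple output above.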