I have a Pandas dataframe that looks something like this:
Values | Check |
---|---|
[0.01, -0.5, 0.07] | 0.1 |
[0.03, 0.04, 0.08] | 0.2 |
I would like to add a column to this dataframe that holds, for each row, the index of the element in "Values" that is closest to the value in "Check". In the example above that would be 2 for both rows, since in both cases the third value of the array is the nearest to Check.
For that I tried this code:
First of all, I defined a function to look for the nearest value:
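import numpy as np

def find_nearest(array, value):
    array = np.array(array)
    # positions of every element at the minimum distance from value
    nearest_index = np.where(abs(array - value) == abs(array - value).min())[0]
    # the corresponding element(s) themselves
    nearest_value = array[abs(array - value) == abs(array - value).min()]
    return nearest_index, nearest_value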
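For reference, here is a minimal sketch of what this kind of lookup returns on a single array. The sample inputs and the names `diff`, `all_nearest`, `first_nearest`, `tied` and `tied_diff` are made up for illustration. Comparing against the minimum with `np.where` yields an array of every matching position, while `argmin` yields one integer (the first match), which matters when two elements are equally close.

```python
import numpy as np

# First row of the example table
array = np.array([0.01, -0.5, 0.07])
value = 0.1

diff = np.abs(array - value)                 # distance of each element to value
all_nearest = np.where(diff == diff.min())[0]  # array of all closest positions
first_nearest = int(diff.argmin())             # single position (first match)

print(all_nearest)    # [2]
print(first_nearest)  # 2

# With a tie, np.where reports both positions and argmin only the first
tied = np.array([1, 3])
tied_diff = np.abs(tied - 2)
print(np.where(tied_diff == tied_diff.min())[0])  # [0 1]
print(int(tied_diff.argmin()))                    # 0
```

Which of the two is preferable depends on whether ties should be reported or silently resolved to the first match.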
Then I use it in my dataframe:
data['nearest'] = find_nearest(data['Values'], data['Check'][0])
However, I get this error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Does anybody know how to solve this? I tried using the function with just an array and a value and it works, so I think that the problem is that the function doesn't work with dataframes, but I don't really know how to do it otherwise.
Thank you in advance.
CodePudding user response:
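IIUC, the function has to be called once per row, pairing each array in Values with that row's Check value, instead of being passed the whole column. Use: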
data['nearest'] = [find_nearest(a, b) for a, b in zip(data['Values'], data['Check'])]
Or:
data['nearest'] = data.apply(lambda x: find_nearest(x['Values'], x['Check']), axis=1)
print(data)
               Values  Check        nearest
0  [0.01, -0.5, 0.07]    0.1  ([2], [0.07])
1  [0.03, 0.04, 0.08]    0.2  ([2], [0.08])
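If only the index is needed, as the question asks, a plain integer per row may be easier to work with than a tuple of arrays. The following is a minimal sketch under that assumption; the helper `nearest_index` and the column name `nearest_idx` are hypothetical names, not part of the original code. It uses `argmin`, so ties resolve to the first closest element.

```python
import numpy as np
import pandas as pd

# Sample data from the question
data = pd.DataFrame({
    "Values": [[0.01, -0.5, 0.07], [0.03, 0.04, 0.08]],
    "Check": [0.1, 0.2],
})

def nearest_index(values, target):
    """Return the position of the element of values closest to target."""
    distances = np.abs(np.asarray(values, dtype=float) - target)
    return int(distances.argmin())  # first position on ties

# One call per row, pairing each list with that row's Check value
data["nearest_idx"] = [
    nearest_index(v, c) for v, c in zip(data["Values"], data["Check"])
]

print(data["nearest_idx"].tolist())  # [2, 2]
```

The list comprehension over `zip` avoids the per-row overhead of `apply(..., axis=1)`, though for small frames either form should behave the same.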