How to find 3 similar numbers in a column of a data frame in Python pandas

numbers = ['1.46', '1.59', '1.43', '1.42', '1.45', '1.65', '1.35', '1.39', '1.55', '1.88', '1.43']

All I want is to get the list of the 3 closest numbers to each other.

In this case the numbers would be 1.43, 1.42 and 1.43.

I cannot find any help anywhere. Some people pass a reference value as input, for example

test = nsmallest(3, price, key=lambda x: abs(x - 1.42))

but I don't want to supply a reference value.

CodePudding user response：

I'd get all fancy with this using pandas:

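import pandas as pd

s = pd.Series([float(i) for i in numbers])
ss = s.sort_values()
# spread (first vs. last) of every window of 3 sorted values; idxmin returns the label of the window's last row
idx = ss.rolling(3).apply(lambda x: abs(x.iloc[0]-x.iloc[2])).idxmin()
# translate that label into a position within the sorted Series
i = ss.index.get_loc(idx)
ss.iloc[i-2:i+1].to_numpy()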

Output:

array([1.42, 1.43, 1.43])

CodePudding user response：

Pandas is not necessarily the best framework for this, but it is possible:

numbers = pd.Series(numbers)
numbers = numbers.astype('float')
numbers = numbers.sort_values()
numbers = numbers.reset_index(drop=True)
smallest_index = numbers.diff(2).idxmin()
numbers.loc[smallest_index-2:smallest_index].values
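To illustrate the remark that pandas is not strictly needed here, the same idea can be sketched with plain NumPy. This is only a minimal sketch, assuming NumPy is installed; the names `arr`, `window`, `spread`, `start` and `closest` are made up for the example and do not come from the answer above.

```python
import numpy as np

numbers = ['1.46', '1.59', '1.43', '1.42', '1.45', '1.65',
           '1.35', '1.39', '1.55', '1.88', '1.43']

# Convert the strings to floats and sort them so similar values are adjacent.
arr = np.sort(np.array([float(x) for x in numbers]))

window = 3  # assumed group size, taken from the question

# Spread of every run of `window` consecutive sorted values: last minus first.
spread = arr[window - 1:] - arr[:len(arr) - window + 1]

# The run with the smallest spread is the tightest group.
start = int(np.argmin(spread))
closest = arr[start:start + window]

print(closest.tolist())
```

Because the values are sorted first, the tightest group is always a run of consecutive elements, so comparing the sliding-window spreads is enough.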

CodePudding user response：

This works for me

def mostSimilar(numbers):
    # sort numerically; a plain sort() on strings would compare them lexicographically
    sorted_array = sorted(numbers, key=float)
    diff = float("inf")
    most_similar_values = None
    # slide a window of 3 over the sorted values; stopping at len-2 includes the last window
    for i in range(len(sorted_array)-2):
        # sum of the two adjacent gaps, i.e. the spread of the window
        tmpDiff = float(sorted_array[i+1])-float(sorted_array[i]) + float(sorted_array[i+2])-float(sorted_array[i+1])
        if tmpDiff < diff:
            diff = tmpDiff
            most_similar_values = (sorted_array[i], sorted_array[i+1], sorted_array[i+2])
    return most_similar_values

CodePudding user response：

You can use this. See below for an alternative solution (same logic, just pure pandas method chaining fun) as well as an explanation of the logic.

window_size = 3
sorted_numbers = pd.Series(numbers).astype(float).sort_values()
mingroup_right = sorted_numbers.diff(window_size-1).argmin() + 1
out = sorted_numbers.iloc[mingroup_right-window_size : mingroup_right]

print(out)
3     1.42
2     1.43
10    1.43
dtype: float64

Alternatively, this one is for the pandas method chaining addicts out there:

window_size = 3
out = (
    pd.Series(numbers)
    .astype(float)
    .sort_values()
    .iloc[lambda s:
        slice(
            min_pos := s.diff(window_size-1).argmin() - window_size + 1,
            min_pos + window_size
        )
    ]
)

print(out)
3     1.42
2     1.43
10    1.43
dtype: float64

The logic:

window_size → the number of adjacent floats we want to compare at one time
coerce all values to floats
sort them to move similar values next to each other
diff(window_size-1) subtracts the first value from the last value in each group of size window_size
- finding the minimum along this output yields the position of the group whose values are all near each other
use argmin to get the position of the minimum diff value (the last element of that group), then offset it by window_size to get the range of positions and extract the corresponding slice
.iloc pairs with our argmin()-based slice to extract the group from the sorted Series
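
The steps above can be traced on the question's data. The following is a minimal sketch, assuming pandas is installed; the helper name `closest_group` and the variables `spread` and `right` are hypothetical and only serve to expose the intermediate values.

```python
import pandas as pd

numbers = ['1.46', '1.59', '1.43', '1.42', '1.45', '1.65',
           '1.35', '1.39', '1.55', '1.88', '1.43']


def closest_group(values, window_size=3):
    """Return the window_size sorted values with the smallest spread (hypothetical helper)."""
    # Coerce to float and sort so that similar values become neighbours.
    s = pd.Series(values).astype(float).sort_values()
    # Each entry is the value at that position minus the value (window_size - 1)
    # positions earlier, i.e. last minus first of the window ending there.
    spread = s.diff(window_size - 1)
    # Position of the right edge of the tightest window; the leading NaN
    # entries produced by diff are skipped by argmin.
    right = spread.argmin()
    # Slice from the left edge up to and including the right edge.
    return s.iloc[right - window_size + 1 : right + 1]


group = closest_group(numbers)
print(group.tolist())
```

Keeping `window_size` as a parameter makes it possible to ask for a pair or a group of four instead of three; the sketch assumes the input holds at least `window_size` values.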