Different approaches for finding the max 'k' values in a Python list for implementing knn

Time:12-20

I tried to find the max 3 values in a list while implementing my knn model. I first used the approach that felt most intuitive to me; the code was as follows:

first_k = X_train['distance'].sort_values().head(k)
prediction = first_k.value_counts().idxmax()

first_k contains the first k elements of the sorted distance column. prediction is what the model finally returns.

Another approach I found on the internet was this:

prediction = y_train[X_train["distance"].nsmallest(n=k).index].mode()[0]

The second approach yields the correct results, but my approach did not work as intended. Can someone explain why my approach did not work?

CodePudding user response:

The difference is in the usage of .index after the method nsmallest(n=k) in the alternative approach. What you are doing in your code is the following:

  1. Sort the distance column, then take the first k values of the sorted result
  2. Count how often each distance value occurs and return the most frequent distance (the first one found if there is a tie)
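
To make these two steps concrete, here is a minimal sketch on made-up data (the five distances below are hypothetical, not taken from the question). It suggests that the value returned is one of the distances, because no class labels ever enter the computation:

```python
import pandas as pd

# Hypothetical toy data: distances from a query point to five training rows.
X_train = pd.DataFrame({"distance": [0.9, 0.2, 0.5, 0.1, 0.7]})
k = 3

# Step 1: the k smallest distances (0.1, 0.2, 0.5), still indexed by row label.
first_k = X_train["distance"].sort_values().head(k)

# Step 2: counts how often each *distance value* occurs and picks the most
# frequent one. All distances are unique here, so every count is 1.
prediction = first_k.value_counts().idxmax()

# The result is a distance value, not a class label.
print(prediction)
```

With real-valued distances the counts are almost always all 1, so the result is just one of the k smallest distances.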

The alternative approach instead does the following steps:

  1. Recover the k smallest elements in the distance column
  2. Get the corresponding index values of the rows recovered in the previous step (for example, with k=5 it could be an object that, when printed, shows something similar to Int64Index([3, 9, 10, 1, 8], dtype='int64'))
  3. Recover in y the labels with the same index values of the ones recovered in the previous step
  4. Get the most frequent label in y (or the mode)
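
The four steps above can be sketched one line at a time. The data and the intermediate variable names (nearest, nearest_idx, neighbour_labels) are assumptions made up for illustration. The sketch uses .loc to make the label-based lookup explicit, whereas the one-liner from the question indexes y_train directly:

```python
import pandas as pd

# Hypothetical toy data: five training rows with a distance and a class label.
X_train = pd.DataFrame({"distance": [0.9, 0.2, 0.5, 0.1, 0.7]})
y_train = pd.Series(["a", "b", "b", "a", "a"])
k = 3

# Step 1: the k smallest distances.
nearest = X_train["distance"].nsmallest(n=k)

# Step 2: the row labels of those k neighbours (here 3, 1, 2).
nearest_idx = nearest.index

# Step 3: the class labels of the same rows in y_train.
neighbour_labels = y_train.loc[nearest_idx]

# Step 4: the most frequent class among the k neighbours.
prediction = neighbour_labels.mode()[0]

print(prediction)
```

This relies on X_train and y_train sharing the same index, which is what allows the row labels from one to select rows in the other.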

So, as you can see, the main difference is that the most frequent distance is not necessarily the most frequent class among the k neighbours that you have recovered.

Anyway, your code can be easily fixed:

first_k = X_train['distance'].sort_values().head(k).index
prediction = y_train[first_k].mode()[0]
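
As a quick sanity check, the following sketch runs the fixed code next to the nsmallest one-liner on made-up data. The data and the names pred_fixed and pred_alt are hypothetical. Under the assumption that all distances are distinct, both should select the same k rows and therefore agree:

```python
import pandas as pd

# Hypothetical toy data with distinct distances and a shared default index.
X_train = pd.DataFrame({"distance": [0.9, 0.2, 0.5, 0.1, 0.7]})
y_train = pd.Series(["a", "b", "b", "a", "a"])
k = 3

# Fixed version: keep the index of the k nearest rows, then look up labels.
first_k = X_train["distance"].sort_values().head(k).index
pred_fixed = y_train[first_k].mode()[0]

# Alternative one-liner from the question.
pred_alt = y_train[X_train["distance"].nsmallest(n=k).index].mode()[0]

print(pred_fixed, pred_alt)
```

If several rows are at exactly the same distance, the two versions may break the tie differently, so they are not guaranteed to pick identical neighbours in that case.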