I am creating a function that finds a k-nearest neighbors prediction.
def knn_predict(data, x_new, k):
    """ (tuple, number, int) -> number
    data is a tuple.
    data[0] are the x coordinates and
    data[1] are the y coordinates.
    k is a positive nearest neighbor parameter.
    Returns k-nearest neighbor estimate using nearest
    neighbor parameter k at x_new.
    Assumes i) there are no duplicated values in data[0],
    ii) data[0] is sorted in ascending order, and
    iii) x_new falls between min(x) and max(x).
    >>> knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 2, 2)
    4.0
    >>> knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 2, 3)
    1.0
    >>> knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 8, 2)
    1.0
    >>> knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 8, 3)
    4.333333333333333
    """
    # use find_index and the x_new value for k loops to find N_k(x_new) (list of indexes)
    # incorporate k value!!!
    nk = [find_index(data[0], x_new) for k in range(k)]  #here
    # use N_k(x_new) indexes to find correlated y values
    yvals = [data[1][val] for val in nk]
    # use correlated y values summed together divided by k to find y new
    ynew = sum(yvals) / k
    return ynew
The important line is:
nk = [find_index(data[0], x_new) for k in range(k)] #here
The line with #here at the end is supposed to use this function:
def find_index(x, x_new):
    """ (list, number) -> int
    Returns the smallest index i such that x[i] <= x_new
    and x[i+1] >= x_new.
    Assumes i) there are no duplicated values in x,
    ii) x is sorted in ascending order, and
    iii) x_new falls between min(x) and max(x).
    >>> find_index([1, 5, 7, 9], 1)
    0
    >>> find_index([1, 5, 7, 9], 2)
    0
    >>> find_index([1, 5, 7, 9], 6)
    1
    >>> find_index([1, 5, 7, 9], 7)
    1
    >>> find_index([1, 5, 7, 9], 8)
    2
    >>> find_index([1, 5, 7, 9], 9)
    2
    """
    for i, element in enumerate(x):
        if x_new <= x[i + 1] and element <= x_new:
            return i
and return the indexes, where k is the number of indexes it should find. How can I fix that line so that it builds a list of k indexes (the list should be k long)?
Answer:
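This solves your problem in three lines of code. I don't think find_index is useful at all: calling it k times with the same arguments just returns the same index k times, so it can never give you k different neighbors. Notice that this code doesn't care whether the entries are in order, or even whether the value is between min(x) and max(x).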
def knn_predict(data, x_new, k):
    """ (tuple, number, int) -> number
    data is a tuple.
    data[0] are the x coordinates and
    data[1] are the y coordinates.
    k is a positive nearest neighbor parameter.
    Returns k-nearest neighbor estimate using nearest
    neighbor parameter k at x_new.
    """
    # Pair each point's distance from the target with its y value.
    deltas = [(abs(t - x_new), y) for t, y in zip(*data)]
    # Sort by distance (tuples compare element-wise, so ties fall back to y).
    deltas.sort()
    # Return the mean of the y values of the k nearest points.
    return sum(d[1] for d in deltas[:k]) / k

print(knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 2, 2))
print(knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 2, 3))
print(knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 8, 2))
print(knn_predict(([0, 5, 10, 15], [1, 7, -5, 11]), 8, 3))
Output:
4.0
1.0
1.0
4.333333333333333
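If you do want the list of k indexes that the question asks for, and you are happy to rely on data[0] being sorted, one possible approach is to locate the bracketing position once and then walk outward, taking whichever side is closer at each step. This is only a sketch: the names k_nearest_indexes and knn_predict_sorted are made up for illustration, and the standard library's bisect_left stands in for find_index as the way to find the starting position.

```python
from bisect import bisect_left


def k_nearest_indexes(x, x_new, k):
    """Return indexes of the k values in the sorted list x closest to x_new."""
    # First position whose value is >= x_new; everything left of it is smaller.
    right = bisect_left(x, x_new)
    left = right - 1
    picked = []
    # Walk outward until k indexes are collected or the list is exhausted.
    while len(picked) < k and (left >= 0 or right < len(x)):
        take_left = right >= len(x) or (
            left >= 0 and x_new - x[left] <= x[right] - x_new
        )
        if take_left:
            picked.append(left)
            left -= 1
        else:
            picked.append(right)
            right += 1
    return picked


def knn_predict_sorted(data, x_new, k):
    """Average the y values at the k nearest x positions (hypothetical helper)."""
    idx = k_nearest_indexes(data[0], x_new, k)
    # Divide by the number of neighbors actually found (equals k when k <= len(x)).
    return sum(data[1][i] for i in idx) / len(idx)


print(k_nearest_indexes([0, 5, 10, 15], 8, 3))
print(knn_predict_sorted(([0, 5, 10, 15], [1, 7, -5, 11]), 8, 3))
```

On an exact distance tie this sketch prefers the neighbor on the left (the smaller x), whereas the sort-based version above breaks ties by the smaller y value, so the two can disagree when ties occur.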