Home > Software design >  Nearest Neighbor Search is too long for multiple datas
Nearest Neighbor Search is too long for multiple datas

Time:02-22

Firstly, i have an image that I pass in arguments, and i retrieve all of his contours with OpenCV (with the cv.findContours method). I parse this list with my parseArray method to have a well parsed list of x,y contours coordinates of the img [(x1, y1), (x2, y2), ...] (The size of this list equals 24163 for my unicorn image)

So here is my code:

def parseArray(array):
    parsedArray = []
    for i in array:
        for j in i:
            parsedArray.append((j[0][0], j[0][1]))
    return parsedArray

def delItemList(index, list):
    del list[index: index   1]

img = cv.imread(sys.argv[1])

canny = cv.Canny(img, 215, 275)
contours, hierarchies = cv.findContours(canny,cv.RETR_LIST, cv.CHAIN_APPROX_NONE)
parsedArray = parseArray(contours)

drawList = []

while (len(parsedArray) > 0):
    tmp = [(0,0)]
    tree = KDTree(parsedArray)
    dist, ind = tree.query(tmp, k=1)
    tmp[0] = parsedArray[int(ind)]
    drawList.append(parsedArray[int(ind)])
    delItemList(int(ind), parsedArray)

And here is a time of this :

enter image description here

How can i reduce strongly the time of my loop (less than one second), is it possible?

CodePudding user response:

I think you spend most of your time in your while loop so I will focus on those lines:

while (len(parsedArray) > 0):
    tmp = [(0,0)]
    tree = KDTree(parsedArray)
    dist, ind = tree.query(tmp, k=1)
    tmp[0] = parsedArray[int(ind)]
    drawList.append(parsedArray[int(ind)])
    delItemList(int(ind), parsedArray)

My understanding is that you want to use a KDTree to find the nearest neighbor of the point [(0,0]] among the points of your contour and that once you find it, you remove it from the contour points and start again. This is costly because you are creating a complex structure that is optimised to perform nearest neighbor query only for one query and then you create it again and again. I can suggest you two optimisations:

  1. If you want to keep the KDTree for whatever reason, then query all the points at once: tree.query(tmp, k=len(parsedArray)) (c.f. scipy documentation)
  2. Compute the distance between [(0, 0)] and each point of your contour and sort them by this distance. You can find solution for this on other thread such as here
  • Related