so basically I create a program that based on the user input of house coordinates (x,y), it will assign an optimal spot in R2 for a police station to be built based on the average manhattan distance. Obviously, the less the better. Here is what I've already written:
L=[]
K=[]
Kx=[]
Ky=[]
NoHouses=int(input("How many houses are there? : "))
for i in range(1,NoHouses 1):
plot1 = input("coordinate(x, y) for house" str(i) " :").split()
plot1 = [int(i) for i in plot1]
K.append(plot1)
Kx.append(plot1[0])
Ky.append(plot1[1])
Ky.sort()
Kx.sort()
for i in range(Kx[0],Kx[-1] 1):
for j in range(Ky[0],Ky[-1] 1):
L.append([i,j])
print("House Coordinates: ",(K))
print("List of x coordinate: ",(Kx))
print("List of y coordinates: ",(Ky))
print("Possible police station coordinates: ",(L))
basically what I try to do is to find the manhattan distances from the possible police station coordinate to the houses given by the user. So in the end I should be left with a list "D" that has len(L)==len(D) (num of police station coordinates== num of manhattan distances) , which then I will sort and find the minimum . The problem I face is finding those manhattan distances for each possible police station coordinates. What am I doing wrong? While I expected the output to be len(L) it seems to be random. Below is what I've tried :
D=[]
K=[[1,2],[2,2]] #test case
L=[[1,1],[2,6]] #testcase
def manhattan_distance(point1,point2):
return sum(abs(value1 - value2) for value1, value2 in zip(point1,point2))
for i in range(len(L)):
for j in range(len(K)):
x1=L[i]
x2=K[j]
D.append(manhattan_distance(x1,x2))
print(manhattan_distance(x1,x2))
print(D)
CodePudding user response:
The code you attached worked for 1 coordinate pair. For multiple coordinate pairs you kept appending to D
instead of creating a new D2
for the new value of the coordinates. For multiple values of coordinate pairs I would suggest the following:
D=[]
K=[[1,2], [2,9]] #test case
L=[[1,1],[2,6], [4,5]] #testcase
def manhattan_distance(point1,point2):
return sum(abs(value1 - value2) for value1, value2 in zip(point1,point2))
D = {}
for ii, k in enumerate(K):
temp = []
for l in L:
temp.append(manhattan_distance(l,k))
D[ii] = temp
print(D) # {0: [1, 5, 6], 1: [9, 3, 6]}
This will create a dictionary with a list per coordinate pair in K.
However, you can do the same without the for loops, thus improving runtime drastically for large L values. To do so, we will use the numpy
package. The idea is to compute everything at once using vector calculation instead of running the for loop. Using the same data as above, we can do the following:
L = np.array(L)
K = np.array(K)
Now L
is a (?,2) shapes array holding all the coordinates and K is a (2,) 1 dimensional array. By performing subtraction as follows:
temp = L - K
What happens is that numpy
broadcasts K
to all rows of L
, performing the subtraction to all rows at the same time. Then, performing absolute and summing over the column dimension will result in the wanted D value:
D = {}
for ii, k in enumerate(K):
temp = np.sum(np.abs(L - k), axis=1)
D[ii] = list(temp)
print(D)
If you want it in list type you can then convert the numpy array
back to list.
Running both version with L which has a size of (100000, 2) using %%timeit
resulted in:
- Using the for loop approach, it took
282 ms ± 6.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- Using numpy, it took
17.4 ms ± 646 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
An order of magnitude faster.