I have a CSV file with places and their latitudes and longitudes. I want to create a distance matrix from them. I tried creating a matrix using:
arr = df['latitude'].values - df['latitude'].values[:, None]
pd.concat((df['name'], pd.DataFrame(arr, columns=df['name'])), axis=1)
but this only creates a matrix of latitude differences, and I want the distance between places. The matrix I want is the matrix of distances between all of the hotels.
CodePudding user response:
Based on the answer of @ravenspoint, here is some simple code to calculate the distances.
>>> import numpy as np
>>> import pandas as pd
>>> import geopy.distance
>>> data = {"hotels": ["1", "2", "3", "4"], "lat": [20.55697, 21.123698, 25.35487, 19.12577], "long": [17.1, 18.45893, 16.78214, 14.75498]}
>>> df = pd.DataFrame(data)
>>> df
  hotels        lat      long
0      1  20.556970  17.10000
1      2  21.123698  18.45893
2      3  25.354870  16.78214
3      4  19.125770  14.75498
Now let's create a matrix to hold the distances between hotels. The matrix should have size (number of hotels x number of hotels).
>>> matrix = np.ones((len(df), len(df))) * -1
>>> np.fill_diagonal(matrix, 0)
>>> matrix
array([[ 0., -1., -1., -1.],
       [-1.,  0., -1., -1.],
       [-1., -1.,  0., -1.],
       [-1., -1., -1.,  0.]])
Here -1 marks an entry that has not been computed. Only the entries above the diagonal get filled in, which avoids calculating the same distance twice, since dist(1,2) = dist(2,1).
Next, just loop over hotels and calculate the distance. Here the geopy package is used.
>>> for i in range(len(df)):
...     coords_i = df.loc[i, ["lat", "long"]].values
...     for j in range(i + 1, len(df)):  # only pairs with j > i
...         coords_j = df.loc[j, ["lat", "long"]].values
...         matrix[i, j] = geopy.distance.geodesic(coords_i, coords_j).km
...
>>> matrix
array([[  0.        , 154.73003254, 532.33605633, 292.29813424],
       [ -1.        ,   0.        , 499.00500751, 445.97821702],
       [ -1.        ,  -1.        ,   0.        , 720.69054683],
       [ -1.        ,  -1.        ,  -1.        ,   0.        ]])
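The question asks for the distances between all pairs of hotels, so the -1 placeholders below the diagonal still need to be filled. A minimal sketch of one way to do that, assuming the upper-triangular result shown above (the names `upper` and `full` are only illustrative):

```python
import numpy as np

# Upper-triangular result copied from the output above; -1 marks unfilled entries.
matrix = np.array([
    [0.0, 154.73003254, 532.33605633, 292.29813424],
    [-1.0, 0.0, 499.00500751, 445.97821702],
    [-1.0, -1.0, 0.0, 720.69054683],
    [-1.0, -1.0, -1.0, 0.0],
])

upper = np.triu(matrix, k=1)  # keep entries above the diagonal, zero everything else
full = upper + upper.T        # mirror them below the diagonal; the diagonal stays 0
```

This works because the distance is symmetric, so each value only has to be computed once and can then be copied to its mirrored position.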
Please note that the nested loop is not the best way to do the job: it makes one geopy call per pair of hotels, so it gets slow as the number of hotels grows, and the code can be improved.
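One possible improvement is to drop the Python loop and compute every pair at once with NumPy broadcasting, the same trick the question used for the latitude differences. The sketch below swaps geopy's geodesic distance for the haversine (great-circle) formula, which treats the Earth as a sphere, so its results differ slightly from the geodesic values above. The function name `haversine_matrix` and the radius constant are assumptions made for this example, not part of any library.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_matrix(lat_deg, lon_deg):
    """Return the full symmetric matrix of great-circle distances in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # A column minus a row broadcasts to every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Same coordinates as the example DataFrame above.
lat = [20.55697, 21.123698, 25.35487, 19.12577]
lon = [17.1, 18.45893, 16.78214, 14.75498]
dist = haversine_matrix(lat, lon)
```

With a DataFrame like the one above, the call would be `haversine_matrix(df["lat"], df["long"])`, and the result can be wrapped as `pd.DataFrame(dist, index=df["hotels"], columns=df["hotels"])` to label rows and columns.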
CodePudding user response:
- Read the CSV input file for hotel name, lat and lon,
  placing them in a table of three columns.
- LOOP A over the hotels
    - LOOP B over the hotels, starting with the next hotel after A
        - Calculate D, the distance between A and B
        - Store D in matrix at column A and row B
        - Store D in matrix at column B and row A
If the hotels are scattered over a wide area, you will need to use the Haversine formula to calculate accurate distances.
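The steps above, together with the Haversine formula, could be sketched in plain Python roughly as follows. The column names `name`, `latitude` and `longitude` are taken from the question; the inline sample data reuses the coordinates from the other answer in place of a real file, and the helper names `haversine_km` and `distance_matrix` are made up for this example.

```python
import csv
import io
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(rows):
    """rows is a list of (name, lat, lon); returns a symmetric list of lists."""
    n = len(rows)
    matrix = [[0.0] * n for _ in range(n)]
    for a in range(n):                 # LOOP A over the hotels
        for b in range(a + 1, n):      # LOOP B starts at the hotel after A
            d = haversine_km(rows[a][1], rows[a][2], rows[b][1], rows[b][2])
            matrix[a][b] = d           # store D at row A, column B
            matrix[b][a] = d           # and at row B, column A
    return matrix


# Stand-in for the CSV file; replace io.StringIO(...) with open("your_file.csv").
SAMPLE_CSV = """name,latitude,longitude
1,20.55697,17.1
2,21.123698,18.45893
3,25.35487,16.78214
4,19.12577,14.75498
"""

rows = [(r["name"], float(r["latitude"]), float(r["longitude"]))
        for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
matrix = distance_matrix(rows)
```

Because the formula assumes a spherical Earth, the values come out slightly different from an ellipsoidal geodesic calculation, which is usually acceptable for distances between hotels.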