I am trying to do something that may be a little hard to explain, but I will try to be very specific.
I have a pandas DataFrame like this:
Locality | Count | Lat. | Long. |
---|---|---|---|
Krasnodar | Russia | 44 | 39 |
Tirana | Albania | 41.33 | 19.83 |
Areni | Armenia | 39.73 | 45.2 |
Kars | Armenia | 40.604517 | 43.100758 |
Brunn Wolfholz | Austria | 48.120396 | 16.291722 |
Kleinhadersdorf Flur Marchleiten | Austria | 48.663197 | 16.589687 |
Jalilabad district | Azerbaijan | 39.3607139 | 48.4613556 |
Zeyem Chaj | Azerbaijan | 40.9418889 | 45.8327778 |
Jalilabad district | Azerbaijan | 39.5186111 | 48.65 |
And a text file, cities.txt, with the names of some countries:
Albania
Armenia
Austria
Azerbaijan
And so on. Next, I convert the Lat. and Long. values to radians and then, using the values from the list, do something like:
with open('cities.txt') as file:
    lines=file.readlines()
x=np.where(df['Count'].eq(lines),pd.DataFrame(
    dist.pairwise(df[['Lat.','Long.']].to_numpy())*6373,
    columns=df.Locality.unique(), index=df.Locality.unique()))
Here, pd.DataFrame(dist.pairwise(df[['Lat.','Long.']].to_numpy())*6373, columns=df.Locality.unique(), index=df.Locality.unique()) converts the Lat. and Long. values (in radians) into distances in km and creates a DataFrame holding the distance matrix for each line (country).
In the end I will have a lot of 2D matrices (in theory), grouped by country, and I want to apply this:
>>>Russia.min()
0
>>>Russia.max()
5
to get the .min() and .max() values of each matrix and save these results in cities.txt as:
Country Max.Dist. Min. Dist.
Albania 5 1
Armenia 10 9
Austria 5 3
Azerbaijan 0 0
Unfortunately, 1) I'm stuck in the first part, where I get the error ValueError: Lengths must be equal; 2) is it possible to have these matrices grouped by country; and 3) how can I save my .min() and .max() values?
CodePudding user response:
I am not sure what exactly you want as the minimum. In this solution, the minimum is 0 if there is only one city in a country; otherwise it is the shortest nonzero distance between two cities within that country. Also, the file cities.txt seems to be just a filter. I didn't implement that, but it seems straightforward.
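The filter step mentioned above might look like the following minimal sketch. It assumes cities.txt holds one country name per line and reuses the question's column names (Count, Locality); the StringIO object is only a stand-in for the real file. Series.eq compares element by element, so it needs a sequence of the same length as the column, which is the likely source of the ValueError in the question. Series.isin tests membership instead.

```python
import io
import pandas as pd

# Small sample in the shape of the question's table (values copied from it).
df = pd.DataFrame({
    "Locality": ["Krasnodar", "Tirana", "Areni", "Kars"],
    "Count": ["Russia", "Albania", "Armenia", "Armenia"],
    "Lat.": [44.0, 41.33, 39.73, 40.604517],
    "Long.": [39.0, 19.83, 45.2, 43.100758],
})

# Stand-in for open('cities.txt'); assumed to hold one country per line.
file = io.StringIO("Albania\nArmenia\nAustria\nAzerbaijan\n")

# Lines read from a file keep their trailing newline, so strip each one
# (and skip blank lines) before comparing against the column.
countries = [line.strip() for line in file if line.strip()]

# isin() checks membership row by row, whatever the length of the list.
filtered = df[df["Count"].isin(countries)]
print(filtered)
```

The filtered frame can then be passed to groupby() in place of the full one.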
import numpy as np
import pandas as pd
Here is just some sample data:
cities = pd.read_json("https://raw.githubusercontent.com/lutangar/cities.json/master/cities.json")
cities = cities.sample(10000)
Create and apply a custom aggregate for groupby():
from sklearn.metrics import DistanceMetric

dist = DistanceMetric.get_metric('haversine')
country_groups = cities.groupby("country")

def city_distances(group):
    # The haversine metric expects [latitude, longitude] in radians.
    geo = group[['lat','lng']]
    EARTH_RADIUS = 6371  # km
    haversine_distances = dist.pairwise(np.radians(geo))
    haversine_distances *= EARTH_RADIUS
    distances = {}
    distances['max'] = np.max(haversine_distances)
    # The diagonal is always 0, so take the minimum over nonzero entries only.
    distances['min'] = 0
    nonzero = haversine_distances[np.nonzero(haversine_distances)]
    if len(nonzero) > 0:
        distances['min'] = np.min(nonzero)
    return pd.Series(distances)

country_groups.apply(city_distances)
In my case this prints something like
max min
country
AE 323.288482 323.288482
AF 1130.966661 15.435642
AI 12.056890 12.056890
AL 272.300688 3.437074
AM 268.051071 1.328605
... ... ...
YE 662.412344 19.103222
YT 3.723376 3.723376
ZA 1466.334609 24.319334
ZM 1227.429001 218.566369
ZW 503.562608 26.316902
[194 rows x 2 columns]
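For the question's own column names, and for the part about saving the result, a rough sketch could look like this. It is not the code from the answer above: the haversine formula is written out in NumPy rather than taken from sklearn's DistanceMetric, the helper names pairwise_haversine_km and min_max_distance are made up for illustration, and the output file name in the last comment is hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # mean Earth radius, same value as in the answer above

# Rows copied from the question's table.
df = pd.DataFrame({
    "Locality": ["Tirana", "Areni", "Kars", "Brunn Wolfholz",
                 "Kleinhadersdorf Flur Marchleiten"],
    "Count": ["Albania", "Armenia", "Armenia", "Austria", "Austria"],
    "Lat.": [41.33, 39.73, 40.604517, 48.120396, 48.663197],
    "Long.": [19.83, 45.2, 43.100758, 16.291722, 16.589687],
})

def pairwise_haversine_km(lat_deg, lon_deg):
    """Return the square matrix of great-circle distances in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0, 1)))

def min_max_distance(group):
    """Largest and smallest nonzero distance within one country."""
    d = pairwise_haversine_km(group["Lat."], group["Long."])
    nonzero = d[d > 0]  # drop the zero diagonal (and duplicate locations)
    return pd.Series({
        "Max.Dist.": d.max(),
        "Min.Dist.": nonzero.min() if nonzero.size else 0.0,
    })

# Select only the coordinate columns so each group holds just Lat./Long.
result = df.groupby("Count")[["Lat.", "Long."]].apply(min_max_distance)
result.index.name = "Country"

# Without a path, to_csv() returns the text; pass a path such as
# "distances.txt" (hypothetical name) to write it to disk instead.
text = result.round(2).to_csv(sep="\t")
print(text)
```

A country with a single locality, such as Albania in this sample, gets 0 for both columns, matching the convention used in the answer.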