Patients often bypass their nearest hospital and go to another hospital for surgery, for many reasons. I have 500,000 patient episodes of patients attending 24 hospitals in the UK.
I want to know the proportion of patients attending a hospital that wasn't their nearest option. So if a hospital in London had 100 patients and 20 of them should have gone to Cambridge, its proportion is 20%. In the example below, patient 2's nearest hospital may well have been (ulon1, ulat1), where "u" stands for neurosurgical unit (= hospital).
I have the latitude and longitude of the patients and the hospitals. I can't show the actual data or patient codes because of confidentiality.
Essentially my dataframe looks like this:
import pandas as pd
d = {'patient_ID': [0, 1, 2, 3, 5], 'patient_lon': ['plon1', 'plon2', 'plon3', 'plon4', 'plon5'], 'patient_lat': ['plat1', 'plat2', 'plat3', 'plat4', 'plat5'],
     'unit_lon': ['ulon1', 'ulon2', 'ulon3', 'ulon4', 'ulon5'], 'unit_lat': ['ulat1', 'ulat2', 'ulat3', 'ulat4', 'ulat5']}
pd.DataFrame(data=d)
| patient_ID | patient_lon | patient_lat | unit_lon | unit_lat |
|------------|-------------|-------------|----------|----------|
| 0          | plon1       | plat1       | ulon1    | ulat1    |
| 1          | plon2       | plat2       | ulon2    | ulat2    |
| 2          | plon3       | plat3       | ulon3    | ulat3    |
| 3          | plon4       | plat4       | ulon4    | ulat4    |
| 5          | plon5       | plat5       | ulon5    | ulat5    |
I have used the Haversine method to calculate distance from the patient to the hospital they attended.
How can I use that to calculate the distances to all 24 hospitals and take the minimum as the 'local' one? (They all provide neurosurgery, which is what I am interested in.) I then want to compare that to the hospital the patient actually went to, in a new dataframe column.
BTW I am a surgeon, so a novice here.
Answer:
My answer below reformats the data slightly, storing each lat/long pair as a tuple in a single column. I hope that's OK; if not, please respond and we'll work up the answer.
1. Simulate some plausible patient locations
from haversine import haversine
import pandas as pd
import numpy as np
import random
# number of simulated patients
NUM_PATIENTS = 10
# lower-left and upper-right (lat, long) corners of the area to sample patient locations from
COORD_LL = (51.578099, -0.232274)
COORD_UR = (52.797460, 1.556070)
GRID_LAT = np.linspace(COORD_LL[0], COORD_UR[0], num=NUM_PATIENTS)
GRID_LONG = np.linspace(COORD_LL[1], COORD_UR[1], num=NUM_PATIENTS)
GRID_LAT = np.around(GRID_LAT, decimals=4)
GRID_LONG = np.around(GRID_LONG, decimals=4)
2. Store locations of hospitals
Next we'll store the names and lat/long coordinates of the hospitals. In your case this would be your 24 UK hospitals; I've just made some up here.
# names and locations of hospitals
HOSPITALS = dict(
    ADDBR=(52.1779, 0.1464),
    BURY=(52.2412, 0.6939),
    PBOROUGH=(52.5548, -0.2613),
    NWICH=(52.6091, 1.2609),
    LONDON=(51.5553, -0.0993),
)
3. Assemble dataframe
Now we use the data above to create some lists of data and a dataframe.
# Simulate patient data: generate the lists that go into the dataframe
patient_latlongs = tuple(zip(GRID_LAT, GRID_LONG))
patient_id = list(range(len(patient_latlongs)))
# pick a random hospital for each patient (no seed set, so results vary between runs)
unit_visited = [random.choice(list(HOSPITALS)) for _ in range(len(patient_latlongs))]
unit_visited_latlong = [HOSPITALS[x] for x in unit_visited]
df = pd.DataFrame.from_dict(
    {
        "patient_ID": patient_id,
        "patient_latlong": patient_latlongs,
        "unit_visited": unit_visited,
        "unit_visited_latlong": unit_visited_latlong,
    }
)
Output:
patient_ID patient_latlong unit_visited unit_visited_latlong
0 0 (51.5781, -0.2323) LONDON (51.5553, -0.0993)
1 1 (51.7136, -0.0336) PBOROUGH (52.5548, -0.2613)
2 2 (51.8491, 0.1651) ADDBR (52.1779, 0.1464)
3 3 (51.9846, 0.3638) LONDON (51.5553, -0.0993)
4 4 (52.12, 0.5625) BURY (52.2412, 0.6939)
5 5 (52.2555, 0.7613) PBOROUGH (52.5548, -0.2613)
6 6 (52.391, 0.96) LONDON (51.5553, -0.0993)
7 7 (52.5265, 1.1587) LONDON (51.5553, -0.0993)
8 8 (52.662, 1.3574) LONDON (51.5553, -0.0993)
9 9 (52.7975, 1.5561) BURY (52.2412, 0.6939)
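If your real data is still in the separate-column layout from the question, one way to get it into the tuple format used here is sketched below. The column names follow the question's example; the coordinate values are made up for illustration, and I'm assuming they are decimal degrees. The names `raw` and `unit_locations` are just placeholders.

```python
import pandas as pd

# Made-up rows in the question's layout (separate lon and lat columns).
raw = pd.DataFrame(
    {
        "patient_ID": [0, 1, 2],
        "patient_lon": [-0.2323, 0.1651, 1.1587],
        "patient_lat": [51.5781, 51.8491, 52.5265],
        "unit_lon": [-0.0993, 0.1464, -0.0993],
        "unit_lat": [51.5553, 52.1779, 51.5553],
    }
)

# The haversine library expects (lat, lon) order, so latitude goes first.
raw["patient_latlong"] = list(zip(raw["patient_lat"], raw["patient_lon"]))
raw["unit_visited_latlong"] = list(zip(raw["unit_lat"], raw["unit_lon"]))

# Each distinct hospital location appears once here; with the real data
# this list should end up with 24 entries.
unit_locations = sorted(set(raw["unit_visited_latlong"]))
```

If you also have a hospital name or code column, a dict from name to (lat, lon) like HOSPITALS above is easier to read than bare coordinates.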
4. Find nearest hospital
We write a function for finding the nearest hospital. This is probably a bit bespoke to our example. As mentioned in the comments above, haversine is a very convenient library for this. The function returns the key of the nearest hospital, which we can then look up in our HOSPITALS dict.
def find_nearest_hospital(latlong: tuple, hospitals: dict) -> str:
    """
    Calculate the nearest hospital and return its name. Assumes a dict
    storing hospital names as keys and latlong tuples as values.
    latlong: tuple
        input (lat, long) tuple
    hospitals: dict
        key / value pairs storing hospital names as keys and locations as (lat, long) tuples
    returns:
        name of closest hospital
    """
    # distance in km (haversine's default unit) from the patient to every hospital
    distances = {}
    for hospital, location in hospitals.items():
        distances[hospital] = haversine(latlong, location)
    return min(distances, key=distances.get)
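With 500,000 episodes the per-patient loop above may be slow, so here is a rough NumPy sketch of the same idea that computes all patient-to-hospital distances in one go using broadcasting. This is my own hand-written haversine formula rather than the haversine library, and the function name `nearest_unit_index` and the Earth radius constant are my choices, so treat it as a starting point rather than a drop-in replacement.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant)


def nearest_unit_index(patient_latlong, unit_latlong):
    """Return (index of nearest unit, distance to it in km) for each patient.

    patient_latlong: array-like, shape (n_patients, 2), (lat, lon) in degrees
    unit_latlong:    array-like, shape (n_units, 2), (lat, lon) in degrees
    """
    p = np.radians(np.asarray(patient_latlong, dtype=float))
    u = np.radians(np.asarray(unit_latlong, dtype=float))
    # Broadcast (n_patients, 1) against (1, n_units) -> (n_patients, n_units)
    dlat = u[None, :, 0] - p[:, None, 0]
    dlon = u[None, :, 1] - p[:, None, 1]
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(p[:, None, 0]) * np.cos(u[None, :, 0]) * np.sin(dlon / 2) ** 2
    )
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    nearest = dist_km.argmin(axis=1)
    return nearest, dist_km[np.arange(len(p)), nearest]


# Example with the made-up hospitals and three of the simulated patients.
names = ["ADDBR", "BURY", "PBOROUGH", "NWICH", "LONDON"]
coords = [
    (52.1779, 0.1464),
    (52.2412, 0.6939),
    (52.5548, -0.2613),
    (52.6091, 1.2609),
    (51.5553, -0.0993),
]
patients = [(51.5781, -0.2323), (52.12, 0.5625), (52.7975, 1.5561)]

idx, km = nearest_unit_index(patients, coords)
closest = [names[i] for i in idx]  # hospital name per patient
```

For 500,000 patients and 24 hospitals each intermediate array holds 12 million floats (roughly 100 MB), so if memory is tight you could run it on chunks of patients instead.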
5. Compute distances
Assign new columns to the dataframe with each patient's nearest hospital, its coordinates, and whether the patient actually went there. Note that transform with a Python function still calls it once per row, so it performs much like apply. Ideally we'd use a NumPy vectorized function, but this might be fast enough for your use case. If not, write back and we can take a look.
df = df.assign(
    closest_unit=df["patient_latlong"].transform(lambda x: find_nearest_hospital(x, HOSPITALS)),
    # look up the coordinates of the closest unit by its name
    closest_unit_lat=lambda x: x["closest_unit"].map({k: v[0] for k, v in HOSPITALS.items()}),
    closest_unit_long=lambda x: x["closest_unit"].map({k: v[1] for k, v in HOSPITALS.items()}),
    # True where the patient attended their nearest unit
    visited_closest=lambda x: x["closest_unit"] == x["unit_visited"],
)
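Output (unit_visited differs from the table in step 3, presumably because the random assignment was re-run without a seed):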
patient_ID patient_latlong unit_visited unit_visited_latlong \
0 0 (51.5781, -0.2323) PBOROUGH (52.5548, -0.2613)
1 1 (51.7136, -0.0336) BURY (52.2412, 0.6939)
2 2 (51.8491, 0.1651) ADDBR (52.1779, 0.1464)
3 3 (51.9846, 0.3638) NWICH (52.6091, 1.2609)
4 4 (52.12, 0.5625) ADDBR (52.1779, 0.1464)
5 5 (52.2555, 0.7613) ADDBR (52.1779, 0.1464)
6 6 (52.391, 0.96) NWICH (52.6091, 1.2609)
7 7 (52.5265, 1.1587) LONDON (51.5553, -0.0993)
8 8 (52.662, 1.3574) LONDON (51.5553, -0.0993)
9 9 (52.7975, 1.5561) PBOROUGH (52.5548, -0.2613)
closest_unit closest_unit_lat closest_unit_long visited_closest
0 LONDON 51.5553 -0.0993 False
1 LONDON 51.5553 -0.0993 False
2 ADDBR 52.1779 0.1464 True
3 ADDBR 52.1779 0.1464 False
4 BURY 52.2412 0.6939 False
5 BURY 52.2412 0.6939 False
6 BURY 52.2412 0.6939 False
7 NWICH 52.6091 1.2609 False
8 NWICH 52.6091 1.2609 False
9 NWICH 52.6091 1.2609 False
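To get from the visited_closest column to the proportion asked about in the question, one option is a groupby on the hospital attended. The sketch below rebuilds just the two columns it needs, with values copied from the output above, so it runs on its own; `result` and `proportion_not_nearest` are names I've made up.

```python
import pandas as pd

# Stand-in for the dataframe above: only the two columns needed here.
result = pd.DataFrame(
    {
        "unit_visited": ["PBOROUGH", "BURY", "ADDBR", "NWICH", "ADDBR",
                         "ADDBR", "NWICH", "LONDON", "LONDON", "PBOROUGH"],
        "visited_closest": [False, False, True, False, False,
                            False, False, False, False, False],
    }
)

# visited_closest is boolean, so the mean of its negation within each
# hospital is the fraction of that hospital's patients who had a nearer option.
bypass_proportion = (
    (~result["visited_closest"])
    .groupby(result["unit_visited"])
    .mean()
    .rename("proportion_not_nearest")
)
```

Multiply by 100 if you want percentages, as in your London/Cambridge example. With random made-up data most values come out at 1.0, which of course says nothing about real referral patterns.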
On a related/unrelated note, I'm writing this from a neurological ward.