Patients often bypass their nearest hospital to go to another hospital for surgery, for many reasons. I have 500,000 patient episodes covering patients attending 24 hospitals in the UK.

I want to know the proportion of patients attending a hospital that wasn't their nearest option. So if a hospital in London had 100 patients and 20 of them should have gone to Cambridge, its proportion is 20%. In the example below, patient 2's nearest hospital may well have been (ulon1, ulat1), where u stands for neurosurgical unit (= hospital).

I have the latitude and longitude of the patients and the hospitals. I can't show the actual patient data because of confidentiality.

Essentially my dataframe looks like this

import pandas as pd

d = {'patient_ID': [0, 1, 2, 3, 5],
     'patient_lon': ['plon1', 'plon2', 'plon3', 'plon4', 'plon5'],
     'patient_lat': ['plat1', 'plat2', 'plat3', 'plat4', 'plat5'],
     'unit_lon': ['ulon1', 'ulon2', 'ulon3', 'ulon4', 'ulon5'],
     'unit_lat': ['ulat1', 'ulat2', 'ulat3', 'ulat4', 'ulat5']}
pd.DataFrame(data=d)

| patient_ID | patient_lon | patient_lat | unit_lon | unit_lat |
|------------|-------------|-------------|----------|----------|
| 0          | plon1       | plat1       | ulon1    | ulat1    |
| 1          | plon2       | plat2       | ulon2    | ulat2    |
| 2          | plon3       | plat3       | ulon3    | ulat3    |
| 3          | plon4       | plat4       | ulon4    | ulat4    |
| 5          | plon5       | plat5       | ulon5    | ulat5    |

I have used the Haversine method to calculate distance from the patient to the hospital they attended.

How can I use that to calculate the distances from each patient to all 24 hospitals and take the minimum as the 'local' one? (They all provide neurosurgery, which is what I am interested in.) I then want to compare that to the hospital they actually went to, in a new dataframe column.

BTW I am a surgeon so a novice here.

Answer:

My answer below reformats the data slightly, storing each lat/long pair as a tuple in a single column. I hope this is OK; if not, please respond and we'll work up the answer.
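If your real data keeps latitude and longitude in separate columns, as in the question, something along these lines should produce the tuple column used below. This is only a sketch: the column names are taken from the question, the coordinate values are made up, and `raw` is a hypothetical name for your dataframe.

```python
import pandas as pd

# Made-up coordinates; column names follow the question's dataframe
raw = pd.DataFrame(
    {
        "patient_ID": [0, 1, 2],
        "patient_lon": [-0.2323, 0.1651, 1.5561],
        "patient_lat": [51.5781, 51.8491, 52.7975],
    }
)

# haversine() expects (latitude, longitude), so latitude goes first
raw["patient_latlong"] = list(zip(raw["patient_lat"], raw["patient_lon"]))
```

The same pattern would apply to the unit_lat / unit_lon columns.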

1. Simulate some plausible patient locations

from haversine import haversine
import pandas as pd
import numpy as np
import random

# number of simulated patients
NUM_PATIENTS = 10

# lower-left and upper-right (lat, long) corners of the area to sample patient locations from
COORD_LL = (51.578099, -0.232274)
COORD_UR = (52.797460, 1.556070)
GRID_LAT = np.linspace(COORD_LL[0], COORD_UR[0], num=NUM_PATIENTS)
GRID_LONG = np.linspace(COORD_LL[1], COORD_UR[1], num=NUM_PATIENTS)
GRID_LAT = np.around(GRID_LAT, decimals=4)
GRID_LONG = np.around(GRID_LONG, decimals=4)

2. Store locations of hospitals

Next up we'll store the names and lat/long coordinates of the hospitals. In your case this would be your 24 UK hospitals; again, I've just made some things up here.

# names and locations of hospitals
HOSPITALS = dict(
    ADDBR=(52.1779, 0.1464),
    BURY=(52.2412, 0.6939),
    PBOROUGH=(52.5548, -0.2613),
    NWICH=(52.6091, 1.2609),
    LONDON=(51.5553, -0.0993),
)

3. Assemble dataframe

Now we use the data above to create some lists of data and a dataframe.

# Simulate patient data: generate lists and dataframe
patient_latlongs = tuple(zip(GRID_LAT, GRID_LONG))
patient_id = list(range(len(patient_latlongs)))
# randomly assign each patient a hospital they "visited"
unit_visited = [random.choice(list(HOSPITALS)) for _ in patient_latlongs]
unit_visited_latlong = [HOSPITALS[x] for x in unit_visited]

df = pd.DataFrame.from_dict(
    {
        "patient_ID": patient_id,
        "patient_latlong": patient_latlongs,
        "unit_visited": unit_visited,
        "unit_visited_latlong": unit_visited_latlong,
    }
)

Output:

   patient_ID     patient_latlong unit_visited unit_visited_latlong
0           0  (51.5781, -0.2323)       LONDON   (51.5553, -0.0993)
1           1  (51.7136, -0.0336)     PBOROUGH   (52.5548, -0.2613)
2           2   (51.8491, 0.1651)        ADDBR    (52.1779, 0.1464)
3           3   (51.9846, 0.3638)       LONDON   (51.5553, -0.0993)
4           4     (52.12, 0.5625)         BURY    (52.2412, 0.6939)
5           5   (52.2555, 0.7613)     PBOROUGH   (52.5548, -0.2613)
6           6      (52.391, 0.96)       LONDON   (51.5553, -0.0993)
7           7   (52.5265, 1.1587)       LONDON   (51.5553, -0.0993)
8           8    (52.662, 1.3574)       LONDON   (51.5553, -0.0993)
9           9   (52.7975, 1.5561)         BURY    (52.2412, 0.6939)

4. Find nearest hospital

We write a function for finding the nearest hospital. This is probably a bit bespoke to our example. As mentioned in the comments above, haversine is a very convenient library for this. The function returns the key of the nearest hospital, which we can then look up in our hospitals dict.

def find_nearest_hospital(latlong: tuple, hospitals: dict) -> str:
    """
    Calculate nearest hospital and return name of it. Assumes a dict
    storing hospital names as keys and latlongs as tuples.

    latlong: tuple
        input latlong tuple

    hospitals: dict
        key / value pairs storing hospital names as keys and locations in latlong tuple values

    returns:
        name of closest hospital
    """
    # distance from the patient to every hospital, keyed by hospital name
    distances = {
        hospital: haversine(latlong, location)
        for hospital, location in hospitals.items()
    }

    return min(distances, key=distances.get)
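With 500,000 episodes and 24 hospitals, a row-by-row Python function may become slow. As a possible alternative, here is a minimal NumPy sketch that computes the whole patient-by-hospital distance matrix in one go. It does not use the haversine library; it implements the haversine formula directly and assumes a mean Earth radius of 6371.0088 km. The function name `nearest_hospital_index` and the lower-case `hospitals` dict are made up for this illustration, with the coordinates copied from the example above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def nearest_hospital_index(patient_lat, patient_lon, unit_lat, unit_lon):
    """Return, for each patient, the position of the nearest unit."""
    # Patients become a column and units a row, so broadcasting
    # produces an (n_patients, n_units) matrix
    p_lat = np.radians(np.asarray(patient_lat, dtype=float))[:, None]
    p_lon = np.radians(np.asarray(patient_lon, dtype=float))[:, None]
    u_lat = np.radians(np.asarray(unit_lat, dtype=float))[None, :]
    u_lon = np.radians(np.asarray(unit_lon, dtype=float))[None, :]

    # Haversine formula
    a = (
        np.sin((u_lat - p_lat) / 2) ** 2
        + np.cos(p_lat) * np.cos(u_lat) * np.sin((u_lon - p_lon) / 2) ** 2
    )
    distance_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    return distance_km.argmin(axis=1)


# Hospital coordinates copied from the example above
hospitals = {
    "ADDBR": (52.1779, 0.1464),
    "BURY": (52.2412, 0.6939),
    "PBOROUGH": (52.5548, -0.2613),
    "NWICH": (52.6091, 1.2609),
    "LONDON": (51.5553, -0.0993),
}
names = np.array(list(hospitals))
unit_lat, unit_lon = np.array(list(hospitals.values())).T

# Three of the simulated patient locations
idx = nearest_hospital_index(
    [51.5781, 52.12, 52.7975], [-0.2323, 0.5625, 1.5561], unit_lat, unit_lon
)
closest = names[idx]
```

For 500,000 patients and 24 units the intermediate matrices hold 12 million floats each, which should fit comfortably in memory on a typical machine; if not, the patients could be processed in chunks.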

5. Compute distances

Assign new columns to the dataframe holding the nearest hospital for each patient. Note that on a Series, transform with a Python function still calls it once per row, much as apply does, so don't expect a big speed-up from it. Ideally we'd use a NumPy-vectorized function, but this might be fast enough for your use case. If not, write back and we can take a look.

df = df.assign(
    closest_unit=df["patient_latlong"].transform(lambda x: find_nearest_hospital(x, HOSPITALS)),
    # look up the coordinates of the closest unit by name
    closest_unit_lat=lambda x: x["closest_unit"].map(
        {k: v[0] for k, v in HOSPITALS.items()},
    ),
    closest_unit_long=lambda x: x["closest_unit"].map(
        {k: v[1] for k, v in HOSPITALS.items()},
    ),
    visited_closest=lambda x: (x["closest_unit"] == x["unit_visited"]),
)

Output (the unit_visited values differ from the table in step 3 because the random assignment was re-run):

   patient_ID     patient_latlong unit_visited unit_visited_latlong  \
0           0  (51.5781, -0.2323)     PBOROUGH   (52.5548, -0.2613)   
1           1  (51.7136, -0.0336)         BURY    (52.2412, 0.6939)   
2           2   (51.8491, 0.1651)        ADDBR    (52.1779, 0.1464)   
3           3   (51.9846, 0.3638)        NWICH    (52.6091, 1.2609)   
4           4     (52.12, 0.5625)        ADDBR    (52.1779, 0.1464)   
5           5   (52.2555, 0.7613)        ADDBR    (52.1779, 0.1464)   
6           6      (52.391, 0.96)        NWICH    (52.6091, 1.2609)   
7           7   (52.5265, 1.1587)       LONDON   (51.5553, -0.0993)   
8           8    (52.662, 1.3574)       LONDON   (51.5553, -0.0993)   
9           9   (52.7975, 1.5561)     PBOROUGH   (52.5548, -0.2613)   

  closest_unit  closest_unit_lat  closest_unit_long  visited_closest  
0       LONDON           51.5553            -0.0993            False  
1       LONDON           51.5553            -0.0993            False  
2        ADDBR           52.1779             0.1464             True  
3        ADDBR           52.1779             0.1464            False  
4         BURY           52.2412             0.6939            False  
5         BURY           52.2412             0.6939            False  
6         BURY           52.2412             0.6939            False  
7        NWICH           52.6091             1.2609            False  
8        NWICH           52.6091             1.2609            False  
9        NWICH           52.6091             1.2609            False
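To get the proportion the question asks for (per hospital, the share of its patients for whom it was not the nearest unit), a groupby on the hospital visited should be enough. The sketch below uses a small made-up frame, `visits`, with the same column names as above; on the real data you would apply the last line to your own dataframe.

```python
import pandas as pd

# Small made-up frame with the columns produced above
visits = pd.DataFrame(
    {
        "unit_visited": ["LONDON", "LONDON", "LONDON", "LONDON", "ADDBR", "ADDBR"],
        "closest_unit": ["LONDON", "LONDON", "LONDON", "ADDBR", "ADDBR", "BURY"],
    }
)
visits["visited_closest"] = visits["unit_visited"] == visits["closest_unit"]

# Share of each hospital's patients for whom it was NOT the nearest unit
bypass_share = (
    visits.assign(bypassed=~visits["visited_closest"])
    .groupby("unit_visited")["bypassed"]
    .mean()
)
```

Here LONDON has four patients, one of whom lived nearer to ADDBR, so its share is 0.25; ADDBR has two patients, one nearer to BURY, so its share is 0.5. Multiply by 100 for percentages.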

On a related/unrelated note, I'm writing this from a neurological ward.