Home > Software design >  Calculating town distance excluding the locations with time in SQL
Calculating town distance excluding the locations with time in SQL

Time:12-09

I have two tables

Location: Has Town name, locations latitude/longitude, and date the town was opened

|--------------|------------|------------|-------------------|
|    Towns     |  latitude  | longitude  |  establish_time   |
|--------------|------------|------------|-------------------|
|  Town_A      |  72.92629  | -12.89272  |    2021-07-21     |
|  Town_B      |  93.62789  | -83.10172  |    2021-08-21     |
|  Town_C      |  68.92612  | -67.17242  |    2021-09-21     |
|--------------|------------|------------|-------------------|

Users: Has user_id, user latitude/longitude, town they purchased movie tickets, time of purchase

|------------|------------|------------|--------------|-----------------|
|   user_id  |  latitude  | longitude  |    Towns     |  purchase_time  |
|------------|------------|------------|--------------|-----------------|
|    1       |  21.89027  | -53.03772  |   Town_A     |    2021-08-19   |
|    1       |  23.87847  | -41.78172  |   Town_C     |    2021-09-29   |
|    1       |  77.87092  | -96.39242  |   Town_A     |    2021-11-07   |
|    2       |  83.37847  | -62.78278  |   Town_B     |    2021-10-21   |
|    3       |  89.81924  | -80.73892  |   Town_B     |    2021-12-07   |
|    3       |  27.87282  | -18.39183  |   Town_A     |    2021-08-23   |
|------------|------------|------------|--------------|-----------------|

I'm trying to find the shortest distance from user location to the all other towns at the time of ticket purchase_time

Here is my SQL (Snowflake):

SELECT 
        u.*,
        seq4() as seq,
        l.town as nearest_town,
        haversine(u.latitude, u.longitude, l.latitude, l.longitude)
    FROM User AS u, location AS l
    QUALIFY row_number() OVER (PARTION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1

The result I'm getting is actually based on the user location as follows:

|------------|------------|------------|--------------|-----------------|
|   user_id  |  latitude  | longitude  |    Towns     | nearest_town    |
|------------|------------|------------|--------------|-----------------|
|    1       |  21.89027  | -53.03772  |   Town_C     |   Town_C        |
|    1       |  23.87847  | -41.78172  |   Town_C     |   Town_A        | 
|    1       |  77.87092  | -96.39242  |   Town_A     |   Town_B        | <--- This should be Town_A because Town_B was not present at the time of purchase
|    2       |  83.37847  | -62.78278  |   Town_B     |   Town_C        |
|    3       |  89.81924  | -80.73892  |   Town_B     |   Town_A        |
|    3       |  27.87282  | -18.39183  |   Town_A     |   Town_C        | <--- This should be Town_A because Town_B was not present at the time of purchase
|------------|------------|------------|--------------|-----------------|

The above query is resulting in nearest_town even when the town didn't exist at the time of purchase_time

How can I exclude the un-opened towns during calculating the nearest_towns

CodePudding user response:

Switch your implicit CROSS JOIN (implied by your comma delimited list of tables in your FROM clause) to an INNER JOIN where the ON condition is the date restriction you want to add.

Something like

FROM Users AS u
    INNER JOIN location AS l
        ON l.establish_time <= u.purchase_time

This says, in English "Only consider locations that were established before the purchase of this ticket when joining these two tables".

  • Related