I have two tables
Location: Has Town name, locations latitude/longitude, and date the town was opened
|--------------|------------|------------|-------------------|
| Towns | latitude | longitude | establish_time |
|--------------|------------|------------|-------------------|
| Town_A | 72.92629 | -12.89272 | 2021-07-21 |
| Town_B | 93.62789 | -83.10172 | 2021-08-21 |
| Town_C | 68.92612 | -67.17242 | 2021-09-21 |
|--------------|------------|------------|-------------------|
Users: Has user_id, user latitude/longitude, town they purchased movie tickets, time of purchase
|------------|------------|------------|--------------|-----------------|
| user_id | latitude | longitude | Towns | purchase_time |
|------------|------------|------------|--------------|-----------------|
| 1 | 21.89027 | -53.03772 | Town_A | 2021-08-19 |
| 1 | 23.87847 | -41.78172 | Town_C | 2021-09-29 |
| 1 | 77.87092 | -96.39242 | Town_A | 2021-11-07 |
| 2 | 83.37847 | -62.78278 | Town_B | 2021-10-21 |
| 3 | 89.81924 | -80.73892 | Town_B | 2021-12-07 |
| 3 | 27.87282 | -18.39183 | Town_A | 2021-08-23 |
|------------|------------|------------|--------------|-----------------|
I'm trying to find the shortest distance from user location to the all other towns at the time of ticket purchase_time
Here is my SQL (Snowflake):
SELECT
u.*,
seq4() as seq,
l.town as nearest_town,
haversine(u.latitude, u.longitude, l.latitude, l.longitude)
FROM User AS u, location AS l
QUALIFY row_number() OVER (PARTION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
The result I'm getting is actually based on the user location as follows:
|------------|------------|------------|--------------|-----------------|
| user_id | latitude | longitude | Towns | nearest_town |
|------------|------------|------------|--------------|-----------------|
| 1 | 21.89027 | -53.03772 | Town_C | Town_C |
| 1 | 23.87847 | -41.78172 | Town_C | Town_A |
| 1 | 77.87092 | -96.39242 | Town_A | Town_B | <--- This should be Town_A because Town_B was not present at the time of purchase
| 2 | 83.37847 | -62.78278 | Town_B | Town_C |
| 3 | 89.81924 | -80.73892 | Town_B | Town_A |
| 3 | 27.87282 | -18.39183 | Town_A | Town_C | <--- This should be Town_A because Town_B was not present at the time of purchase
|------------|------------|------------|--------------|-----------------|
The above query is resulting in nearest_town even when the town didn't exist at the time of purchase_time
How can I exclude the un-opened towns during calculating the nearest_towns
CodePudding user response:
Switch your implicit CROSS JOIN
(implied by your comma delimited list of tables in your FROM clause) to an INNER JOIN
where the ON
condition is the date restriction you want to add.
Something like
FROM Users AS u
INNER JOIN location AS l
ON l.establish_time <= u.purchase_time
This says, in English "Only consider locations that were established before the purchase of this ticket when joining these two tables".