There are two types of users who purchase the movie tickets from either town A, town B, town C or online.
I have the following tables as:
Locations: This table consists of locations of movie centers
|--------------|------------|------------|
| Towns | latitude | longitude |
|--------------|------------|------------|
| Town_A | 72.92629 | -12.89272 |
| Town_B | 93.62789 | -83.10172 |
| Town_C | 68.92612 | -67.17242 |
|--------------|------------|------------|
Users: This table contains the history of user's purchase i.e. either online or in towns. Also consists of user's latitude/longitude during the purchase.
|------------|------------|------------|--------------|
| user_id | latitude | longitude | Towns |
|------------|------------|------------|--------------|
| 1 | 21.89027 | -53.03772 | Town_A |
| 1 | 23.87847 | -41.78172 | Town_C |
| 1 | 39.62847 | -80.19892 | online |
| 1 | 77.87092 | -96.39242 | Town_A |
| 2 | 71.87782 | -38.03782 | online |
| 2 | 83.37847 | -62.78278 | Town_B |
| 3 | 89.81924 | -80.73892 | Town_B |
| 3 | 27.87282 | -18.39183 | Town_A |
|------------|------------|------------|--------------|
I want to find the nearest town based on user's lat/long during his purchase. The final table would look like below:
|------------|------------|------------|--------------|-----------------|
| user_id | latitude | longitude | Towns | nearest_town |
|------------|------------|------------|--------------|-----------------|
| 1 | 21.89027 | -53.03772 | Town_A | Town_B | <--- Town_B is near based on his lat/long (Irrespective of his purchase town)
| 1 | 23.87847 | -41.78172 | Town_C | Town_A | <--- Town_A is near based on his lat/long
| 1 | 39.62847 | -80.19892 | online | Town_Online |
| 1 | 77.87092 | -96.39242 | Town_A | Town_A |
| 2 | 71.87782 | -38.03782 | online | Town_Online |
| 2 | 83.37847 | -62.78278 | Town_B | Town_C |
| 3 | 89.81924 | -80.73892 | Town_B | Town_A |
| 3 | 27.87282 | -18.39183 | Town_A | Town_A |
|------------|------------|------------|--------------|-----------------|
SQL Query (Snowflake) my attempt:
With specific_location as
(
select user_id,
latitude,
longitude,
case when Towns in ('Town_A','Town_B','Town_C') then 'Town' else 'Town_Online' end as purchase_in
from Locations
)
select *,
case when purchase_in = 'Town' then
(select Towns from Location qualify row_number() over (order by haversine(user.latitude,user.longitude,location.latitude,location.longitude))=1)
else purchase_in
end as nearest_town
from specific_location
I'm getting an error: syntax error unexpected 'when' and unexpected 'else'
CodePudding user response:
You CTE specific_location
is missing a JOIN to USERS
as locations itself does not have a user_id
column.
I would also make an enriched user, to add a sequence just so later the location match can be distinctly per user row, and then do the user/location join in a second CTE, and thus the select you do at the end is with pre-computed values:
I also swapped you two value CASE statements for IFF's
WITH enriched_user AS (
SLECT
u.user_id,
u.latitude,
u.longitude,
u.town,
seq4() as seq,
IFF(towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
FROM user AS u
), user_and_closest_location AS (
SELECT
u.user_id,
u.latitude,
u.longitude,
u.town,
u.purchase_in
l.town as closest_town
haversine(u.latitude, u.longitude, l.latitude, l.longitude)
FROM enriched_user AS u,
location AS l
QUALIFY row_number() OVER (PARTION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
)
SELECT
u.user_id,
u.latitude,
u.longitude,
u.town,
IFF(u.purchase_in = 'Town', u.closest_town, u.purchase_in) AS nearest_town
FROM user_and_closest_location AS u
ORDER BY 1,2,3;
The logic all calculating the distance based join for all row, is that it will be faster, and if there are things you want to not do it for, it would be better to prune the input there, but then you will need to rejoin to input to captured the skipped values.
WITH enriched_user AS (
SLECT
u.user_id,
u.latitude,
u.longitude,
u.town,
seq4() as seq,
IFF(towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
FROM user AS u
), user_and_closest_location AS (
SELECT
u.user_id,
u.latitude,
u.longitude,
u.town,
u.purchase_in
l.town as closest_town
haversine(u.latitude, u.longitude, l.latitude, l.longitude)
FROM enriched_user AS u,
location AS l
WHERE u.purchase_in = 'Town'
QUALIFY row_number() OVER (PARTION BY u.seq ORDER BY haversine(u.latitude, u.longitude, l.latitude, l.longitude)) = 1
)
SELECT
u.user_id
u.latitude,
u.longitude,
u.town,
IFF(u.purchase_in = 'Town', ucl.closest_town, u.purchase_in) AS nearest_town
FROM enriched_user user_and_closest_location AS u
LEFT JOIN user_and_closest_location AS ucl
ON u.seq = ucl.seq
ORDER BY 1,2,3;
also the in towns could be flipped to be not 'online`
IFF(towns IN ('Town_A','Town_B','Town_C'), 'Town', 'Town_Online') AS purchase_in
becoming:
IFF(towns != 'online', 'Town', 'Town_Online')
at which point the actual test can be moved to where it's used later.