environment: Server version: 10.7.3-MariaDB-log
tables:
user location history:
CREATE TABLE `location_history` (
`id` int(10) UNSIGNED NOT NULL,
`userId` int(10) UNSIGNED DEFAULT NULL,
`latitude` double(10,8) DEFAULT NULL,
`longitude` double(11,8) DEFAULT NULL,
`createdAt` timestamp NOT NULL DEFAULT current_timestamp()
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
polygons(areas) points:
CREATE TABLE `location_area_points` (
`id` int(10) UNSIGNED NOT NULL,
`location_area_id` int(10) UNSIGNED DEFAULT NULL,
`area_group_id` int(10) UNSIGNED DEFAULT NULL,
`latitude` double(10,8) DEFAULT NULL,
`longitude` double(11,8) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;
what I am trying to achieve: find out how long an user has been inside an area,
for example find out when exactly did the userId 1 last enter into area_group_id 24,
now, what we managed to do so far: find in which area, each point is using the following query:
SELECT
location_history.id,
location_history.userId,
location_history.createdAt,
s.location_area_id
FROM
location_history
JOIN(
SELECT
location_area_points.location_area_id,
ST_PolygonFromText(
CONCAT(
"POLYGON((",
GROUP_CONCAT(
CONCAT(
location_area_points.latitude,
' ',
location_area_points.longitude
) SEPARATOR ', '
),
"))"
)
) AS polygon
FROM
location_area_points
GROUP BY
location_area_points.location_area_id
) s
ON
ST_CONTAINS(
s.polygon,
POINT(
location_history.latitude,
location_history.longitude
)
)
ORDER BY
createdAt
DESC
we get the following as an example with users 1 and 6, :
id userId createdAt location_area_id
11765 1 2022-07-18 17:03:23 24
11764 1 2022-07-18 17:03:07 24
11763 1 2022-07-18 17:02:25 24
11762 1 2022-07-18 17:02:16 24
11761 1 2022-07-18 17:01:24 24
11760 1 2022-07-18 17:00:32 24
11759 1 2022-07-18 16:59:41 24
11758 1 2022-07-18 16:59:40 24 <----- include in the results
11757 1 2022-07-18 16:58:49 2
11756 1 2022-07-18 16:58:04 2
11755 1 2022-07-18 16:57:06 2
11754 1 2022-07-18 16:56:23 24
11752 1 2022-07-18 16:56:14 24
11753 1 2022-07-18 16:56:14 24
11751 1 2022-07-18 16:54:31 24
11750 1 2022-07-18 16:54:30 24
11749 6 2022-07-18 16:53:39 5
11748 6 2022-07-18 16:52:47 5
11747 6 2022-07-18 16:51:56 5 <----- include in the results
11746 6 2022-07-18 16:51:55 24
11744 6 2022-07-18 16:51:04 24
11745 1 2022-07-18 16:51:04 24
11743 1 2022-07-18 16:50:13 24
11740 1 2022-07-18 16:49:20 24
11738 1 2022-07-18 16:48:29 24
now I would like to run additional query on the result above, to find out the first occurrence of the last group please see the code above "include in the results"
so the final result should be :
id userId createdAt location_area_id
11758 1 2022-07-18 16:59:40 24
11747 6 2022-07-18 16:51:56 5
I apologize if my question is not structured well as I am not sure how to ask such a complicated question and I am open to advice/modification to the question at hand.
CodePudding user response:
@danblack commented above that this is a window function problem, but there's a catch: MySQL 8 does not support nested window functions.
What window functions do is take the result of your query, partition it by rules you state, then allow you to add extra columns to the result with information based on that grouping.
You provided the table definitions - yay! You did not provide insert statements for the test data - aww. To test my answer I will be creating a table that matches your result set above, then selecting all to perform the window functions.
Window functions allow you to add extra columns in the result, that include information about neighboring rows.
SELECT
location_history.id,
location_history.userId,
location_history.createdAt,
s.location_area_id,
LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) as previous_area
FROM
... (the rest of your query)
This will produce a result set just like the one above, but with a column for where that user was in the previous record.
Window functions apply to the result of a query after the query is essentially complete, so there's not a way to narrow the results except to use them as a subquery. To get only the rows where the location changed:
SELECT * from (
// insert big query here
) as `transitions`
WHERE where location_area_id != previous_area OR previous_area IS NULL
You may not care about the case where previous_area is null, it is just the first record for each user. But if the user hasn't changed locations since, then that record may be relevant to you.
Now we have a list of every time the location changed.
------- -------- --------------------- ------------------ ---------------
| id | userId | createdAt | location_area_id | previous_area |
------- -------- --------------------- ------------------ ---------------
| 11758 | 1 | 2022-07-18 16:59:40 | 24 | 2 |
| 11755 | 1 | 2022-07-18 16:57:06 | 2 | 24 |
| 11747 | 6 | 2022-07-18 16:51:56 | 5 | 24 |
| 11744 | 6 | 2022-07-18 16:51:04 | 24 | NULL |
| 11738 | 1 | 2022-07-18 16:48:29 | 24 | NULL |
------- -------- --------------------- ------------------ ---------------
The remaining challenge is to find the LATEST row for each user. It seems like a good time for another window function, but that's not supported in MySQL 8. We can modify the above to get that result:
SELECT MAX(id) as last_transition from (
// insert big query here
) as t
WHERE where location_area_id != previous_area OR previous_area IS NULL
GROUP BY userId
which yields
-----------------
| last_transition |
-----------------
| 11758 |
| 11747 |
-----------------
A quick check shows that this agrees with the records you indicate in the question.
So now we can take the big query, and join it with this result (that also uses the big query), and have the complete answer:
WITH big_query AS (
SELECT
location_history.id,
location_history.userId,
location_history.createdAt,
s.location_area_id
LAG(location_area_id) OVER (PARTITION BY userId ORDER BY createdAt) as previous_area
FROM
location_history
JOIN(
SELECT
location_area_points.location_area_id,
ST_PolygonFromText(
CONCAT(
"POLYGON((",
GROUP_CONCAT(
CONCAT(
location_area_points.latitude,
' ',
location_area_points.longitude
) SEPARATOR ', '
),
"))"
)
) AS polygon
FROM
location_area_points
GROUP BY
location_area_points.location_area_id
) s
ON
ST_CONTAINS(
s.polygon,
POINT(
location_history.latitude,
location_history.longitude
)
)
ORDER BY
createdAt
DESC
)
SELECT * from big_query
JOIN (
SELECT MAX(id) as id FROM big_query
WHERE location_area_id != previous_area OR previous_area IS NULL
GROUP BY userId
) as last_transitions using(id)
with the final answer
------- -------- --------------------- ------------------ ---------------
| id | userId | createdAt | location_area_id | previous_area |
------- -------- --------------------- ------------------ ---------------
| 11758 | 1 | 2022-07-18 16:59:40 | 24 | 2 |
| 11747 | 6 | 2022-07-18 16:51:56 | 5 | 24 |
------- -------- --------------------- ------------------ ---------------
The WITH
statement lets you take a SELECT
result and treat it like a table in its own right for the duration of the query. You might think of a better name than "big_query" :).
CodePudding user response:
Thanks to @Jerry https://stackoverflow.com/a/73040422/2294803 with his amazing answer. I used his example and the final query I came up with which works on mariadb 10.7.3
for completness:
SELECT
*,
MAX(id) AS last_transition
FROM
(
SELECT
*
FROM
(
SELECT
location_history.id,
location_history.userId,
location_history.createdAt,
s.location_area_id,
LAG(location_area_id) OVER(
PARTITION BY userId
ORDER BY
createdAt
) AS previous_area
FROM
location_history
JOIN(
SELECT
location_area_points.location_area_id,
ST_PolygonFromText(
CONCAT(
"POLYGON((",
GROUP_CONCAT(
CONCAT(
location_area_points.latitude,
' ',
location_area_points.longitude
) SEPARATOR ', '
),
"))"
)
) AS POLYGON
FROM
location_area_points
GROUP BY
location_area_points.location_area_id
) s
ON
ST_CONTAINS(
s.polygon,
POINT(
location_history.latitude,
location_history.longitude
)
)
ORDER BY
createdAt
DESC
) AS `transitions`
WHERE
location_area_id != previous_area OR previous_area IS NULL
) AS t
WHERE
location_area_id != previous_area OR previous_area IS NULL
GROUP BY
userId
gives me the result:
id userId createdAt location_area_id previous_area last_transition
11758 1 2022-07-18 16:59:40 24 2 11758
5121 3 2022-07-18 00:05:30 2 10 5121
2364 4 2022-05-03 22:59:48 11 2 2364
12978 5 2022-07-19 17:12:41 2 10 12978
1747 12 2022-05-03 12:23:35 2 NULL 1747
1703 14 2022-05-03 02:49:57 24 NULL 1703
1734 17 2022-05-03 11:08:43 24 NULL 1734
2623 24 2022-05-29 11:55:59 2 NULL 2623
2610 25 2022-05-17 07:39:02 2 NULL 2610
2620 29 2022-05-29 11:48:04 13 2 2620
2629 35 2022-05-29 13:40:45 2 NULL 2629
3215 36 2022-07-17 22:48:32 24 25 3215
11777 41 2022-07-19 09:47:30 24 NULL 11777
3252 42 2022-07-17 22:50:09 24 NULL 3252
which seem exactly what I have been looking for!
Thanks to @Jerry