I have a table that is setup like this:
number_value | number_date
------------- ----------------------
4 | 2022-04-05 10:00:00
4 | 2022-04-05 10:05:00
4 | 2022-04-05 10:10:00
5 | 2022-04-05 10:15:00
5 | 2022-04-05 10:20:00
7 | 2022-04-05 10:25:00
5 | 2022-04-05 10:30:00
5 | 2022-04-05 10:35:00
This data is going to be used to create a line graph.
To reduce the amount of data that needs to be sent, I want the select query to format the data like this:
number_value | start_point | end_point
------------- --------------------- ----------------------
4 | 2022-04-05 10:00:00 | 2022-04-05 10:10:00
5 | 2022-04-05 10:15:00 | 2022-04-05 10:20:00
7 | 2022-04-05 10:25:00 | 2022-04-05 10:25:00
5 | 2022-04-05 10:30:00 | 2022-04-05 10:35:00
This was my first attempt at a query which does this:
SELECT number_value, MIN(number_date) as start_point, MAX(number_date) as end_point
FROM number_table
GROUP BY number_value;
An obvious flaw with this attempt is the GROUP BY
makes so if there is a duplicate value which occurred at a prior time (5 -> 7 -> 5), then those values are grouped together.
The problem I'm having is I can't see a way to do this as an SQL query.
All help appreciated.
CodePudding user response:
This is a gaps and islands problem, and we can use the difference in row numbers method for one approach:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY number_date) rn1,
ROW_NUMBER() OVER (PARTITION BY number_value ORDER BY number_date) rn2
FROM number_table
)
SELECT number_value,
MIN(number_date) AS start_point,
MAX(number)date) AS end_point
FROM cte
GROUP BY number_value, rn1 - rn2
ORDER BY MIN(number_date);