How can I return only the records with the latest upload_date(s) from the data below?
My data is as follows:
upload_date |day_name |rows_added|row_count_delta|days_since_last_update|
----------------------- --------- ---------- --------------- ----------------------
2022-05-01 00:00:00.000|Sunday | 526043| | |
2022-05-02 00:00:00.000|Monday | 467082| -58961| 1|
2022-05-02 15:58:54.094|Monday | 421427| -45655| 0|
2022-05-02 18:19:22.894|Monday | 421427| 0| 0|
2022-05-03 16:54:04.136|Tuesday | 496021| 74594| 1|
2022-05-03 18:17:27.502|Tuesday | 496021| 0| 0|
2022-05-04 18:19:26.392|Wednesday| 487154| -8867| 1|
2022-05-05 18:18:15.277|Thursday | 489713| 2559| 1|
2022-05-06 16:15:39.518|Friday | 489713| 0| 1|
2022-05-07 16:18:00.916|Saturday | 482955| -6758| 1|
My desired results should be:
upload_date |day_name |rows_added|row_count_delta|days_since_last_update|
----------------------- --------- ---------- --------------- ----------------------
2022-05-01 00:00:00.000|Sunday | 526043| | |
2022-05-02 18:19:22.894|Monday | 421427| 0| 0|
2022-05-03 18:17:27.502|Tuesday | 496021| 0| 0|
2022-05-04 18:19:26.392|Wednesday| 487154| -8867| 1|
2022-05-05 18:18:15.277|Thursday | 489713| 2559| 1|
2022-05-06 16:15:39.518|Friday | 489713| 0| 1|
2022-05-07 16:18:00.916|Saturday | 482955| -6758| 1|
NOTE only the latest upload_date for 2022-05-02 and 2022-05-03 should be in the result set.
CodePudding user response:
You can use a window function to PARTITION
by day (casting the timestamp to a date) and sort the results by most recent first by ordering by upload_date descending. Using ROW_NUMBER()
it will assign a 1
to the most recent record per date. Then just filter on that row number. Note that I am assuming the datatype for upload_date is TIMESTAMP
in this case.
SELECT
*
FROM (
SELECT
your_table.*,
ROW_NUMBER() OVER (PARTITION BY CAST(upload_date AS DATE)
ORDER BY upload_date DESC) rownum
FROM your_table
)
WHERE rownum = 1
CodePudding user response:
WITH cte AS (
SELECT
max(upload_date) OVER (PARTITION BY upload_date::date),
upload_date,
day_name,
rows_added,
row_count_delta,
days_since_last_update
FROM test101 ORDER BY 1
)
SELECT
upload_date,
day_name,
rows_added,
row_count_delta,
days_since_last_update
FROM
cte
WHERE
max = upload_date;
CodePudding user response:
This is more verbose but I find it easier to read and build:
SELECT *
FROM mytable t1
JOIN (
SELECT CAST(upload_date AS DATE) day_date, MAX(upload_date) max_date
FROM mytable
GROUP BY day_date) t2
ON t1.upload_date = t2.max_date AND
CAST(upload_date AS DATE) = t2.day_date;
I don't know about perfomance right away, but I suspect the window function is worse because you will need to order by
, which is usually a slow operation unless your table already have an index for doing so.