Given the following table structure and sample data:
CREATE TABLE IF NOT EXISTS `records` (
`id` int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
`external_id` int unsigned NOT NULL,
`sub_id` int unsigned DEFAULT 0,
`amount` bigint unsigned NOT NULL,
`updated_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) DEFAULT CHARSET=utf8;
INSERT INTO `records` (`external_id`, `sub_id`,`amount`, `updated_at`) VALUES
(1, 0, 160, '2022-01-13 16:00:00'),
(1, 1001, 150, '2022-01-13 16:40:00'),
(1, 1002, 170, '2022-06-13 16:40:00'),
(1, 1003, 170, '2022-06-13 16:40:00');
I'm trying to get the MIN
value for amount
for the past X (assume 30 days), for a given external_id
, using the timestamp field updated_at
, with the following constraints:
- If there are no records for the past 30 days (changes), the latest record is still a valid one,
- Each new record for a given
external_id
would "cancel and replace" the previous record, - If there are both records with
sub_id = 0
andsub_id <> 0
, for the same givenexternal_id
the records withsub_id <> 0
would prevail.
So, the query for the above data should return 150
.
A fiddle and what I have tried at: https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=e4ddd6b55dbccf607633c1cf7d9cd4ef
Extra Information (Later edit)
To give you a better picture of the whole idea here:
Each time an update is made on the amount
field, there is a new record created in the records
table (to create a history log).
My task is to query the MIN
amount for the past 30 days.
Some external_id
records also have that sub_id
.
Whether there is or isn't a sub_id
, the record for the external_id
would be created.
Amounts with sub_id
are usually bigger (something extra gets added to the amount for product_id
).
CodePudding user response:
At the moment it's not clear what should happen if there are multiple external_id
values. You seem to want only one row returned? (If this is not the case, please improve the example to include the desired results when there are multiple different external_id
values.)
If you do just want one value returned, you could simply ORDER BY <something> LIMIT 1
I, however, am going to assume you want just one value per external_id
.
WITH
filtered_sorted
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY external_id
ORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END,
amount
)
AS rn
FROM
records AS r
WHERE
updated_at >= (SELECT COALESCE(MAX(updated_at), DATE(NOW()) - INTERVAL 30 DAY)
FROM records
WHERE updated_at <= DATE(NOW()) - INTERVAL 30 DAY
AND external_id = r.external_id
)
)
SELECT
*
FROM
filtered_sorted
WHERE
rn = 1
This is based on the notion that the most recent row on or before the start of the day 30 days ago is still valid and should be included in consideration for the lowest amount.
- For each
external_id
...- Ignore all records
updated_at
after the start of 30 days ago - Return the most recent remaining
updated_at
(which could be exactly the start of the day 30 days ago) - If no such row is found, return the start of the day 30 days ago
- We will consider all rows for that
updated_at
onwards
- Ignore all records
Then the ROW_NUMBER()
prefers rows with a non-zero sub_id
and then rows with the lowest amount
.
- For the rows being considered, after the
WHERE
clause described above- Assign each row a row number
- Each
external_id
should have it's own individual sequence of row numbers (achieved withPARTITION BY
external_id`) - Rows with
sub_id <> 0
should come before before any rows withsub_id = 0
(achieved byORDER BY CASE WHEN sub_id <> 0 THEN 0 ELSE 1 END
) - Rows with lower
amount
values should come first (achieved with `ORDER BY amount.)
Then, just return rows where the assigned row number is 1
- Partitioned by
external_id
- Filtered by
updated_at
- Sorted by
sub_id <> 0
,amount
(One row per external_id
)