The example table:
id | name | create_time | group_id |
---|---|---|---|
1 | a | 2022-01-01 12:00:00 | group1 |
2 | b | 2022-01-01 13:00:00 | group1 |
3 | c | 2022-01-01 12:00:00 | NULL |
4 | d | 2022-01-01 13:00:00 | NULL |
5 | e | NULL | group2 |
I need to get the top 1 row (with the minimal create_time) per group_id, with these conditions:
- create_time can be null - it should be treated as the minimal value
- group_id can be null - all rows with a null group_id should be returned (if that's not possible, we can use coalesce(group_id, id) or something like that, assuming that ids are unique and never collide with group ids)
- it should be possible to apply pagination on the query (so join can be a problem)
- the query should be as universal as possible (so no vendor-specific things). Again, if that's not possible, it should work in MySQL 5&8, PostgreSQL 9 and H2
The expected output for the example:
id | name | create_time | group_id |
---|---|---|---|
1 | a | 2022-01-01 12:00:00 | group1 |
3 | c | 2022-01-01 12:00:00 | NULL |
4 | d | 2022-01-01 13:00:00 | NULL |
5 | e | NULL | group2 |
I've already read similar questions on SO but 90% of answers are with specific keywords (numerous answers with PARTITION BY like https://stackoverflow.com/a/6841644/5572007) and others don't honor null values in the group condition columns and probably pagination (like https://stackoverflow.com/a/14346780/5572007).
CodePudding user response:
I would guess
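SELECT id, name, MIN(create_time), group_id
FROM tb WHERE group_id IS NOT NULL GROUP BY group_id
UNION ALL
SELECT id, name, create_time, group_id
FROM tb WHERE group_id IS NULL
ORDER BY name
Note that selecting id and name without aggregating them only works where GROUP BY is lenient (MySQL without ONLY_FULL_GROUP_BY); PostgreSQL rejects it, and the id and name returned need not come from the row with the minimal create_time. I should also point out that 'name' is a keyword in SQL, though not a reserved one in MySQL or PostgreSQL.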
CodePudding user response:
select * from T t1
where coalesce(create_time, 0) = (
    select min(coalesce(create_time, 0)) from T t2
    where coalesce(t2.group_id, t2.id) = coalesce(t1.group_id, t1.id)
)
Not sure how you imagine "pagination" should work. Here's one way, appended to the query above:
and (
    select count(distinct coalesce(t2.group_id, t2.id)) from T t2
    where coalesce(t2.group_id, t2.id) <= coalesce(t1.group_id, t1.id)
) between 2 and 5 /* for example */
order by coalesce(t1.group_id, t1.id)
I'm assuming there's an implicit cast from 0 to a date value with a resulting value lower than all those in your database. Not sure if that's reliable across vendors; PostgreSQL, for one, won't coalesce a timestamp with an integer. (Try '19000101' instead?) Otherwise the rest should be universal. You could probably also parameterize that in the same way as the page range.
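To make the sentinel idea concrete, here is a minimal sketch that runs the query above against the question's sample rows. It uses Python's built-in sqlite3 module purely as a convenient test bed (an assumption; SQLite is not one of the databases the question targets), stores timestamps as ISO-8601 text, and passes the sentinel as a parameter. The function name first_row_per_group and the sentinel value are illustrative, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (id, name, create_time, group_id).
ROWS = [
    (1, "a", "2022-01-01 12:00:00", "group1"),
    (2, "b", "2022-01-01 13:00:00", "group1"),
    (3, "c", "2022-01-01 12:00:00", None),
    (4, "d", "2022-01-01 13:00:00", None),
    (5, "e", None, "group2"),
]

# Assumption: timestamps are ISO-8601 text, so string order equals time order
# and this sentinel sorts before every real value in the table.
SENTINEL = "1900-01-01 00:00:00"

QUERY = """
select t1.id, t1.name, t1.create_time, t1.group_id
from T t1
where coalesce(t1.create_time, :sentinel) = (
    select min(coalesce(t2.create_time, :sentinel))
    from T t2
    where coalesce(t2.group_id, t2.id) = coalesce(t1.group_id, t1.id)
)
order by t1.id
"""


def first_row_per_group(rows=ROWS, sentinel=SENTINEL):
    """Return the ids of the earliest row per group; NULL create_time wins."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table T (id integer primary key, name text, "
            "create_time text, group_id text)"
        )
        conn.executemany("insert into T values (?, ?, ?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY, {"sentinel": sentinel})]
    finally:
        conn.close()


if __name__ == "__main__":
    print(first_row_per_group())
```

On the sample data this returns ids 1, 3, 4 and 5, which matches the expected output in the question. Passing the sentinel as a parameter keeps the SQL text free of vendor-specific date literals.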
You've also got a potential complication with collisions between the group_id and id spaces. Yours don't appear to have that problem, though having mixed data types creates its own issues.
This all gets more difficult when you want to order by other columns like name:
select * from T t1
where coalesce(create_time, 0) = (
    select min(coalesce(create_time, 0)) from T t2
    where coalesce(t2.group_id, t2.id) = coalesce(t1.group_id, t1.id)
) and (
    select count(*) from (
        select * from T t1
        where coalesce(create_time, 0) = (
            select min(coalesce(create_time, 0)) from T t2
            where coalesce(t2.group_id, t2.id) = coalesce(t1.group_id, t1.id)
        )
    ) t3
    where t3.name < t1.name or t3.name = t1.name
        and coalesce(t3.group_id, t3.id) <= coalesce(t1.group_id, t1.id)
) between 2 and 5
order by t1.name;
That does handle ties but also makes the simplifying assumption that name can't be null, which would add yet another small twist. At least you can see that it's possible without CTEs and window functions, but expect these queries to be a lot less efficient to run.
https://dbfiddle.uk/?rdbms=mysql_5.5&fiddle=9697fd274e73f4fa7c1a3a48d2c78691
CodePudding user response:
You can combine two queries with UNION ALL. E.g.:
select id, name, create_time, group_id
from mytable
where group_id is not null
and not exists
(
  select null
  from mytable older
  where older.group_id = mytable.group_id
  -- a null create_time counts as older than any non-null one
  and (older.create_time < mytable.create_time
       or (older.create_time is null and mytable.create_time is not null))
)
union all
select id, name, create_time, group_id
from mytable
where group_id is null
order by id;
This is standard SQL and very basic at that. It should work in about every RDBMS.
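As a quick sanity check, the UNION ALL approach can be run against the question's sample data. The sketch below uses Python's sqlite3 module as a stand-in database (an assumption, SQLite is not among the question's targets) and writes the comparison so that a NULL create_time counts as the oldest value, as the question requires. The extra row with id 6 and the function name oldest_per_group are hypothetical and only there to exercise the NULL case.

```python
import sqlite3

# Sample rows from the question: (id, name, create_time, group_id).
SAMPLE = [
    (1, "a", "2022-01-01 12:00:00", "group1"),
    (2, "b", "2022-01-01 13:00:00", "group1"),
    (3, "c", "2022-01-01 12:00:00", None),
    (4, "d", "2022-01-01 13:00:00", None),
    (5, "e", None, "group2"),
]

# Hypothetical extra row: same group as row 5 but with a real timestamp.
EXTRA = (6, "f", "2022-01-01 11:00:00", "group2")

QUERY = """
select id, name, create_time, group_id
from mytable
where group_id is not null
  and not exists (
    select null
    from mytable older
    where older.group_id = mytable.group_id
      and (older.create_time < mytable.create_time
           or (older.create_time is null and mytable.create_time is not null))
  )
union all
select id, name, create_time, group_id
from mytable
where group_id is null
order by id
"""


def oldest_per_group(rows):
    """Return ids of the oldest row per group plus all rows without a group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (id integer primary key, name text, "
            "create_time text, group_id text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(oldest_per_group(SAMPLE))
    print(oldest_per_group(SAMPLE + [EXTRA]))
```

Both calls return ids 1, 3, 4 and 5: row 6 is dropped because row 5, with its NULL create_time, is treated as older. Rows that tie on create_time within a group would all be returned, so a tie-breaker on id would be needed for a strict "top 1".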
As to pagination: This is usually costly, as you run the same query again and again in order to always pick the "next" part of the result, instead of running the query only once. The best approach is usually to use the primary key to get to the next part so an index on the key can be used. In the above query we'd ideally add where id > :last_biggest_id to the queries and limit the result, which would be fetch next <n> rows only in standard SQL. Every time we run the query, we use the last read ID as :last_biggest_id, so we read on from there.
Variables, however, are dealt with differently in the various DBMS; most commonly they are preceded by either a colon, a dollar sign or an at sign. And the standard fetch clause, too, is supported by only some DBMS, while others have a LIMIT or TOP clause instead.
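The key-based pagination described above might look like the following sketch. It is only an illustration under stated assumptions: Python's sqlite3 module is the test bed, the bind variables use the colon style, and LIMIT stands in for the standard fetch clause because SQLite does not accept the latter. The names PAGE_QUERY, open_sample_db and read_all_pages are made up for this example.

```python
import sqlite3

# Sample rows from the question: (id, name, create_time, group_id).
SAMPLE = [
    (1, "a", "2022-01-01 12:00:00", "group1"),
    (2, "b", "2022-01-01 13:00:00", "group1"),
    (3, "c", "2022-01-01 12:00:00", None),
    (4, "d", "2022-01-01 13:00:00", None),
    (5, "e", None, "group2"),
]

# The id filter applies to the candidate rows only, not to "older", so the
# per-group minimum is still computed over the whole group.
PAGE_QUERY = """
select id, name, create_time, group_id
from mytable
where group_id is not null
  and id > :last_biggest_id
  and not exists (
    select null
    from mytable older
    where older.group_id = mytable.group_id
      and (older.create_time < mytable.create_time
           or (older.create_time is null and mytable.create_time is not null))
  )
union all
select id, name, create_time, group_id
from mytable
where group_id is null
  and id > :last_biggest_id
order by id
limit :page_size
"""


def open_sample_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (id integer primary key, name text, "
        "create_time text, group_id text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", SAMPLE)
    return conn


def read_all_pages(page_size):
    """Read the result page by page, resuming after the last id seen."""
    conn = open_sample_db()
    try:
        pages, last_id = [], 0
        while True:
            params = {"last_biggest_id": last_id, "page_size": page_size}
            ids = [row[0] for row in conn.execute(PAGE_QUERY, params)]
            if not ids:
                return pages
            pages.append(ids)
            last_id = ids[-1]  # rows are ordered by id, so this is the biggest
    finally:
        conn.close()


if __name__ == "__main__":
    print(read_all_pages(2))
```

With a page size of 2 the sample data comes back as two pages, ids 1 and 3 followed by ids 4 and 5. Because each page starts from the last id seen rather than from an offset, rows inserted or deleted between requests do not shift later pages.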
If these little differences make it impossible to apply them, then you must find a workaround. For the variable this can be a one-row-table holding the last read maximum ID. For the fetch clause this can mean you simply fetch as many rows as you need and stop there. Of course this isn't ideal, as the DBMS doesn't know then that you only need the next n rows and cannot optimize the execution plan accordingly.
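Both workarounds can be sketched together. In the example below, a one-row table (here called page_cursor, a hypothetical name) holds the last read ID in place of a bind variable, and the client simply stops fetching after n rows in place of a fetch or LIMIT clause. Python's sqlite3 module is again only an assumed test bed, and next_page and demo are illustrative names.

```python
import sqlite3

# Sample rows from the question: (id, name, create_time, group_id).
SAMPLE = [
    (1, "a", "2022-01-01 12:00:00", "group1"),
    (2, "b", "2022-01-01 13:00:00", "group1"),
    (3, "c", "2022-01-01 12:00:00", None),
    (4, "d", "2022-01-01 13:00:00", None),
    (5, "e", None, "group2"),
]

# No bind variables and no LIMIT: the position lives in the page_cursor table.
QUERY = """
select id, name, create_time, group_id
from mytable
where group_id is not null
  and id > (select last_id from page_cursor)
  and not exists (
    select null
    from mytable older
    where older.group_id = mytable.group_id
      and (older.create_time < mytable.create_time
           or (older.create_time is null and mytable.create_time is not null))
  )
union all
select id, name, create_time, group_id
from mytable
where group_id is null
  and id > (select last_id from page_cursor)
order by id
"""


def next_page(conn, n):
    """Fetch up to n rows after the stored position and advance the position."""
    cur = conn.execute(QUERY)
    rows = cur.fetchmany(n)  # stop reading here; the DBMS is never told about n
    cur.close()
    if rows:
        # Rows are ordered by id, so the last one carries the biggest id.
        conn.execute("update page_cursor set last_id = ?", (rows[-1][0],))
    return rows


def demo(n=2):
    """Page through the sample data and return the ids seen on each page."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (id integer primary key, name text, "
            "create_time text, group_id text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?, ?)", SAMPLE)
        conn.execute("create table page_cursor (last_id integer not null)")
        conn.execute("insert into page_cursor values (0)")
        pages = []
        while True:
            rows = next_page(conn, n)
            if not rows:
                return pages
            pages.append([row[0] for row in rows])
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

A single shared page_cursor row only suits one reader at a time; with several concurrent clients the table would need a key per session, which this sketch leaves out.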
And then there is the option not to do the pagination in the DBMS, but read the complete result into your app and handle pagination there (which then becomes a mere display thing and allocates a lot of memory of course).