Home > Software design >  Exists on a distinct column selection does not work as expected when offset is provided
Exists on a distinct column selection does not work as expected when offset is provided

Time:09-04

A colleague of mine presented me with the following interesting problem that I can't find an explanation for.

Given the following table definition:

CREATE TABLE demo (
   id integer PRIMARY KEY,
   name varchar(10)
);

INSERT INTO demo VALUES (1, 'test');
INSERT INTO demo VALUES (2, 'test');

The following queries (which are assumed to be semantically identical - please correct me if I'm wrong):

SELECT DISTINCT name
FROM demo
WHERE name = 'test';

SELECT DISTINCT name
FROM demo
WHERE name = 'test'
-- actual value is irrelevant as long 
-- as it is > number of entries that would result 
LIMIT 10
OFFSET 0;

Both correctly return:

name
----
test

In addition, the query:

SELECT EXISTS(
    SELECT DISTINCT name
    FROM demo
    WHERE name = 'test'
    LIMIT 10
    OFFSET 0
);

also correctly returns 1 (or t on Postgres). However, the query:

SELECT EXISTS(
    SELECT DISTINCT name
    FROM demo
    WHERE name = 'test'
    LIMIT 10
    OFFSET 1 -- note the offset: 1 more than what the DISTINCTed query should return
);

also returns 1 on SQLite and MySQL, but f on Postgres. It seems as if the OFFSET is applied to the query result in Postgres (as expected), but the DISTINCT has precedence in SQLite and MySQL.

AFAIK, the SQL standard defines LIMIT/OFFSET to be evaluated last (though I couldn't actually find a link to the standard to verify this myself, though every search turns up the same...), meaning that the Postgres behaviour is correct.

Is this a bug that has been fixed in Postgres?

Tested on:

  • SQLite 3.36.0
  • MySQL 8.0.28-0ubuntu0.20.04.3
  • PostgreSQL 14.2 (Debian 14.2-1.pgdg110 1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

Edit:

Interestingly, using GROUP BY instead of DISTINCT like follows:

SELECT EXISTS(
    SELECT name
    FROM demo
    WHERE name = 'test'
    GROUP BY name
    LIMIT 10
    OFFSET 1
);

correctly returns 0 on SQLite (haven't tested MySQL, but I assume it would work since grouping is done before OFFSET is applied). So this seems to be a specific issue with DISTINCT.


Edit: Nevermind, MySQL seems to have issues with the latter GROUP BY approach as well. The former seems a bug in SQLite as well as MySQL, while the latter is problematic only in MySQL.

CodePudding user response:

It seems that as noted in the comments, this is simply a bug in MySQL/SQLite. I was about to report it on their tracker but Oracle wants a lot of information for me to be able to do that, so I refrained.

The behaviour in Postgres is the expected one and correct.

  • Related