Strange behavior from lag(), skipping over certain rows-CodePudding

I have a table we'll call service, toy version is:

CREATE TABLE service (
  service_id   SERIAL       PRIMARY KEY,
  client_id    INTEGER      NOT NULL REFERENCES client (client_id),
  service_date TIMESTAMP(0) NOT NULL CHECK (service_date::DATE <= CURRENT_DATE),

-- system fields
  added_by     INTEGER      NOT NULL REFERENCES staff (staff_id),
  added_at     TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  changed_by   INTEGER      NOT NULL REFERENCES staff (staff_id),
  changed_at   TIMESTAMP(0) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  is_deleted   BOOLEAN      NOT NULL DEFAULT FALSE
)

The last five system fields are for database bookkeeping, and to enable soft (front end) deletion.

There's another table service_log which, together with service allows us to view a full revision history for each record in service. Fields of service_log are the same as service, plus a few other fields that facilitate the revision history. On update or insert to service, the state of the data is recorded as an insert to service_log. Obviously service_log.service_id is not a primary key nor is it unique, since each service record may have multiple revisions.

Our database has security features preventing hard (database-level) deletion of records. However, while investigating a bug I wanted to rule out the possibility that there may have been records hard-deleted from service but with evidence still present in service_log. The following query looking for service_log records without service_id present in service returns ~800 records (about 0.5% of all services since 2020):

WITH services_in_context AS (
    SELECT service_id,
        lag(service_id) OVER (ORDER BY service_id) AS prev_id
    FROM service
    WHERE service_date::DATE >= '1/1/2020'::DATE
)

SELECT DISTINCT log.service_id
FROM service_log log
INNER JOIN services_in_context s
    ON log.service_id BETWEEN s.prev_id   1 AND s.service_id - 1
ORDER BY log.service_id DESC;

However, the anomaly goes away when I restructure the query to use GENERATE_SERIES(). And the "missing" rows from service are actually there when I query them individually.

All this leads me to believe that lag(service_id) OVER (ORDER BY service_id) is skipping some records in the table. Which makes me think maybe the index postgres created for the primary key is corrupt? Is this the likely culprit, and if so what is the best way to fix it? Is there potentially a different reason lag() is missing some primary keys?

CodePudding user response：

What is reason for the WHERE statement and what is the service_date for the records returned? Your WHERE statement can easily create a gap.

WITH services_in_context AS (
    SELECT service_id,
        lag(service_id) OVER (ORDER BY service_id) AS prev_id
    FROM service
)

SELECT DISTINCT log.service_id
FROM service_log log
INNER JOIN services_in_context s
    ON log.service_id BETWEEN s.prev_id   1 AND s.service_id - 1
ORDER BY log.service_id DESC;

Your requirement can also be more easily satisfied by a simple LEFT JOIN.

SELECT sl.*
FROM service_log sl
LEFT JOIN service s ON s.service_id = sl.service_id 
WHERE s.service_id IS NULL