Find the next free timestamp not in a table yet

I have a table, event, with a column unique_time of type timestamptz. I need each of the values in unique_time to be unique.

Given a timestamptz input, input_time, I need to find the minimum timestamptz value that satisfies the following criteria:

the result must be >= input_time
the result must not already be in unique_time

I cannot merely add one microsecond to the greatest value in unique_time, because I need the minimum value that satisfies the above criteria.

Is there a concise way to compute this as part of an insert or update to the event table?

CodePudding user response：

I suggest a function with a loop:

CREATE OR REPLACE FUNCTION f_next_free(_input_time timestamptz, OUT _next_free timestamptz)
  LANGUAGE plpgsql STABLE STRICT AS
$func$
BEGIN
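   LOOP
      -- Assign _input_time to the result only if no row holds that value yet.
      SELECT INTO _next_free  _input_time
      WHERE  NOT EXISTS (SELECT FROM event WHERE unique_time = _input_time);

      EXIT WHEN FOUND;  -- FOUND is true when the SELECT returned a row
      _input_time := _input_time + interval '1 us';  -- taken: try the next microsecond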
   END LOOP;
END
$func$;

Call:

SELECT f_next_free('2022-05-17 03:44:22.771741+02');

Be sure to have an index on event(unique_time). If the column is defined UNIQUE or PRIMARY KEY, that index is there implicitly.
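If the column has no such constraint yet, it could be added along these lines. The constraint name event_unique_time_key is only an illustrative assumption:

```sql
-- Adding a UNIQUE constraint creates the supporting index implicitly.
-- The constraint name is hypothetical; pick whatever fits your naming scheme.
ALTER TABLE event
   ADD CONSTRAINT event_unique_time_key UNIQUE (unique_time);
```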

Since Postgres timestamps have microsecond resolution, the next free timestamp is at least 1 microsecond (interval '1 us') away. See:

Ignoring time zones altogether in Rails and PostgreSQL

Could also be a recursive CTE, but the overhead is probably bigger.
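For comparison, a recursive CTE variant might look roughly like this. It is only a sketch, using a timestamp literal as example input; it walks forward one microsecond at a time while the current value is taken:

```sql
WITH RECURSIVE cte AS (
   -- Start at the requested timestamp (example input).
   SELECT timestamptz '2022-05-17 03:44:22.771741+02' AS ts

   UNION ALL
   -- Step to the next microsecond only while the current value is already taken.
   SELECT c.ts + interval '1 us'
   FROM   cte c
   WHERE  EXISTS (SELECT FROM event e WHERE e.unique_time = c.ts)
)
-- The last value produced is the first one that is not in the table.
SELECT max(ts) AS next_free
FROM   cte;
```

Like the plain function, this only reads the table, so it does nothing about concurrent writers.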

Concurrency!

"Is there a concise way to compute this as part of an INSERT or UPDATE to the event table?"

The above is obviously subject to a race condition. Any number of concurrent transactions might find the same free spot. Postgres cannot lock rows that are not there yet.

Since you want to INSERT (similar for UPDATE), I suggest running INSERT ... ON CONFLICT DO NOTHING directly in a loop instead. Again, we need a UNIQUE or PRIMARY KEY constraint on unique_time:

CREATE OR REPLACE FUNCTION f_next_free(INOUT _input_time timestamptz, _payload text)
  LANGUAGE plpgsql AS
$func$
BEGIN
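   LOOP
      -- Try to claim the current timestamp; a conflict on unique_time inserts nothing.
      INSERT INTO event (unique_time, payload)
      VALUES (_input_time, _payload)
      ON CONFLICT DO NOTHING;

      EXIT WHEN FOUND;  -- FOUND is true when a row was actually inserted
      _input_time := _input_time + interval '1 us';  -- taken: try the next microsecond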
   END LOOP;
END
$func$;

Adapt your "payload" accordingly.
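A call might then look like this; the payload value is a made-up placeholder. The function returns the timestamp that was actually inserted, which may be later than the one passed in:

```sql
-- 'some payload' is a hypothetical value for the payload column.
SELECT f_next_free('2022-05-17 03:44:22.771741+02', 'some payload');
```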

A successful INSERT locks the row. Even if concurrent transactions cannot see the inserted row yet, a UNIQUE index is absolute.
(You could make it work with advisory locks ...)
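A rough sketch of the advisory-lock idea, under several assumptions: the function name f_insert_next_free and the lock key 12345 are hypothetical, every writer to event must go through this function (or take the same lock), and transactions run at the default READ COMMITTED isolation level so that each statement sees rows committed by the previous lock holder:

```sql
CREATE OR REPLACE FUNCTION f_insert_next_free(INOUT _input_time timestamptz, _payload text)
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- Serialize all callers; the lock is released automatically at end of transaction.
   -- 12345 is an arbitrary, hypothetical key shared by all writers.
   PERFORM pg_advisory_xact_lock(12345);

   -- With the lock held, search for the first free timestamp.
   LOOP
      EXIT WHEN NOT EXISTS (SELECT FROM event WHERE unique_time = _input_time);
      _input_time := _input_time + interval '1 us';
   END LOOP;

   -- No other caller of this function can take the slot before we insert.
   INSERT INTO event (unique_time, payload)
   VALUES (_input_time, _payload);
END
$func$;
```

This trades concurrency for simplicity, since all inserts through the function queue behind a single lock. The INSERT ... ON CONFLICT loop does not have that bottleneck.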

CodePudding user response：

Ah, forget about the approaches from my comment that would try to generate an (infinite) sequence of all microsecond timestamps following the $input_time. There's a much simpler query that can generate exactly the timestamp you need:

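INSERT INTO event(unique_time, others)
SELECT MIN(candidates."time"), $other_values
FROM (
  -- the requested time itself ...
  SELECT $input_time AS "time"
UNION ALL
  -- ... plus the microsecond right after every existing row at or after it
  SELECT unique_time + interval '1 microsecond' AS "time"
  FROM event
  WHERE unique_time >= $input_time
) AS candidates
-- keep only candidates that are not taken yet
WHERE NOT EXISTS (
  SELECT *
  FROM event coll
  WHERE coll.unique_time = candidates."time"
);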

However, I'm not sure how well Postgres can optimise this: the MIN aggregate might load all the timestamps from event that are at or after $input_time, which might be fine if you always append events at the end, but still. A probably better alternative would be:

INSERT INTO event(unique_time, others)
SELECT available."time", $other_values
FROM (
  SELECT *
  FROM (
    -- the requested time itself, plus the successor of every later row
    SELECT $input_time AS "time"
  UNION ALL
    SELECT unique_time + interval '1 microsecond' AS "time"
    FROM event
    WHERE unique_time >= $input_time
  ) AS candidates
  -- keep only candidates that are not taken yet
  WHERE NOT EXISTS (
    SELECT *
    FROM event coll
    WHERE coll.unique_time = candidates."time"
  )
  ORDER BY candidates."time" ASC
) AS available
-- take the earliest free candidate
ORDER BY available."time" ASC
LIMIT 1;

This might (I don't know) still have to evaluate the complex subquery every time you insert something, though, which would be rather inefficient if most of the inputs don't cause a collision. Also, I have no idea how well this works under concurrent load (i.e. multiple transactions running the query at the same time) or whether it has race conditions.

Alternatively, just use a WHILE loop (in the client or in PL/pgSQL) that attempts to insert the value until it succeeds, incrementing the timestamp on every iteration. See @Erwin Brandstetter's answer for that.