Find the next free timestamp not in a table yet

Time:05-17

I have a table, event, with a column unique_time of type timestamptz. I need each of the values in unique_time to be unique.

Given a timestamptz input, input_time, I need to find the minimum timestamptz value that satisfies the following criteria:

  • the result must be >= input_time
  • the result must not already be in unique_time

I cannot merely add one microsecond to the greatest value in unique_time, because I need the minimum value that satisfies the above criteria.

Is there a concise way to compute this as part of an insert or update to the event table?

CodePudding user response:

I suggest a function with a loop:

CREATE OR REPLACE FUNCTION f_next_free(_input_time timestamptz, OUT _next_free timestamptz)
  LANGUAGE plpgsql STABLE STRICT AS
$func$
BEGIN
   LOOP
      -- assign the candidate only if no row holds that timestamp yet
      SELECT INTO _next_free  _input_time
      WHERE  NOT EXISTS (SELECT FROM event WHERE unique_time = _input_time);
      
      EXIT WHEN FOUND;
      -- candidate is taken: try the next microsecond
      _input_time := _input_time + interval '1 us';
   END LOOP;
END
$func$;

Call:

SELECT f_next_free('2022-05-17 03:44:22.771741+02');

Be sure to have an index on event(unique_time). If the column is defined UNIQUE or PRIMARY KEY, that index is there implicitly.
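As a minimal sketch of that prerequisite, assuming the column has no such constraint yet (the constraint name event_unique_time_key is a hypothetical choice, not something from the question):

```sql
-- Assumption: event.unique_time has no UNIQUE or PRIMARY KEY constraint yet.
-- Adding the constraint creates the backing unique index implicitly.
-- The statement fails if the column already contains duplicate values.
ALTER TABLE event
   ADD CONSTRAINT event_unique_time_key UNIQUE (unique_time);
```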

Since Postgres timestamps have microsecond resolution, the next free timestamp is at least 1 microsecond (interval '1 us') away.

Could also be a recursive CTE, but the overhead is probably bigger.
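For comparison, here is one way the recursive CTE variant might look. This is only a sketch: the CTE name probe and the literal start value are illustrative assumptions. The recursion keeps stepping forward by one microsecond while the current candidate is taken, so the last (greatest) value produced is the first free one.

```sql
WITH RECURSIVE probe AS (
   -- start with the input timestamp (literal chosen for illustration)
   SELECT timestamptz '2022-05-17 03:44:22.771741+02' AS ts
   UNION ALL
   -- step forward only while the current candidate already exists
   SELECT p.ts + interval '1 us'
   FROM   probe p
   WHERE  EXISTS (SELECT FROM event e WHERE e.unique_time = p.ts)
)
SELECT max(ts) AS next_free   -- last value generated = first free timestamp
FROM   probe;
```

Like the function, this only reads the table, so it is subject to the same concurrency caveat.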

Concurrency!

"Is there a concise way to compute this as part of an INSERT or UPDATE to the event table?"

The above is obviously subject to a race condition. Any number of concurrent transactions might find the same free spot. Postgres cannot lock rows that are not there yet.

Since you want to INSERT (similar for UPDATE), I suggest running INSERT .. ON CONFLICT DO NOTHING directly in a loop instead. Again, we need a UNIQUE or PRIMARY KEY constraint on unique_time:

CREATE OR REPLACE FUNCTION f_next_free(INOUT _input_time timestamptz, _payload text)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      INSERT INTO event (unique_time, payload)
      VALUES (_input_time, _payload)
      -- no conflict target given: any unique violation skips the insert
      ON CONFLICT DO NOTHING;
      
      -- FOUND is true only if the row was actually inserted
      EXIT WHEN FOUND;
      _input_time := _input_time + interval '1 us';
   END LOOP;
END
$func$;

Adapt your "payload" accordingly.
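A call might look like the following sketch; 'some payload' is a placeholder value chosen for illustration. Because _input_time is an INOUT parameter, the function returns the timestamp that was actually inserted, which may be later than the one passed in.

```sql
-- 'some payload' is a placeholder, not a value from the question.
-- Returns the timestamp under which the row was stored.
SELECT f_next_free('2022-05-17 03:44:22.771741+02', 'some payload');
```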

A successful INSERT locks the row. Even if concurrent transactions cannot see the inserted row yet, a UNIQUE index is absolute.
(You could make it work with advisory locks ...)
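One possible shape of the advisory-lock variant, as a rough sketch rather than a tested recipe. It assumes the default READ COMMITTED isolation level, that every writer takes the same lock before inserting, and it uses the first, read-only f_next_free(timestamptz). The lock key 4711 and the payload value are arbitrary placeholders.

```sql
BEGIN;

-- Assumption: all writers to event take this same (arbitrary) lock key first.
-- The lock is held until the transaction ends, so writers are serialized.
SELECT pg_advisory_xact_lock(4711);

-- Separate statement: under READ COMMITTED it gets a fresh snapshot taken
-- after the lock was acquired, so it sees rows committed by earlier holders.
INSERT INTO event (unique_time, payload)
VALUES (f_next_free('2022-05-17 03:44:22.771741+02'), 'some payload');

COMMIT;
```

The trade-off is that all inserts queue behind a single lock, whereas the ON CONFLICT loop only retries on an actual collision.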

CodePudding user response:

Ah, forget about the approaches from my comment that would try to generate an (infinite) sequence of all microsecond timestamps following the $input_time. There's a much simpler query that can generate exactly the timestamp you need:

INSERT INTO event(unique_time, others)
SELECT MIN(candidates.time), $other_values
FROM (
  SELECT $input_time AS "time"
UNION ALL
  SELECT unique_time + interval '1 microsecond' AS "time"
  FROM event
  WHERE unique_time >= $input_time
) AS candidates
WHERE NOT EXISTS (
  SELECT *
  FROM event coll
  WHERE coll.unique_time = candidates.time
);

However, I'm not sure how well Postgres can optimise this: the MIN aggregate might load all the timestamps from event that are greater than or equal to $input_time, which might be fine if you always append events at the end, but still. A probably better alternative would be:

INSERT INTO event(unique_time, others)
SELECT available.time, $other_values
FROM (
  SELECT *
  FROM (
    SELECT $input_time AS "time"
  UNION ALL
    SELECT unique_time + interval '1 microsecond' AS "time"
    FROM event
    WHERE unique_time >= $input_time
  ) AS candidates
  WHERE NOT EXISTS (
    SELECT *
    FROM event coll
    WHERE coll.unique_time = candidates.time
  )
  ORDER BY candidates.time ASC
) AS available
ORDER BY available.time ASC
LIMIT 1;

This might (I don't know) still have to evaluate the complex subquery every time you insert something though, which would be rather inefficient if most inputs don't cause a collision. Also, I have no idea how well this works under concurrent load (i.e. multiple transactions running the query at the same time) or whether it has possible race conditions.

Alternatively, just use a WHILE loop (in the client or in PL/pgSQL) that attempts to insert the value until it succeeds, incrementing the timestamp on every iteration. See @Erwin Brandstetter's answer for that.
