Is it good idea to reuse auto-generated ids? [PostgreSQL]-CodePudding

I am evaluating the possibility of reusing ids that have been removed from a column with auto-generated ids. Many records will be deleted per day and we do not want the identifier to exceed 9999.

Would I have concurrency problems when inserting into the table? (around 100 records would be inserted per day)
Is there a correct way to reuse ids?
Should I use another approach to assign this case of ids?
Should I leave development forever?

I'm just evaluating this possibility. Thank you for your time and experience.

create a table

CREATE TABLE IF NOT EXISTS test(
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data TEXT);

create function

CREATE OR REPLACE function get_id_test()
returns bigint
language plpgsql
as
$$
DECLARE
min_id bigint;
previous bigint;
next_id bigint;
f record;
BEGIN
   SELECT MIN(id) into min_id FROM test;
   IF  min_id > 1 THEN
       RETURN 1;
   END IF;

   FOR f
   IN select t.id from test t order by t.id
   LOOP
       IF  f.id = min_id THEN
           previous = f.id;
           CONTINUE;
       END IF;
       IF  f.id > (previous   1) THEN
           RETURN previous   1;
       ELSE 
           previous = f.id;
       END IF;
   END LOOP;
   next_id = previous   1;
   RETURN nextval(pg_get_serial_sequence('test', 'id'));
end;
$$;

insert 5 rows

INSERT INTO test (id,data) VALUES (get_id_test(),'test'),
                                  (get_id_test(),'test'),
                                  (get_id_test(),'test'),
                                  (get_id_test(),'test'),
                                  (get_id_test(),'test');

delete 2 rows (2 and 4)

DELETE FROM test WHERE id=2 OR id=4;

insert row (expecting id=2)

INSERT INTO test (id,data) VALUES (get_id_test(),'test expecting id 2');

insert row (expecting id=4)

INSERT INTO test (id,data) VALUES (get_id_test(),'test expecting id 4');

insert row (expecting normal next id from nextval())

INSERT INTO test (id,data) VALUES (get_id_test(),'test expecting id 6');

CodePudding user response：

tl;dr: Don't do it.

You certainly can do it, and transactions will save you. But you're violating assumptions that humans make, and increasing your costs of maintenance and of onboarding new staff to the project.

When I see an autoinc PK, in postgres and other databases, I assume

monotonic increasing
therefore "recent" rows have "large" IDs, which makes a difference for physical layout on-disk and for query optimization
IDs may have been sent out into the world, e.g. appearing in syslog messages, and they mean something

Your hole-filling approach is very very slow. At least use ORDER BY ID DESC, since presumably there's few holes at the bottom and most opportunities are from recent deletions near the top. Consider comparing RANK() to ID so the backend finds a hole without you having to loop looking for it. Most allocators will amortize the cost of a scan by exploiting a cache of scan results.

There is a rich literature on this topic -- start with malloc slab allocators. File systems need to fill holes, also, which Seltzer's "A Comparison of FFS Disk Allocation Policies" briefly touches on.

Consider maintaining a table of at most ten thousand rows which maps integer to either NULL or timestamp last released. A timestamp index would help you quickly find the next unallocated ID. Use old ones or recent ones first, whichever policy you prefer.

You're deliberately limiting the magnitude of the bigint ID, so int would probably suffice, saving a little storage.

Consider leaving the PK alone, and define a new integer attribute column where you perform hole-filling to your heart's content.