Create many partitions as a result of a select statement in Postgres


Here is a table of phone numbers named phone_number:

phone_number     country_code  owner
07911 123456     44            Ada
08912 654321     44            Thomas
06 12 34 56 78   33            Jonathan
06 87 65 43 21   33            Arthur

Let's say we want to partition this table by country code, creating the table phone_number_bis:

CREATE TABLE phone_number_bis (
                phone_number VARCHAR,
                country_code INTEGER,
                owner VARCHAR NOT NULL,
                PRIMARY KEY (phone_number, country_code)
) PARTITION BY LIST (country_code);

Loading the content of phone_number into phone_number_bis will produce the following error:

INSERT INTO phone_number_bis( phone_number, country_code, owner)
SELECT  phone_number, country_code, owner
FROM phone_number;

ERROR:  no partition of relation "phone_number_bis" found for row
DETAIL:  Partition key of the failing row contains (country_code) = (44).

Is there a SQL command that could create all the necessary partitions before loading the data into phone_number_bis, without knowing the contents of the country_code column in advance?

NB: as Franck Heikens pointed out, partitioning may not be a relevant choice for storing phone numbers. This is a simplified example meant to make a more complex problem understandable.

CodePudding user response:

If your client is psql, you can use \gexec to run a query and then execute each row of its result as a new command. You would write one query that outputs a text string containing a suitable CREATE TABLE statement for each distinct country_code. To do it entirely on the server side, you can use pl/pgsql to do much the same thing: construct the string, then run it with dynamic SQL via EXECUTE.
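A minimal sketch of both variants, assuming the phone_number and phone_number_bis tables from the question (the phone_number_bis_<code> partition names are an arbitrary choice). The psql version:

SELECT format(
    'CREATE TABLE IF NOT EXISTS phone_number_bis_%s PARTITION OF phone_number_bis FOR VALUES IN (%s)',
    country_code, country_code)
FROM phone_number
GROUP BY country_code
\gexec

The server-side equivalent, using pl/pgsql with dynamic SQL:

DO $$
DECLARE
    cc integer;
BEGIN
    FOR cc IN SELECT DISTINCT country_code FROM phone_number LOOP
        -- build and run one CREATE TABLE ... PARTITION OF per distinct value
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS phone_number_bis_%s PARTITION OF phone_number_bis FOR VALUES IN (%s)',
            cc, cc);
    END LOOP;
END $$;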

CodePudding user response:

Quote:

Is there a SQL command that could create all the necessary partitions before loading the data into phone_number_bis, without knowing the contents of the country_code column in advance?

You can use a DEFAULT partition, then split new partitions out of the DEFAULT partition afterwards.

begin;
create table phone_number(phone_number text, country_code integer, owner text);
-- five rows with country codes 121..125
insert into phone_number select 'dummy_' || (random()::numeric(10,4)),
    120 + i, 'owner' || i from generate_series(1, 5) g(i);
-- five rows with country codes 221..225
insert into phone_number select 'dummy_' || (random()::numeric(10,4)),
    220 + i, 'owner' || i from generate_series(1, 5) g(i);
-- twenty rows with country code 1
insert into phone_number select 'dummy_' || (random()::numeric(10,4)), 1, 'owner' || i from generate_series(1, 20) g(i);
-- list the distinct country codes in the 100-200 range,
-- ready to paste into a FOR VALUES IN (...) clause
select string_agg(distinct (country_code::text), ', ' order by (country_code::text))
from phone_number
where country_code > 99 and country_code < 201;
commit;

BEGIN;
CREATE TABLE phone_number_bis (
    phone_number text,
    country_code integer,
    owner text,
    PRIMARY KEY (phone_number, country_code)
)
PARTITION BY LIST (country_code);

-- one partition per known value or group of values
CREATE TABLE phone_number_bis_01 PARTITION OF phone_number_bis
FOR VALUES IN (1);

-- the country codes returned by the string_agg query above
CREATE TABLE phone_number_bis_100_200 PARTITION OF phone_number_bis
FOR VALUES IN (121, 122, 123, 124, 125);

-- everything else lands in the default partition
CREATE TABLE phone_number_bis_default PARTITION OF phone_number_bis DEFAULT;

INSERT INTO phone_number_bis (phone_number, country_code, owner)
SELECT phone_number, country_code, owner FROM phone_number;
COMMIT;
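The tableoid system column shows which partition each row was routed to. With the sample data above, the twenty code-1 rows should be in phone_number_bis_01, the 121-125 rows in phone_number_bis_100_200, and the 221-225 rows in the default partition:

select tableoid::regclass as partition, count(*)
from phone_number_bis
group by 1
order by 1;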

Now split the rows out of the default partition: create a new partition for the values in the 200-300 range and move the matching rows into it.

BEGIN;
-- take the current default partition out of the partitioned table
ALTER TABLE phone_number_bis DETACH PARTITION phone_number_bis_default;
ALTER TABLE phone_number_bis_default RENAME TO phone_number_bis_default_old;
-- create the new partition plus a fresh default partition
CREATE TABLE phone_number_bis_200_300 PARTITION OF phone_number_bis
FOR VALUES IN (221, 222, 223, 224, 225);
CREATE TABLE phone_number_bis_default PARTITION OF phone_number_bis DEFAULT;
-- re-insert the detached rows; they are now routed to the new partition
INSERT INTO phone_number_bis (phone_number, country_code, owner)
SELECT phone_number, country_code, owner FROM phone_number_bis_default_old;
COMMIT;
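Re-running the tableoid check from above should now show the five 221-225 rows in phone_number_bis_200_300, with the new default partition empty; the detached phone_number_bis_default_old table can then be dropped.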

Quote from the PostgreSQL documentation on table partitioning (https://www.postgresql.org/docs/current/ddl-partitioning.html):

Choosing the target number of partitions that the table should be divided into is also a critical decision to make. Not having enough partitions may mean that indexes remain too large and that data locality remains poor which could result in low cache hit ratios. However, dividing the table into too many partitions can also cause issues. Too many partitions can mean longer query planning times and higher memory consumption during both query planning and execution, as further described below.

Quote:

Another reason to be concerned about having a large number of partitions is that the server's memory consumption may grow significantly over time, especially if many sessions touch large numbers of partitions. That's because each partition requires its metadata to be loaded into the local memory of each session that touches it.

More partitions consume more memory; there is a case discussed here: https://www.postgresql.org/message-id/flat/PH0PR11MB5191F459DCB44A91682FE8C8D6409@PH0PR11MB5191.namprd11.prod.outlook.com#86aaad1ddd6350efc062c2dd79a31821
