We are using the Postgres jsonb type in one of our database tables. The table structure is shown below:
CREATE TABLE IF NOT EXISTS public.draft_document (
    id bigserial NOT NULL PRIMARY KEY,
    ...
    document jsonb NOT NULL,
    ein_search character varying(11) NOT NULL
);
CREATE INDEX IF NOT EXISTS count_draft_document_idx ON public.draft_document USING btree (ein_search);
CREATE INDEX IF NOT EXISTS read_draft_document_idx ON public.draft_document USING btree (id, ein_search);
The JSON structure of the document column may vary. Below is one example of a possible schema for document:
"withholdingCredit": {
"type": "array",
"items": {
"$ref": "#/definitions/withholding"
}
}
where the withholding structure (the array elements) follows:
"withholding": {
"properties": {
...
"proportionalityIndicator": {
"type": "boolean"
},
"tribute": {
"$ref": "#/definitions/tribute"
},
"payingSourceEin": {
"type": "string"
},
"value": {
"type": "number"
}
...
}
...
},
"tribute": {
"type": "object",
"properties": {
"code": {
"type": "number"
},
"additionalCode": {
"type": "number"
}
...
}
}
Here is an example of the JSON stored in the document jsonb column:
{
  "withholdingCredit": [
    {
      "value": 15000,
      "tribute": {
        "code": 1216,
        "additionalCode": 2
      },
      "payingSourceEin": "03985506123132",
      "proportionalityIndicator": false
    },
    ...
    {
      "value": 98150,
      "tribute": {
        "code": 3155,
        "additionalCode": 1
      },
      "payingSourceEin": "04185506123163",
      "proportionalityIndicator": false
    }
  ]
}
The number of elements in the array can vary, up to a business-imposed limit of 100,000 (one hundred thousand) elements.
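As a side note, a quick way to see how close existing rows come to that limit, using only the id and document columns shown above, is something like the query below (n_withholdings is just an output alias):

-- Sketch: list the documents with the largest withholdingCredit arrays.
SELECT id,
       jsonb_array_length(document->'withholdingCredit') AS n_withholdings
FROM public.draft_document
WHERE jsonb_typeof(document->'withholdingCredit') = 'array'
ORDER BY n_withholdings DESC
LIMIT 10;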
We need a paged select query that returns the withholding array disaggregated (one element per row), where each row also brings the sum of the withholding elements' value and the array length.
The query also needs to return the withholdings ordered by proportionalityIndicator, tribute->code, tribute->additionalCode and payingSourceEin. Something like:
id    | sum       | jsonb_array_length | jsonb_array_elements                             |
------|-----------|--------------------|--------------------------------------------------|
30900 | 1,800,027 | 2300               | {"value":15000,"tribute":{"code":1216,...}, ...} |
...   | ...       | ...                | { ... }                                          |
30900 | 1,800,027 | 2300               | {"value":98150,"tribute":{"code":3155,...}, ...} |
We have defined the following query:
SELECT dft.id,
       SUM((elem->>'value')::NUMERIC),
       jsonb_array_length(dft.document->'withholdingCredit'),
       jsonb_array_elements(jsonb_agg(elem
           ORDER BY
               elem->>'proportionalityIndicator',
               (elem->'tribute'->>'code')::NUMERIC,
               (elem->'tribute'->>'additionalCode')::NUMERIC,
               elem->>'payingSourceEin'))
FROM draft_document dft
     CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
  AND dft.id = :id
  AND dft.ein_search = :ein_search
GROUP BY dft.id
LIMIT :limit OFFSET :offset;
This query works, but performance degrades badly when there is a large number of elements in the jsonb array. Any suggestion on how to improve it is welcome.
BTW, we are using Postgres 9.6.
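To show where the time goes, the plan for the slow case can be captured along these lines (just a sketch; the id and ein_search values below are made-up stand-ins for the bound parameters):

-- Sketch: inspect the plan of the query above with placeholder values substituted.
EXPLAIN (ANALYZE, BUFFERS)
SELECT dft.id,
       SUM((elem->>'value')::NUMERIC),
       jsonb_array_length(dft.document->'withholdingCredit'),
       jsonb_array_elements(jsonb_agg(elem
           ORDER BY
               elem->>'proportionalityIndicator',
               (elem->'tribute'->>'code')::NUMERIC,
               (elem->'tribute'->>'additionalCode')::NUMERIC,
               elem->>'payingSourceEin'))
FROM draft_document dft
     CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
  AND dft.id = 30900                 -- made-up id
  AND dft.ein_search = '03985506123' -- made-up ein_search
GROUP BY dft.id
LIMIT 10 OFFSET 0;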
CodePudding user response:
Your weird query, which breaks it apart, aggregates it, and breaks it apart again, does seem to trigger some pathological memory-management issue in PostgreSQL (tested on 15dev). Maybe you should file a bug report on that.
But you can avoid the problem by just breaking it apart one time. Then you need window functions to get the tabulations you want, so that they include all rows, even those removed by the OFFSET and LIMIT.
SELECT dft.id,
       SUM((elem->>'value')::NUMERIC) over (),
       count(*) over (),
       elem
FROM draft_document dft
     CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
  AND dft.id = 4
  AND dft.ein_search = '4'
ORDER BY
    elem->>'proportionalityIndicator',
    (elem->'tribute'->>'code')::NUMERIC,
    (elem->'tribute'->>'additionalCode')::NUMERIC,
    elem->>'payingSourceEin'
LIMIT 4 OFFSET 500;
In my hands this gives the same answer as your query, but takes 370 ms rather than 13,789 ms.
At higher offsets than that, my query still works, while yours leads to a total lock-up requiring a hard reset.
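For completeness, written back against the named placeholders from the question, the same approach would look something like this (a sketch with the same logic as the query above):

-- Same window-function approach, using the question's :id / :ein_search / :limit / :offset placeholders.
SELECT dft.id,
       SUM((elem->>'value')::NUMERIC) OVER () AS sum,
       COUNT(*) OVER () AS jsonb_array_length,
       elem AS jsonb_array_elements
FROM draft_document dft
     CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
  AND dft.id = :id
  AND dft.ein_search = :ein_search
ORDER BY
    elem->>'proportionalityIndicator',
    (elem->'tribute'->>'code')::NUMERIC,
    (elem->'tribute'->>'additionalCode')::NUMERIC,
    elem->>'payingSourceEin'
LIMIT :limit OFFSET :offset;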
If anyone wants to reproduce the poor behavior, I generated the data by:
insert into draft_document
select 4,
       jsonb_build_object('withholdingCredit',
           jsonb_agg(jsonb_build_object(
               'value', floor(random()*99999)::int,
               'tribute', '{"code": 1216, "additionalCode": 2}'::jsonb,
               'payingSourceEin', floor(random()*99999999)::int,
               'proportionalityIndicator', false))),
       '4'
from generate_series(1,100000)
group by 1,3;