I am trying to learn some Postgres skills to help me build my Flask application. I care a lot about performance, but I've run into some issues.
I have the following table (`Sessions`) in Postgres:

SessionID | Other columns | Statics |
---|---|---|
BDzQDeGr | * | {"excludedOptions": {"questID": ["optID", "optID2"]}} |

Note: `Statics` is a `jsonb` column.
I wanted to execute a query that appends `optID` to the `"Statics" -> 'excludedOptions' -> 'questID'` jsonb array if the key `questID` already exists, and otherwise adds `questID` to `excludedOptions` with `["optID"]` as its value.
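In other words (with `newOptID` standing in for the option being added), these are the two cases I want a single query to handle:

```sql
-- Case 1: 'questID' already exists -> the new option is appended to its array
SELECT '{"excludedOptions": {"questID": ["optID", "optID2"]}}'::jsonb             AS before,
       '{"excludedOptions": {"questID": ["optID", "optID2", "newOptID"]}}'::jsonb AS after;

-- Case 2: 'questID' does not exist yet -> it is created with a one-element array
SELECT '{"excludedOptions": {}}'::jsonb                        AS before,
       '{"excludedOptions": {"questID": ["newOptID"]}}'::jsonb AS after;
```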
I spent the last 4 days trying to do that with one query, and I finally did it:
```sql
WITH ex AS (
    SELECT "Statics" -> 'excludedOptions'
    FROM "Sessions"
    WHERE "SessionID" = '{data["sessionId"]}'
)
UPDATE "Sessions"
SET "Statics" = jsonb_set(
    "Statics",
    ARRAY['excludedOptions', '{data["questionId"]}'],
    array_to_json(array_remove((
        SELECT ARRAY['{data["answerId"]}'] || coalesce((
            SELECT
                -- jsonb_array_to_text_array is a custom helper, not a Postgres built-in
                jsonb_array_to_text_array((
                    SELECT value
                    FROM jsonb_each((SELECT ex.* FROM ex))
                    WHERE key = '{data["questionId"]}'
                ))
        ), null)
    ), null))::jsonb,
    TRUE
)
WHERE "SessionID" = '{data["sessionId"]}'
```
NOTE: the SQL above is a Python f-string, so the `{data[...]}` placeholders are interpolated into the query text before it is sent to Postgres.
But when I measured the execution time, it was about 300-600 ms, which is a lot for such a simple operation.
I wanted it to be one query because, somehow, I believe that one query is better :).
Can you help me make this query faster?
Would creating a function make it faster, or simpler?
Thank you for spending a little of your time on this.
UPDATE
Table Schema:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description |
---|---|---|---|---|---|---|---|
ID | numeric(16,0) | | not null | | main | | |
SessionID | character varying(8) | | not null | | extended | | |
Title | text | | not null | | extended | | |
Type | text | | not null | | extended | | |
Source | text | | not null | | extended | | |
Questions | character(6)[] | | not null | | extended | | |
Skipped | character(6)[] | | | '{}'::bpchar[] | extended | | |
Answered | json | | | '{}'::json | extended | | |
UsedHints | character(6)[] | | | '{}'::bpchar[] | extended | | |
FinalResult | smallint | | | 0 | plain | | |
Statics | jsonb | | | '{"timer":{}, "excludedOptions":{}}'::json | extended | | |
Marked | character(6)[] | | | '{}'::bpchar[] | extended | | |
Time | integer | | not null | 0 | plain | | |
EXPLAIN ANALYZE:

```
Seq Scan on "Sessions"  (cost=0.00..1.00 rows=1 width=894) (actual time=0.016..0.017 rows=1 loops=1)
Planning Time: 0.037 ms
Execution Time: 0.030 ms
```
CodePudding user response:
Your table has no indexes, so PostgreSQL must scan the whole table to find anything. That's what a "Seq Scan" is. You need to add some indexes to avoid table scans and make queries efficient.
Since you're filtering on `WHERE "SessionID" = ...`, an index on `SessionID` would allow Postgres to find all matching rows without having to scan the whole table.
See Use The Index, Luke for more about SQL performance tuning and indexes.
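For example, a minimal sketch (the index name is my choice; if `SessionID` values are unique, a unique index or constraint would be even better):

```sql
CREATE INDEX sessions_session_id_idx ON "Sessions" ("SessionID");
```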
Other notes:
- Don't use `char`, there is no benefit and many drawbacks. Use `varchar` or `text`.
- While arrays and JSON seem like they save a lot of effort, they're very, very easy to abuse. There's no short-cut around good schema design. Arrays should be limited to very simple lists, and JSON is for when you need to store truly arbitrary data. All of your arrays and JSON should probably be replaced with traditional columns and join tables (see the sketch after this list).
- Store time as a timestamp or other date/time type, not an integer.
- Don't use the `json` type. `jsonb` is more efficient. Especially don't mix `json` and `jsonb`, it just invites confusion.
- `numeric(16,0)` is an odd choice for a primary key. A simple `bigint` will use less space, be more efficient, and store a larger number. PostgreSQL even provides the convenient `bigserial` type for auto-incrementing primary keys.
- Don't quote table and column names (identifiers) unless necessary to resolve ambiguities. SQL identifiers are normally case-insensitive, but quoting makes them case-sensitive, which will lead to problems.
- Don't use SQL keywords for column names. For example, `time`. Use the "at" and "on" conventions for timestamps and dates. For example, `created_at` or `created_on`.
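To make the last few points concrete, here is a rough sketch of what a more conventional design could look like. All names below are invented for illustration and are not taken from your schema:

```sql
-- Plain columns, a bigserial key, real timestamps, and a join table
-- in place of the jsonb blob of excluded options.
CREATE TABLE sessions (
    id          bigserial   PRIMARY KEY,
    session_key varchar(8)  NOT NULL UNIQUE,
    title       text        NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE excluded_options (
    session_id  bigint NOT NULL REFERENCES sessions (id),
    question_id text   NOT NULL,
    option_id   text   NOT NULL,
    PRIMARY KEY (session_id, question_id, option_id)
);

-- "Append optID unless it's already there" then becomes a plain idempotent insert:
INSERT INTO sessions (session_key, title) VALUES ('BDzQDeGr', 'demo');
INSERT INTO excluded_options (session_id, question_id, option_id)
SELECT id, 'questID', 'optID'
FROM sessions
WHERE session_key = 'BDzQDeGr'
ON CONFLICT DO NOTHING;
```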
CodePudding user response:
The performance hints given by Schwern are absolutely valid. However, your UPDATE statement can be simplified drastically, at least if I understood your question correctly.
```sql
-- Names here follow the unquoted-identifier advice above; against the
-- current schema they would be "Sessions", "Statics" and "SessionID".
update sessions
   set statistics =
       case
           when statistics -> 'excludedOptions' ? 'questID'
               then jsonb_set(statistics, '{excludedOptions,questID}',
                              (statistics #> '{excludedOptions,questID}') || '["newOptId"]')
           -- create the missing key; unlike replacing the whole excludedOptions
           -- object, this keeps any sibling keys that are already there
           else jsonb_set(statistics, '{excludedOptions,questID}', '["newOptId"]')
       end
 where session_id = ....;
```
If there is a key `questID` within `excludedOptions`, then this will append the string value `"newOptId"` to that array. If there is no key `questID`, it will be created with a single-element array as its value.
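You can check both branches against literal values (results shown as comments):

```sql
-- Key already present: the new option is appended
SELECT jsonb_set(j, '{excludedOptions,questID}',
                 (j #> '{excludedOptions,questID}') || '["newOptId"]')
FROM (VALUES ('{"excludedOptions": {"questID": ["optID"]}}'::jsonb)) AS t(j);
-- => {"excludedOptions": {"questID": ["optID", "newOptId"]}}

-- Key missing: it is created with a one-element array
SELECT jsonb_set(j, '{excludedOptions,questID}', '["newOptId"]')
FROM (VALUES ('{"excludedOptions": {}}'::jsonb)) AS t(j);
-- => {"excludedOptions": {"questID": ["newOptId"]}}
```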
There is no need for a CTE, or for unnesting and aggregating the array for `questID`.

`sessions.session_id` should have an index to make this fast.