can you make postgres sql more efficient


I am trying to learn some Postgres skills to help me build my Flask application. I really care about performance, but I have run into some issues.

I have the following table (Sessions) in postgres:

SessionID | Other columns | Statics
----------+---------------+-------------------------------------------------------
BDzQDeGr  | *             | {"excludedOptions": {"questID": ["optID", "optID2"]}}


Note: Statics is a jsonb column.

I want to execute a query that appends optID to the "Statics" -> 'excludedOptions' -> 'questID' jsonb array if the key questID already exists; otherwise it should add questID to excludedOptions with ["optID"] as its value.

I spent the last 4 days trying to do that with one query, and finally I did it ...

WITH ex AS 
(
    SELECT "Statics" -> 'excludedOptions' 
    FROM "Sessions" 
    WHERE "SessionID" = '{data["sessionId"]}'
)
UPDATE "Sessions" 
SET "Statics" = jsonb_set(
    "Statics", 
    ARRAY['excludedOptions', '{data["questionId"]}'], 
    array_to_json(array_remove((
        SELECT
            ARRAY['{data["answerId"]}'] || coalesce((
            SELECT
                jsonb_array_to_text_array((
                    SELECT value 
                    FROM jsonb_each(( SELECT ex.* FROM ex)) 
                    WHERE key = '{data["questionId"]}'
                ))
            ), null)
    ), null))::jsonb, 
    TRUE
) WHERE "SessionID" = '{data["sessionId"]}'

NOTE: The SQL above is a Python f-string.

But when I measured the execution time it was about 300-600 ms, which is a lot for such a simple operation.
I wanted it to be one query because somehow I believe that one query is better :).

Can you help me make this query faster?

Would creating a function make it faster? Or simpler?

Thank you for spending a little of your time on this.

UPDATE

Table Schema:

Column      | Type                 | Collation | Nullable | Default                                    | Storage  | Stats target | Description
------------+----------------------+-----------+----------+--------------------------------------------+----------+--------------+------------
ID          | numeric(16,0)        |           | not null |                                            | main     |              |
SessionID   | character varying(8) |           | not null |                                            | extended |              |
Title       | text                 |           | not null |                                            | extended |              |
Type        | text                 |           | not null |                                            | extended |              |
Source      | text                 |           | not null |                                            | extended |              |
Questions   | character(6)[]       |           | not null |                                            | extended |              |
Skipped     | character(6)[]       |           |          | '{}'::bpchar[]                             | extended |              |
Answered    | json                 |           |          | '{}'::json                                 | extended |              |
UsedHints   | character(6)[]       |           |          | '{}'::bpchar[]                             | extended |              |
FinalResult | smallint             |           |          | 0                                          | plain    |              |
Statics     | jsonb                |           |          | '{"timer":{}, "excludedOptions":{}}'::json | extended |              |
Marked      | character(6)[]       |           |          | '{}'::bpchar[]                             | extended |              |
Time        | integer              |           | not null | 0                                          | plain    |              |

EXPLAIN ANALYZE:

Seq Scan on "Sessions"  (cost=0.00..1.00 rows=1 width=894) (actual time=0.016..0.017 rows=1 loops=1)
Planning Time: 0.037 ms
Execution Time: 0.030 ms

CodePudding user response:

Your table has no indexes, so PostgreSQL must scan the whole table to find anything; that's what a "Seq Scan" is. You need to add indexes to avoid table scans and make your queries efficient.

Since you're filtering on WHERE "SessionID" = ..., an index on SessionID would allow Postgres to find all matching rows without scanning the whole table.
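
For example (a minimal sketch; the index name is illustrative):

CREATE INDEX sessions_session_id_idx ON "Sessions" ("SessionID");

-- If each SessionID identifies exactly one row, a unique index
-- additionally enforces that:
-- CREATE UNIQUE INDEX sessions_session_id_key ON "Sessions" ("SessionID");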

See Use The Index, Luke for more about SQL performance tuning and indexes.

Other notes:

  1. Don't use char; it has no benefits and many drawbacks. Use varchar or text.
  2. While arrays and JSON seem like they save a lot of effort, they're very, very easy to abuse. There's no shortcut around good schema design. Arrays should be limited to very simple lists, and JSON reserved for truly arbitrary data. All of your arrays and JSON should probably be replaced with traditional columns and join tables (see the sketch after this list).
  3. Store time as a timestamp or other date/time type, not an integer.
  4. Don't use the json type; jsonb is more efficient. Especially don't mix json and jsonb, as that just invites confusion.
  5. numeric(16,0) is an odd choice for a primary key. A simple bigint will use less space, be more efficient, and store a larger number. PostgreSQL even provides the convenient bigserial type for auto-incrementing primary keys.
  6. Don't quote table and column names (identifiers) unless necessary to resolve ambiguities. SQL identifiers are normally case-insensitive, but quoting makes them case-sensitive, which will lead to problems.
  7. Don't use SQL keywords for column names. For example, time. Use the "at" and "on" conventions for timestamps and dates. For example, created_at or created_on.
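
Putting those notes together, a redesigned schema might look something like this (a hypothetical sketch only; the names and the join-table layout are illustrative, not taken from the original application):

-- Note 5: bigserial primary key; notes 1, 3, 6, 7: unquoted snake_case
-- names, varchar/text instead of char, and a timestamp instead of an
-- integer "Time" column.
CREATE TABLE sessions (
    id           bigserial PRIMARY KEY,
    session_id   varchar(8) NOT NULL UNIQUE,
    title        text NOT NULL,
    type         text NOT NULL,
    source       text NOT NULL,
    final_result smallint NOT NULL DEFAULT 0,
    created_at   timestamptz NOT NULL DEFAULT now()
);

-- Note 2: a join table instead of the excludedOptions jsonb blob,
-- one row per (session, question, excluded option).
CREATE TABLE excluded_options (
    session_id  bigint NOT NULL REFERENCES sessions (id),
    question_id text NOT NULL,
    option_id   text NOT NULL,
    PRIMARY KEY (session_id, question_id, option_id)
);

-- The original UPDATE then becomes a plain, index-friendly insert:
-- INSERT INTO excluded_options (session_id, question_id, option_id)
-- VALUES (..., ..., ...)
-- ON CONFLICT DO NOTHING;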

CodePudding user response:

The performance hints given by Schwern are absolutely valid. However, your UPDATE statement can be simplified drastically - at least if I understood your question correctly.

update sessions
   set statistics =
         case
           -- does excludedOptions already contain the key questID?
           when statistics -> 'excludedOptions' ? 'questID'
             -- yes: append the new option to the existing array
             then jsonb_set(statistics, '{excludedOptions,questID}', (statistics #> '{excludedOptions,questID}') || '["newOptId"]')
           else
             -- no: create questID with a single-element array
             -- (jsonb_set creates the missing key by default, leaving
             --  other keys in excludedOptions untouched)
             jsonb_set(statistics, '{excludedOptions,questID}', '["newOptId"]')
          end
where session_id = ....;

If there is a key questID within excludedOptions, this will append the string value "newOptId" to that array. If there is no key questID, it will be created with a single-element array as its value, leaving any other keys in excludedOptions untouched.
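
You can see both branches in isolation with standalone expressions (the jsonb literals here are just illustrative):

-- append to an existing array:
select jsonb_set('{"excludedOptions": {"questID": ["optID"]}}'::jsonb,
                 '{excludedOptions,questID}',
                 ('{"excludedOptions": {"questID": ["optID"]}}'::jsonb #> '{excludedOptions,questID}') || '["newOptId"]');
-- result: {"excludedOptions": {"questID": ["optID", "newOptId"]}}

-- create the key when it is missing:
select jsonb_set('{"excludedOptions": {}}'::jsonb, '{excludedOptions,questID}', '["newOptId"]');
-- result: {"excludedOptions": {"questID": ["newOptId"]}}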

There is no need for a CTE, or for unnesting and aggregating the array for questID.

sessions.session_id should have an index to make this fast.

Online example
