Home > Mobile >  SQLite query containing a 'SET' to a subquery never completing
SQLite query containing a 'SET' to a subquery never completing

Time:01-02

I have a SQLite table containing messages. The usernames of users sending messages are unique, but users are also identified by their ID. I know the ID for all users who have sent messages, but some messages are only associated with a user by their username (the user_id column is NULL) and I want to fill these gaps by finding a message associated with the same username along with a non-null user-id. I tried doing this with the following SQL query

UPDATE messages
    SET user_id = (
        SELECT ref.user_id
        FROM messages AS ref
        WHERE ref.user_id IS NOT NULL
            AND ref.username = messages.username
        LIMIT 1
    ) WHERE user_id IS NULL

but the query never completes (it's stuck on 'Executing query'). This doesn't seem to be implementation-specific, because this happens in Python, DB Browser and DBeaver (the latter two both written in Java). What might be causing this?

CodePudding user response:

Your query is written to search for the user id again, for every individual message with a null user id. You could speed this up with an index tailored towards this subquery:

CREATE INDEX messages_name_id_idx
    ON messages (username, user_id)
    WHERE user_id IS NOT NULL

By putting username first, this index is optimized towards looking up messages by their username. The WHERE clause skips, in advance, the rows not used in your subquery.

You could also reduce the worst-case time complexity by instead first generating a list of username/id correspondences, then updating the messages accordingly:

WITH ids AS (
    SELECT user_id, username
    FROM messages
    WHERE user_id IS NOT NULL
    GROUP BY 1,2)
UPDATE messages
    SET user_id = (
        SELECT user_id
        FROM ids
        WHERE ids.username = messages.username
        LIMIT 1)
    WHERE user_id IS NULL

Now instead of searching your full messages table anew for every row that needs fixing, you search it only two times.

These two ideas do similar things, so you might not get much more benefit from doing both than you would from doing one of them.

If there's a truly enormous number of usernames in your database, but a relatively small number of these that you'll need to "fix" messages for, you could try a multi-step process with temporary tables to limit what you look at to only the relevant usernames.

Note: I've assumed in all my examples that username is never NULL. If it sometimes is null, you should adjust to skip such rows.

  • Related