I have a SQLite table containing messages. The usernames of users sending messages are unique, but users are also identified by their ID. I know the ID for all users who have sent messages, but some messages are only associated with a user by their username (the user_id column is NULL) and I want to fill these gaps by finding a message associated with the same username along with a non-null user-id. I tried doing this with the following SQL query
UPDATE messages
SET user_id = (
SELECT ref.user_id
FROM messages AS ref
WHERE ref.user_id IS NOT NULL
AND ref.username = messages.username
LIMIT 1
) WHERE user_id IS NULL
but the query never completes (it's stuck on 'Executing query'). This doesn't seem to be implementation-specific, because this happens in Python, DB Browser and DBeaver (the latter two both written in Java). What might be causing this?
CodePudding user response:
Your query is written to search for the user id again, for every individual message with a null user id. You could speed this up with an index tailored towards this subquery:
CREATE INDEX messages_name_id_idx
ON messages (username, user_id)
WHERE user_id IS NOT NULL
By putting username
first, this index is optimized towards looking up messages by their username. The WHERE
clause skips, in advance, the rows not used in your subquery.
You could also reduce the worst-case time complexity by instead first generating a list of username/id correspondences, then updating the messages accordingly:
WITH ids AS (
SELECT user_id, username
FROM messages
WHERE user_id IS NOT NULL
GROUP BY 1,2)
UPDATE messages
SET user_id = (
SELECT user_id
FROM ids
WHERE ids.username = messages.username
LIMIT 1)
WHERE user_id IS NULL
Now instead of searching your full messages
table anew for every row that needs fixing, you search it only two times.
These two ideas do similar things, so you might not get much more benefit from doing both than you would from doing one of them.
If there's a truly enormous number of usernames in your database, but a relatively small number of these that you'll need to "fix" messages for, you could try a multi-step process with temporary tables to limit what you look at to only the relevant usernames.
Note: I've assumed in all my examples that username
is never NULL
. If it sometimes is null, you should adjust to skip such rows.