I'm trying to update rows in a single table by splitting them into two "sets" of rows.
The top part of the set should have a status set to X
and the bottom one should have a status set to status Y
.
I've tried putting together a query that looks like this
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
((SELECT id from x_status), 'X'),
((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);
When I run this query I get the following error:
pq: more than one row returned by a subquery used as an expression
This makes sense, folks.ids
is expected to return a list of ID
s, hence the IN
clause in the UPDATE
statement, but I suspect the problem is I can not return the list in the values
statement in the FROM
clause as it turns into something like this:
(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)
Is there a way how this UPDATE
can be done using a CTE query at all? I could split this into two separate UPDATE
queries, but CTE query would be better and in theory faster.
CodePudding user response:
So after more tinkering, I've come up with a solution.
The problem with why the previous query fails is we are not grouping the ID
s in the subqueries into arrays so the result expands into a huge list as I suspected.
The solution is grouping the ID
s in the subqueries into ARRAY
-- that way they get returned as a single result (tuple) in ids
value.
This is the query that does the job. Note that we must unnest
the ID
s in the WHERE
clause:
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
(ARRAY(SELECT id from x_status), 'X'),
(ARRAY(SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (SELECT * from unnest(folks.ids));
CodePudding user response:
I think I understand now... if I get your problem, you want to set the status to 'X' for the oldest five records and 'Y' for everything else?
In that case I think the row_number()
analytic would work -- and it should do it in a single pass, two scans, and eliminating one order by
. Let me know if something like this does what you seek.
with ranked as (
select
id, row_number() over (order by date_registered desc) as rn
from people
)
update people p
set
status = case when r.rn <= 5 then 'X' else 'Y' end
from ranked r
where
p.id = r.id
Any time you do an update from another data set, it's helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.
Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.
Let me know if I missed the boat.