Let's assume I have a customer
table that has a customer_id
column and a has_orders
column among others. Let's also assume I have an order
table that has a customer_id
column referencing customers in the customer
table.
If I want to update the has_orders
column, I can run the following statement:
update customer set has_orders = customer_id in (select customer_id from order)
This works. However, it's slow, even though I have indexes on customer_id
in both tables. Is there another update statement (e.g. using joins) that can be faster?
CodePudding user response:
An index on customer_id
in the order
table should already be pretty optimal. You could write your update using exists instead:
UPDATE customer c
SET has_orders = EXISTS (SELECT 1 FROM order o WHERE o.customer_id = c.customer_id);
The possible advantage of the above versus using IN
is that EXISTS
will stop as soon as it finds a match.
CodePudding user response:
Are you really sure you need to update the customer table asynchronously? I guess it might be a poor practice as the integrity of the data can be lost. Moreover this approach breakes normal form idea as the data is duplicated. It seems that a much better way is to retreive the value of has_orders
selecting the flag (using exists
statement proposed by @Tim-Biegeleisen or any other) from order
table on demand and joining it to the dataset from customers
table.
If the excess column is inevitable i would update it any time upon inserting/updating/deleting the order table and using affected customer id
only. This can be done in a trigger. It will be rather fast.