Suppose I have a table with a large volume. Which of the following would be more optimized?
SELECT DISTINCT shipping_address_id AS address_id FROM table_A
UNION
SELECT DISTINCT billing_address_id AS address_id FROM table_A
vs.
SELECT DISTINCT address_id FROM
(
SELECT shipping_address_id AS address_id FROM table_A
UNION ALL
SELECT billing_address_id AS address_id FROM table_A
)
CodePudding user response:
I believe you should use:
SELECT DISTINCT shipping_address_id AS address_id FROM table_A
UNION
SELECT DISTINCT billing_address_id AS address_id FROM table_A
This one is naturally easier because it contains less characters (more useful for files) and is more human readable (you want to know what it means).
CodePudding user response:
Depending on what percentage is the unique addresses to total addresses you can find that GROUP BY will perform better. If the percentage is high is often a better way to achieve uniqueness. Try:
SELECT address_id FROM
(
SELECT shipping_address_id AS address_id FROM table_A
UNION ALL
SELECT billing_address_id AS address_id FROM table_A
)
GROUP BY address_id;
Also UNION (vs UNION ALL) can be slow and running 3 distinct operations would lead me to believe that this would be the worst performing. Let us know which is best, I love seeing real experimental data results.