Redshift SQL Optimization - UNION ALL and DISTINCT-CodePudding

Suppose I have a table with a large volume. Which of the following would be more optimized?

SELECT DISTINCT shipping_address_id AS address_id FROM table_A
UNION
SELECT DISTINCT billing_address_id AS address_id FROM table_A

vs.

SELECT DISTINCT address_id FROM
(
SELECT shipping_address_id AS address_id FROM table_A
UNION ALL 
SELECT billing_address_id AS address_id FROM table_A
)

CodePudding user response：

I believe you should use:

SELECT DISTINCT shipping_address_id AS address_id FROM table_A
UNION
SELECT DISTINCT billing_address_id AS address_id FROM table_A

This one is naturally easier because it contains less characters (more useful for files) and is more human readable (you want to know what it means).

CodePudding user response：

Depending on what percentage is the unique addresses to total addresses you can find that GROUP BY will perform better. If the percentage is high is often a better way to achieve uniqueness. Try:

SELECT address_id FROM
(
  SELECT shipping_address_id AS address_id FROM table_A
  UNION ALL 
  SELECT billing_address_id AS address_id FROM table_A
)
GROUP BY address_id;

Also UNION (vs UNION ALL) can be slow and running 3 distinct operations would lead me to believe that this would be the worst performing. Let us know which is best, I love seeing real experimental data results.