Home > Software design >  Redshift SQL Optimization - UNION ALL and DISTINCT
Redshift SQL Optimization - UNION ALL and DISTINCT

Time:11-23

Suppose I have a table with a large volume. Which of the following would be more optimized?

SELECT DISTINCT shipping_address_id AS address_id FROM table_A
UNION
SELECT DISTINCT billing_address_id AS address_id FROM table_A

vs.

SELECT DISTINCT address_id FROM
(
SELECT shipping_address_id AS address_id FROM table_A
UNION ALL 
SELECT billing_address_id AS address_id FROM table_A
)

CodePudding user response:

I believe you should use:

SELECT DISTINCT shipping_address_id AS address_id FROM table_A
UNION
SELECT DISTINCT billing_address_id AS address_id FROM table_A

This one is naturally easier because it contains less characters (more useful for files) and is more human readable (you want to know what it means).

CodePudding user response:

Depending on what percentage is the unique addresses to total addresses you can find that GROUP BY will perform better. If the percentage is high is often a better way to achieve uniqueness. Try:

SELECT address_id FROM
(
  SELECT shipping_address_id AS address_id FROM table_A
  UNION ALL 
  SELECT billing_address_id AS address_id FROM table_A
)
GROUP BY address_id;

Also UNION (vs UNION ALL) can be slow and running 3 distinct operations would lead me to believe that this would be the worst performing. Let us know which is best, I love seeing real experimental data results.

  • Related