I have 3 tables. Each of them contain (among others) the columns UserID
and Period
.
In order to get all the Periods
associated with a specific user, I'm using unions like this:
# Original Query
SELECT Period FROM table_1 WHERE UserID = :user
UNION SELECT Period FROM table_2 WHERE UserID = :user
UNION SELECT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
The results I get are what I expect. Since I'm not using UNION ALL
, all duplicate results get automatically filtered out.
The debugger this project is using thinks the query MIGHT be slow. This is the query plan for it-
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | table_1 | ref | UserID | UserID | 4 | const | 29 | NULL |
2 | UNION | table_2 | ref | UserID | UserID | 4 | const | 5 | NULL |
3 | UNION | table_3 | ref | UserID | UserID | 4 | const | 4 | NULL |
NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
I thought maybe using SELECT DISTINCT
instead of SELECT
would help since each SELECT
result set would get smaller before the UNION
.
# Updated Query
SELECT DISTINCT Period FROM table_1 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_2 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
However this only seems to make the query plan use more temp tables.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | table_1 | ref | UserID | UserID | 4 | const | 29 | Using where; Using temporary |
2 | UNION | table_2 | ref | UserID | UserID | 4 | const | 5 | Using where; Using temporary |
3 | UNION | table_3 | ref | UserID | UserID | 4 | const | 4 | Using where; Using temporary |
NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
I've also tried running this same query repeating lines 2-3 (the UNION
s) across 500 lines to see if there were noticeable differences. Time-wise, the results were pretty similar.
Analyzing the queries, using DISTINCT seemed to make the queries take less time returning the results. However, the query now had to clean up all temp tables, which ended up making the final time for both queries very similar.
My question(s) to all MySQL gurus, is there a way to make this query faster or more optimized without changing the table structures?
Maybe with some indexes? (On that note, Period
is a VARCHAR and the tables are all MyISAM
)
Do the order of the tables matter in the query? I purposedly tried to place the biggest table first.
CodePudding user response:
# Original Query
SELECT Period FROM table_1 WHERE UserID = :user
UNION SELECT Period FROM table_2 WHERE UserID = :user
UNION SELECT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
It is processed as:
- Execute subquery 1 (output A)
- Execute subquery 2 (output B)
- Combine outputs A and B (output С)
- Sort combined output C and remove duplicates (output D)
- Execute subquery 3 (output E)
- Combine outputs C and E (output F)
- Sort combined output E and remove duplicates (output G)
- Return output G
# Updated Query
SELECT DISTINCT Period FROM table_1 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_2 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
It is processed as:
- Execute subquery 1 (output A)
- Sort output A and remove duplicates (output B)
- Execute subquery 2 (output C)
- Sort output C and remove duplicates (output D)
- Combine outputs B and D (output E)
- Sort output E and remove duplicates (output F)
- Execute subquery 3 (output G)
- Sort output G and remove duplicates (output H)
- Combine outputs F and H (output I)
- Sort output I and remove duplicates (output J)
- Return output J
# Recommended Query
SELECT DISTINCT Period
FROM ( SELECT Period FROM table_1 WHERE UserID = :user
UNION ALL
SELECT Period FROM table_2 WHERE UserID = :user
UNION ALL
SELECT Period FROM table_3 WHERE UserID = :user ) AS total
ORDER BY Period ASC;
- Execute subquery 1 (output A)
- Execute subquery 2 (output B)
- Combine outputs A and B (output С)
- Execute subquery 3 (output E)
- Combine outputs (output F)
- Sort combined output and remove duplicates (output G)
- Return output G
CodePudding user response:
If you have only a few dozen rows in the result, it will be hard to notice whether it is "slow".
Most of the tradeoffs discussed here lead to a conclusion of "it depends".
Suppose there were 1000 rows for the user in each table, but only 10 rows in the result. One formulation would be faster than the other.
Suppose there were almost no dups between the tables. Now the other formulation might be faster.
Since you desire do have the de-dupping, there will be a temp table somewhere.
I suggest you write the query in whatever way is "simplest" or "most logical" to you.
Also, if you want performance, switch from MyISAM to InnoDB. In almost all benchmarks, InnoDB is at least as fast.