Should I use many joins inside unions or pull them out using subqueries?-CodePudding

So, which should be more efficient?

I have a need to do 16 unions and each one joins with the same view. I need the total from 16 different views which are totaling values from 16 different tables. I need the data in the following format... project category total

1         rentals    $10
1         equipment  $200
2         rentals    $10
2         equipment  $300

Should I use subqueries and only have one join at the 'end' or keep the joins within each union?

view_project is a view on a table view_categoryX is a view which totals line items for the category by project_id. There are separate tables for each of these (not the best design, but it is out of scope to redesign the table structure).

First option

    select view_project.project_name, view_project.num, view_project.fiscal_year,
       view_category1.name, view_category1.total from view_category1 inner join view_project on view_project.project_id = view_category1.project_id 

where view_project.is_open = 1
union all
select view_project.project_name, view_project.num, view_project.fiscal_year,
       view_category2.name, view_category2.total
from view_category2
inner join view_project on view_project.project_id = view_category2.project_id
where view_project.is_open = 1
...
/* This continues on with the same pattern for a total of 16 unions */

or second option

select view_project.project_name, view_project.num, view_project.fiscal_year,
       view_category_total.name, view_category_total.total
       from view_project
       inner join 
       (
       select category_name, total from view_category1
       union ALL
       select category_name, total from view_category2
       ...
       /* unions continue on for a total of 16 unions */
       ) view_category_total
       On view_project.project_id = view_category_total.project_id
       where view_project.is_open = 1

CodePudding user response：

You may use CTE for encapsulating UNION section, then join it with other tables. like this SQL Snip code:

`WITH main AS (
SELECT view_project.project_name
    ,view_project.num
    ,view_project.fiscal_year
    ,view_category1.name
    ,view_category1.total
    ,project_id
FROM view_category1
UNION
SELECT view_project.project_name
    ,view_project.num
    ,view_project.fiscal_year
    ,view_category2.name
    ,view_category2.total
    ,project_id
FROM view_category2
) SELECT * FROM main INNER JOIN view_project ON view_project.project_id = main.project_id WHERE view_project.is_open = 1`

CodePudding user response：

Short answer (IMO): This depends

I kinda learned the hard way: There rarely is such a thing in SQL Server as "reliably predictable performance optimizations". True, there some general guidelines, but in the end: You kinda have to test it out (yeah, I know you did not want to hear that). Alternatively, you would need to provide way more info, as all of that has influence on performance:

Partitioning
Indices
Number of rows
Types and sizes of included columns
Available RAM and processor cores
General load of the server
Disk size and speed
Statistics
All sorts of weird server settings

and probably a lot more that I did not think of atm. The problem is, SQL Server is not always behaving as it should. For example, if you chain too many sub-queries, CTEs,... performance often crumbles to a halt and can be fixed by storing the sub-results in temporary tables (which btw. is exactly what SQL Server does if you use those sub-queries, CTEs etc.). In the end that has to do with what exact query plan SQL Server generates for you. Or, in the end, my suggestion (that I have been executing for quite a while with a quite good results):

Learn which possible options (performance-wise) exist that you could use.
Try it out and measure the results (first with test loads, then with production loads! - production can sometimes yield vastly different results due to usually far better servers, but also way more data and general server load, restrictive load balancers etc.)

As a last note: Since you are doing this on views (which sometimes can confuse the query plan generator as well), there are even more things you may consider, like:

store the results from the union all in a temp table.

or: copy the first table into a new table, then insert the other ones (probably only faster if those views contain really complex queries)

only then join the data from the temp table (or regular table) with view_project. However this also depends on whether that query shall be executable in parallel (like in a stored procedure that can be called by multiple users at the same time) ...