Home > database >  Use UNION ALL and GROUP BY to combine 3 TSQL statements into one query
Use UNION ALL and GROUP BY to combine 3 TSQL statements into one query

Time:02-02

I have 3 TSQL queries that work individually, but I need to combine them into one single query. In Microsoft Access, they were successfully combined using the following SQL statement; however, I am trying to reverse engineer all the data using TSQL. How can I do the same thing in TSQL? Keep in mind that ap.property_id and af1.property_id are from different tables.

SELECT property_id]
FROM [Insured (see TSQL Statement 1)] 
GROUP BY property_id
UNION ALL
SELECT property_id
FROM [Uninsured (see TSQL Statement 2)] 
GROUP BY property_id
UNION ALL
SELECT property_id
FROM [IE > 90 Days (see TSQL Statement 3)] 
GROUP BY property_id;

TSQL Statement 1 (Insured)

SELECT
    rtrim(STR_REPLACE(ap.region_name,' Region','')) AS 'Region',
    ap.property_id,
    ap.is_under_management_ind,
    ap.is_insured_ind,
    ap.has_active_assistance_ind,
    af1.is_pipeline_ind,

    CASE
        WHEN ap.is_insured_ind = "Y" THEN "Insured"
        ELSE "Other"
    END AS 'Classification 1',

    CASE
        WHEN ap.is_insured_ind = "Y" AND ap.is_under_management_ind = "Y" AND af1.is_pipeline_ind = "N" THEN "Insured Only"
        WHEN ap.is_insured_ind = "Y" AND ap.is_under_management_ind = "Y" AND af1.is_pipeline_ind = "Y" AND ap.has_active_assistance_ind = "Y" THEN "Insured and Assisted"
        ELSE "Other Insured"
    END AS 'Classification 2'

FROM rems_dmart.dbo.active_financing af1
    INNER JOIN rems_dmart.dbo.active_property ap
    ON af1.property_id = ap.property_id

GROUP BY
    ap.region_name,
    ap.property_id,
    ap.is_under_management_ind,
    ap.is_insured_ind,
    ap.has_active_assistance_ind,
    af1.is_pipeline_ind

HAVING
    ( ap.region_name <> "OHP"
    AND ap.is_under_management_ind = "Y"
    AND ap.is_insured_ind = "Y"
    AND af1.is_pipeline_ind = "N" )
    OR
    ( ap.region_name <> "OHP"
    AND ap.is_under_management_ind = "Y"
    AND ap.is_insured_ind = "Y"
    AND af1.is_pipeline_ind = "Y"
    AND ap.has_active_assistance_ind = "Y" )

TSQL Statement 2 (Uninsured)

SELECT
    rtrim(STR_REPLACE(ap.region_name,' Region','')) AS 'Region',
    ap.property_id,
    ap.is_under_management_ind,
    ap.is_insured_ind,
    ap.has_use_restriction_ind,
    ap.has_active_irp_ind,
    ap.has_active_assistance_ind,
    ap.is_service_coordinator_ind,

    CASE
        WHEN ap.is_insured_ind = "N" THEN "Uninsured"
        ELSE "Other"
    END AS 'Classification 1',

    CASE
        WHEN ap.is_insured_ind = "N" AND ap.has_active_assistance_ind = "Y" THEN "Assisted Only"
        WHEN ap.is_insured_ind = "N" AND ap.has_active_assistance_ind = "N" AND ap.has_use_restriction_ind = "Y" THEN "Use Agreement Only"
        WHEN ap.is_insured_ind = "N" AND ap.has_active_assistance_ind = "N" AND ap.has_use_restriction_ind = "N" AND ap.has_active_irp_ind = "Y" AND ap.is_service_coordinator_ind = "N" THEN "IRP"
        WHEN ap.is_insured_ind = "N" AND ap.has_active_assistance_ind = "N" AND ap.has_use_restriction_ind = "N" AND ap.has_active_irp_ind = "N" AND ap.is_service_coordinator_ind = "Y" THEN "Service Coordinator"
        WHEN ap.is_insured_ind = "N" AND ap.has_active_assistance_ind = "N" AND ap.has_use_restriction_ind = "N" AND ap.has_active_irp_ind = "Y" AND ap.is_service_coordinator_ind = "Y" THEN "IRP & Service Coordinator"
        ELSE "Other Uninsured"
    END AS 'Classification 2'

FROM rems_dmart.dbo.active_property ap

GROUP BY
    ap.region_name,
    ap.property_id,
    ap.is_under_management_ind,
    ap.is_insured_ind,
    ap.has_use_restriction_ind,
    ap.has_active_irp_ind,
    ap.has_active_assistance_ind,
    ap.is_service_coordinator_ind

HAVING
    ( ap.region_name <> "OHP"
    AND ap.is_under_management_ind = "Y"
    AND ap.is_insured_ind = "N"
    AND ap.has_active_assistance_ind = "Y" )
    OR 
    ( ap.region_name <> "OHP"
    AND ap.is_insured_ind = "N"
    AND ap.has_use_restriction_ind = "Y" )

TSQL Statement 3 (IE > 90 Days)

SELECT
    rtrim(STR_REPLACE(ap.region_name,' Region','')) AS 'Region',
    af1.property_id,
    af1.initial_endorsement_date,
    af1.final_endorsement_date,
    ap.is_under_management_ind,

    CASE
        WHEN af1.initial_endorsement_date IS NOT NULL AND af1.final_endorsement_date IS NULL THEN "IE > 90 Days"
        ELSE "Other"
    END AS 'Classification 1',

    CASE
        WHEN af1.initial_endorsement_date IS NOT NULL AND af1.final_endorsement_date IS NULL AND ap.is_under_management_ind = "Y" THEN "IE > 90 Days_Under Mgmt"
        WHEN af1.initial_endorsement_date IS NOT NULL AND af1.final_endorsement_date IS NULL AND ap.is_under_management_ind = "N" THEN "IE > 90 Days_Not Under Mgmt"
        ELSE "Other IE > 90 Days"
    END AS 'Classification 2'

FROM rems_dmart.dbo.active_property ap
    INNER JOIN rems_dmart.dbo.active_financing af1
        ON ap.property_id = af1.property_id

WHERE
    ( ap.region_name <> "OHP"
    AND DATEDIFF(DAY, af1.initial_endorsement_date, CONVERT(VARCHAR(20), GETDATE(), 101)) > 90
    AND af1.final_endorsement_date IS NULL 
    AND ap.is_under_management_ind = "N" )
    OR 
    ( ap.region_name <> "OHP"
    AND DATEDIFF(DAY, af1.initial_endorsement_date, CONVERT(VARCHAR(20), GETDATE(), 101)) > 90
    AND af1.final_endorsement_date IS NULL 
    AND ap.is_under_management_ind = "Y" )

CodePudding user response:

In the first place, the final results you want are only the property_id values. A natural first step, then, would be to remove everything else from the select lists of (copies of) the TSQL queries. That already makes them much simpler.

In the second place, the first two original TSQL queries use GROUP BY but not any aggregate functions. They do include all the grouping columns in their select lists. This has one primary and at least one secondary effect:

  • the primary effect is de-duplication of the rows, equivalent to a select distinct
  • the secondary effect may be the real point: the filter conditions are applied via a having clause, such that (logically) they are are tested after row de-duplication.

However, if there are no duplicate property_id values in either of the two base tables then grouping and distinct selection in these two queries are both moot, as inclusion of one of the property_ids among the grouping columns means that there cannot be any groups with more than one row, nor would there be duplicate result rows in any case.

This suggests removing the GROUP BY clauses and converting the HAVING clauses to WHERE clauses.

In the third place, the three queries are all selecting from the same join of the same two tables. (Technically, the second is selecting from a single one of the tables, but since there are no duplicates in the join columns, it's pretty much the same thing.) Additionally, although you claim that they are selecting property_id columns from different tables, that's a technicality. Query 3 could be modified to select ap.property_id instead of af1.property_id without changing the results, because these are guaranteed to be equal in every row of the rowset.

Thus, these three queries can usefully be combined into a single one without relying on a union. They can be adjusted to be ordinary queries with the same select list and source row set, so all that remains is unifying the filter conditions. Before working on the filter condition, that's this:

-- can be made SELECT DISTINCT if duplicates are present after all:
SELECT ap.property_id

FROM rems_dmart.dbo.active_financing af1
    INNER JOIN rems_dmart.dbo.active_property ap
        ON af1.property_id = ap.property_id

WHERE
    ( ap.region_name <> "OHP"
    AND ap.is_under_management_ind = "Y"
    AND ap.is_insured_ind = "Y"
    AND af1.is_pipeline_ind = "N" )
    OR
    ( ap.region_name <> "OHP"
    AND ap.is_under_management_ind = "Y"
    AND ap.is_insured_ind = "Y"
    AND af1.is_pipeline_ind = "Y"
    AND ap.has_active_assistance_ind = "Y" )
    OR
    ( ap.region_name <> "OHP"
    AND ap.is_under_management_ind = "Y"
    AND ap.is_insured_ind = "N"
    AND ap.has_active_assistance_ind = "Y" )
    OR 
    ( ap.region_name <> "OHP"
    AND ap.is_insured_ind = "N"
    AND ap.has_use_restriction_ind = "Y" )
    OR
    ( ap.region_name <> "OHP"
    AND DATEDIFF(DAY, af1.initial_endorsement_date, CONVERT(VARCHAR(20), GETDATE(), 101)) > 90
    AND af1.final_endorsement_date IS NULL 
    AND ap.is_under_management_ind = "N" )
    OR 
    ( ap.region_name <> "OHP"
    AND DATEDIFF(DAY, af1.initial_endorsement_date, CONVERT(VARCHAR(20), GETDATE(), 101)) > 90
    AND af1.final_endorsement_date IS NULL 
    AND ap.is_under_management_ind = "Y" )

Again, as long as there are no duplicate property_id values in either table, there is no need for a GROUP BY or a DISTINCT selection, but if you have to accommodate duplicates after all then you can just change the SELECT to SELECT DISTINCT.

That filter condition could be better factored, but at this point it's highly reflective of the original queries. That may make it easier to validate or check. I would consider at least lifting the condition on ap.region_name out of all the the individual alternatives, but I leave that and any other refactoring you want to perform to you.

CodePudding user response:

Here is an example of how you could combine three TSQL statements into one query using the UNION ALL operator and GROUP BY clause:

WITH CTE AS ( SELECT column1, column2, SUM(column3) as Total_Column3 FROM table1 WHERE column4 = 'value1' GROUP BY column1, column2

UNION ALL

SELECT column1, column2, SUM(column3) as Total_Column3 FROM table2 WHERE column5 = 'value2' GROUP BY column1, column2

UNION ALL

SELECT column1, column2, SUM(column3) as Total_Column3 FROM table3 WHERE column6 = 'value3' GROUP BY column1, column2 ) SELECT column1, column2, SUM(Total_Column3) as Grand_Total FROM CTE GROUP BY column1, column2

This query creates a common table expression (CTE) from the results of three SELECT statements, each of which aggregates data from a different table and filters it based on a specific condition. The UNION ALL operator combines the results of these SELECT statements into a single result set. Finally, the query uses the GROUP BY clause to aggregate the Grand Total of Total_Column3 by column1 and column2.

CodePudding user response:

Your code can be translated into

--1
SELECT
        AP.property_id
FROM rems_dmart.dbo.active_financing AS AF
INNER JOIN rems_dmart.dbo.active_property AS AP
    ON AF.property_id = AP.property_id
WHERE AP.region_name <> 'OHP' AND AP.is_under_management_ind = 'Y' AND AP.is_insured_ind = 'Y' AND AF.is_pipeline_ind = 'N'

UNION 

--2
SELECT
        property_id
FROM rems_dmart.dbo.active_property
WHERE region_name <> 'OHP' AND is_insured_ind = 'N' AND has_active_assistance_ind = 'Y'

UNION

--3
SELECT
        AF.property_id
FROM rems_dmart.dbo.active_property AS AP
INNER JOIN rems_dmart.dbo.active_financing AS AF
    ON AP.property_id = AF.property_id
WHERE AP.region_name <> 'OHP' AND DATEDIFF(DAY, AF.initial_endorsement_date, CONVERT(VARCHAR(20), GETDATE(), 101)) > 90 AND AF.final_endorsement_date IS NULL AND AP.is_under_management_ind IN ('N','Y')

or even further to

SELECT
        AP.property_id
FROM rems_dmart.dbo.active_financing AS AF
INNER JOIN rems_dmart.dbo.active_property AS AP
    ON AF.property_id = AP.property_id
WHERE AP.region_name <> 'OHP' AND
(
    (AP.is_under_management_ind = 'Y' AND AP.is_insured_ind = 'Y' AND AF.is_pipeline_ind = 'N') 
    OR 
    (DATEDIFF(DAY, AF.initial_endorsement_date, CONVERT(VARCHAR(20), GETDATE(), 101)) > 90 AND AF.final_endorsement_date IS NULL AND AP.is_under_management_ind IN ('N','Y'))
)

UNION 

SELECT
        property_id
FROM rems_dmart.dbo.active_property
WHERE region_name <> 'OHP' AND is_insured_ind = 'N' AND has_active_assistance_ind = 'Y'

@john-bollinger said many good things for your query, so try to use his answer as well. Also you need to have good indices on the two table for your query to run smooth.

  • Related