Creating data quality rules- and in this case, I'm doing a null check on a few columns of a table. What I'm trying to figure out, is how to get the total record\row count for the table to be queried once(to minimize stress on the server), but populate in all rows of the result set in the column TOTAL_COUNT. I believe that I need a left join- just trying to learn how & where to use it.
So the code that I have, so far is:
SELECT
CAST('RULE_ID_1') AS VARCHAR(255)) AS RULE_ID
CAST('DB_NAME_HERE') AS VARCHAR(255)) AS DB_NAME
CAST('1STCOLUMN') AS VARCHAR(255)) AS COLUMN_NAME
A.DEFECT_COUNT AS DEFECT_COUNT
FROM
(SELECT COUNT(1) AS DEFECT_COUNT
FROM [Name of Db]
WHERE [Name of column 1] IS NULL
AND DateField = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))) A
UNION ALL
SELECT
CAST('RULE_ID_2') AS VARCHAR(255)) AS RULE_ID
CAST('DB_NAME_HERE') AS VARCHAR(255)) AS DB_NAME
CAST('2NDCOLUMN') AS VARCHAR(255)) AS COLUMN_NAME
A.DEFECT_COUNT AS DEFECT_COUNT
FROM
(SELECT COUNT(1) AS DEFECT_COUNT
FROM [Name of Db]
WHERE [Name of column 2] IS NULL
AND DateField = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))) A
UNION ALL
SELECT
CAST('RULE_ID_3') AS VARCHAR(255)) AS RULE_ID
CAST('DB_NAME_HERE') AS VARCHAR(255)) AS DB_NAME
CAST('3RDCOLUMN') AS VARCHAR(255)) AS COLUMN_NAME
A.DEFECT_COUNT AS DEFECT_COUNT
FROM
(SELECT COUNT(1) AS DEFECT_COUNT
FROM [Name of Db]
WHERE [Name of column 3] IS NULL
AND DateField = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))) A
LEFT JOIN
(SELECT 'DB_NAME_HERE' AS DB_NAME, COUNT(*) AS TOTAL_COUNT FROM [Name of Db] WHERE [What goes here?] IS NULL) B
ON
A.DB_NAME=B.DB_NAME
The output| that i need to report out:
RULE_ID | DB_NAME | COLUMN_NAME | DEFECT_COUNT(Nulls) | TOTAL_COUNT(#Rows) |
---|---|---|---|---|
RULE_ID_1 | DB_NAME_HERE | 1STCOLUMN | 5 | 100 |
RULE_ID_2 | DB_NAME_HERE | 2NDCOLUMN | 0 | 100 |
RULE_ID_3 | DB_NAME_HERE | 3RDCOLUMN | 2 | 100 |
CodePudding user response:
The way you've laid your query out is creating a lot of extra effort for both you and the SQL Server.
First, You can do most of in in a single query, if you adopt a very slightly different structure using "conditional aggregation"; that is COUNT()
with a CASE
expression inside it.
Then you can simply add another column for the TOTAL_COUNT
. Though, I'm not entirely sure what you want for that total, so I've included a few options...
SELECT
CAST('DB_NAME_HERE') AS VARCHAR(255)) AS DB_NAME,
CAST('1STCOLUMN') AS VARCHAR(255)) AS RULE_ID_1_COLUMN_NAME,
COUNT(CASE WHEN [Name of column 1] IS NULL THEN 1 END) AS RULE_ID_1_DEFECT_COUNT,
CAST('2NDCOLUMN') AS VARCHAR(255)) AS RULE_ID_2_COLUMN_NAME,
COUNT(CASE WHEN [Name of column 2] IS NULL THEN 1 END) AS RULE_ID_2_DEFECT_COUNT,
CAST('3RDCOLUMN') AS VARCHAR(255)) AS RULE_ID_3_COLUMN_NAME,
COUNT(CASE WHEN [Name of column 3] IS NULL THEN 1 END) AS RULE_ID_3_DEFECT_COUNT,
COUNT(*) AS TOTAL_DAILY_ROWS,
COUNT(CASE WHEN [Name of column 1] IS NULL
OR [Name of column 2] IS NULL
OR [Name of column 3] IS NULL THEN 1 END) AS TOTAL_DAILY_ROWS_WITH_NULLS,
(SELECT COUNT(*) FROM [Name of Db]) AS TABLE_TOTAL_ROWS
FROM
[Name of Db]
WHERE
DateField = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
Then, if you really must have your existing format, you can un-pivot it...
WITH
yesterday_summary AS
(
SELECT
CAST('DB_NAME_HERE') AS VARCHAR(255)) AS DB_NAME,
COUNT(CASE WHEN [Name of column 1] IS NULL THEN 1 END) AS RULE_ID_1_DEFECT_COUNT,
COUNT(CASE WHEN [Name of column 2] IS NULL THEN 1 END) AS RULE_ID_2_DEFECT_COUNT,
COUNT(CASE WHEN [Name of column 3] IS NULL THEN 1 END) AS RULE_ID_3_DEFECT_COUNT
FROM
[Name of Db]
WHERE
DateField = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
)
SELECT
y.DB_NAME,
p.RULE_ID,
p.COLUMN_NAME,
p.DEFECT_COUNT,
(SELECT COUNT(*) FROM [Name of Db]) AS TABLE_TOTAL_ROWS
FROM
yesterday_summary AS y
CROSS APPLY
(
VALUES
('RULE_ID_1', '1STCOLUMN', y.RULE_ID_1_DEFECT_COUNT),
('RULE_ID_2', '2NDCOLUMN', y.RULE_ID_2_DEFECT_COUNT),
('RULE_ID_3', '3RDCOLUMN', y.RULE_ID_3_DEFECT_COUNT)
)
AS p(RULE_ID, COLUMN_NAME, DEFECT_COUNT)