I have been given a task to find how many nulls there are for each column in a given table. The table has many columns (50-80, depending on the individual table). I would like the result pivoted so the column names are records, like this:
column_name | null_count |
---|---|
columnA | 253 |
columnB | 25 |
columnC | 0 |
columnD | 456 |
... | ... |
Currently, I do
SELECT 'columnA' as column_name, sum(case when columnA IS NULL then 1 else 0 end) null_count from [table] UNION
SELECT 'columnB', sum(case when columnB IS NULL then 1 else 0 end) from [table] UNION
...
for all the rows. This is kind of tedious and I would like to know if there is a more flexible way to do this in MS Sql Server management studio. Maybe something that can step through each record in [database].INFORMATION_SCHEMA.COLUMNS.
CodePudding user response:
Please try the following solution.
It is using SQL Server's XML and XQuery magic. No need for dynamic SQL.
We are leveraging a fact that FOR XML ...
clause omits columns with NULL values.
SQL
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
-- DDL and sample data population, start
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, columnB VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, columnB, columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);
-- DDL and sample data population, end
DECLARE @total_row_counter BIGINT = (SELECT COUNT_BIG(*) FROM dbo.tbl);
;WITH rs AS
(
SELECT column_name = c.value('local-name(.)', 'sysname')
FROM dbo.tbl AS p
CROSS APPLY (SELECT *
FROM dbo.tbl AS c
WHERE c.ID = p.ID
FOR XML PATH('root'), TYPE) AS t1(x)
CROSS APPLY x.nodes('/root/*') AS t2(c)
)
SELECT sch.column_name, null_counter = @total_row_counter - COUNT_BIG(rs.column_name)
FROM INFORMATION_SCHEMA.COLUMNS AS sch
LEFT OUTER JOIN rs ON sch.COLUMN_NAME = rs.column_name
WHERE TABLE_CATALOG = 'TEMPDB'
AND TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'tbl'
GROUP BY sch.column_name
ORDER BY sch.column_name;
SQL #2
It covers table column names with spaces. A minor FOR XML ...
clause change automatically converts spaces into _x0020_
in the XML element names.
<root>
<ID>3</ID>
<column_x0020_B>cat</column_x0020_B>
</root>
The rest is identical.
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
-- DDL and sample data population, start
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, [column B] VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, [column B], columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);
-- DDL and sample data population, end
DECLARE @total_row_counter BIGINT = (SELECT COUNT_BIG(*) FROM dbo.tbl);
;WITH rs AS
(
SELECT column_name = REPLACE(c.value('local-name(.)', 'sysname'), '_x0020_', SPACE(1))
FROM dbo.tbl AS p
CROSS APPLY (SELECT *
FROM dbo.tbl AS [root]
WHERE [root].ID = p.ID
FOR XML AUTO, ELEMENTS, TYPE) AS t1(x)
CROSS APPLY x.nodes('/root/*') AS t2(c)
)
SELECT sch.column_name, null_counter = @total_row_counter - COUNT_BIG(rs.column_name)
FROM INFORMATION_SCHEMA.COLUMNS AS sch
LEFT OUTER JOIN rs ON sch.COLUMN_NAME = rs.column_name
WHERE TABLE_CATALOG = 'TEMPDB'
AND TABLE_SCHEMA = 'dbo'
AND TABLE_NAME = 'tbl'
GROUP BY sch.column_name
ORDER BY sch.column_name;
Output
------------- --------------
| column_name | null_counter |
------------- --------------
| columnA | 1 |
| columnB | 2 |
| columnC | 4 |
| ID | 0 |
------------- --------------