Home > Back-end >  MS SQL Server counting nulls by column in a table
MS SQL Server counting nulls by column in a table

Time:07-07

I have been given a task to find how many nulls there are for each column in a given table. The table has many columns (50-80, depending on the individual table). I would like the result pivoted so the column names are records, like this:

column_name null_count
columnA 253
columnB 25
columnC 0
columnD 456
... ...

Currently, I do

SELECT 'columnA' as column_name, sum(case when columnA IS NULL then 1 else 0 end) null_count from [table] UNION
SELECT 'columnB', sum(case when columnB IS NULL then 1 else 0 end) from [table] UNION
...

for all the rows. This is kind of tedious and I would like to know if there is a more flexible way to do this in MS Sql Server management studio. Maybe something that can step through each record in [database].INFORMATION_SCHEMA.COLUMNS.

CodePudding user response:

Please try the following solution.

It is using SQL Server's XML and XQuery magic. No need for dynamic SQL.

We are leveraging a fact that FOR XML ... clause omits columns with NULL values.

SQL

USE tempdb;
GO

DROP TABLE IF EXISTS dbo.tbl;

-- DDL and sample data population, start
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, columnB VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, columnB, columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);
-- DDL and sample data population, end

DECLARE @total_row_counter BIGINT = (SELECT COUNT_BIG(*) FROM dbo.tbl);

;WITH rs AS
(
    SELECT column_name = c.value('local-name(.)', 'sysname')
    FROM dbo.tbl AS p
        CROSS APPLY (SELECT * 
                    FROM dbo.tbl AS c 
                    WHERE c.ID = p.ID
                    FOR XML PATH('root'), TYPE) AS t1(x)
        CROSS APPLY x.nodes('/root/*') AS t2(c)
)
SELECT sch.column_name, null_counter = @total_row_counter - COUNT_BIG(rs.column_name)
FROM INFORMATION_SCHEMA.COLUMNS AS sch
    LEFT OUTER JOIN rs ON sch.COLUMN_NAME = rs.column_name
WHERE TABLE_CATALOG = 'TEMPDB'
   AND TABLE_SCHEMA = 'dbo'
   AND TABLE_NAME = 'tbl'
GROUP BY sch.column_name
ORDER BY sch.column_name;

SQL #2

It covers table column names with spaces. A minor FOR XML ... clause change automatically converts spaces into _x0020_ in the XML element names.

<root>
  <ID>3</ID>
  <column_x0020_B>cat</column_x0020_B>
</root>

The rest is identical.

USE tempdb;
GO

DROP TABLE IF EXISTS dbo.tbl;

-- DDL and sample data population, start
CREATE TABLE dbo.tbl (ID INT IDENTITY PRIMARY KEY, columnA INT, [column B] VARCHAR(5), columnC INT);
INSERT dbo.tbl (columnA, [column B], columnC) VALUES
(1, NULL, NULL),
(2, 'city', NULL),
(NULL, 'cat', NULL),
(100, NULL, NULL);
-- DDL and sample data population, end
        
DECLARE @total_row_counter BIGINT = (SELECT COUNT_BIG(*) FROM dbo.tbl);

;WITH rs AS
(
    SELECT column_name = REPLACE(c.value('local-name(.)', 'sysname'), '_x0020_', SPACE(1))
    FROM dbo.tbl AS p
        CROSS APPLY (SELECT * 
                    FROM dbo.tbl AS [root]
                    WHERE [root].ID = p.ID
                    FOR XML AUTO, ELEMENTS, TYPE) AS t1(x)
        CROSS APPLY x.nodes('/root/*') AS t2(c)
)
SELECT sch.column_name, null_counter = @total_row_counter - COUNT_BIG(rs.column_name)
FROM INFORMATION_SCHEMA.COLUMNS AS sch
    LEFT OUTER JOIN rs ON sch.COLUMN_NAME = rs.column_name
WHERE TABLE_CATALOG = 'TEMPDB'
   AND TABLE_SCHEMA = 'dbo'
   AND TABLE_NAME = 'tbl'
GROUP BY sch.column_name
ORDER BY sch.column_name;

Output

 ------------- -------------- 
| column_name | null_counter |
 ------------- -------------- 
| columnA     |            1 |
| columnB     |            2 |
| columnC     |            4 |
| ID          |            0 |
 ------------- -------------- 
  • Related