TSQL: Find columns that deviate from default value (as performant as possible)-CodePudding

I'm an intermediate with MS SQL so i'm looking for some feedback on how to optimize my current script.

The Objective is to check if customers of 3rd-party software maintain certain fields in their database or if those fields are always holding their default values.

So, this is a sample of what i came up with: (I like the temporary table because it allows me to do some further inqueries)

create table #TableUsage
(
    TableName varchar(100) NOT NULL,
    FieldName varchar(100) NOT NULL,
    Used bit NOT NULL
)


insert into #TableUsage (TableName, FieldName, Used)
values
('Value Entry', 'Description',            iif(exists (select top 1 [Entry No_] from [CRONUS CH$Value Entry$437dbf0e-84ff-417a-965d-ed2bb9650972] r where r.[Description] <> ''),1,0)),
('Value Entry', 'Item Ledger Entry Type', iif(exists (select top 1 [Entry No_] from [CRONUS CH$Value Entry$437dbf0e-84ff-417a-965d-ed2bb9650972] r where r.[Item Ledger Entry Type] <> 0),1,0)),
('Value Entry', 'Valued Quantity',        iif(exists (select top 1 [Entry No_] from [CRONUS CH$Value Entry$437dbf0e-84ff-417a-965d-ed2bb9650972] r where r.[Valued Quantity] <> 0),1,0)),
('Value Entry', 'Type',                   iif(exists (select top 1 [Entry No_] from [CRONUS CH$Value Entry$437dbf0e-84ff-417a-965d-ed2bb9650972] r where r.[Type] <> 0),1,0))

select * from #TableUsage where Used = 0

I'm generating those SQL Queries from other datasets. So, it doesn't have to be generic... But I'm not really confident that the iif(exists (select top 1 [Entry No_] from [CRONUS CH$Value Entry$437dbf0e-84ff-417a-965d-ed2bb9650972] r where r.[Type] <> 0),1,0) element for each field in a table is the most efficient way to do this.

(And it has to analyze more than 200 Tables, 3000 Fields on databases up to 1.5TB in size)

Thanks for your input in advance.

CodePudding user response：

EXISTS is indeed the fastest way to check whether a deviating value exists. (You don't have to teach the DBMS that EXISTS means it suffices to find one row, though; you can remove the TOP clause :-)

With EXISTS the DBMS can even use an index, if one exists. But you are not looking for a deviating value in one column of the table. You select column by column, and while this is fast for columns that are indexed, the DBMS will have to read the whole table for columns that are not. Let's say your table has then columns, of which three are indexed. That means you have three index-only reads plus seven full table scans.

You don't want to read the whole table again and again.

Instead read the table once with conditional aggregation. No index will be used, but the table only read once:

create table all_columns as
select
  max(case when [Description]            <> '' then 1 else 0 end) as description_used,
  max(case when [Item Ledger Entry Type] <> 0  then 1 else 0 end) as ilet_used,
  max(case when [Valued Quantity]        <> 0  then 1 else 0 end) as valued_quantity_used,
  max(case when [Type]                   <> 0  then 1 else 0 end) as type_used
from [CRONUS CH$Value Entry$437dbf0e-84ff-417a-965d-ed2bb9650972];

Then use this result row to fill your table:

insert into #TableUsage (TableName, FieldName, Used)
select 'Value Entry', 'Description',            description_used     from all_columns
union all
select 'Value Entry', 'Item Ledger Entry Type', ilet_used            from all_columns
union all
select 'Value Entry', 'Valued Quantity',        valued_quantity_used from all_columns
union all
select 'Value Entry', 'Type',                   type_used            from all_columns;

(Disclaimer: I don't use SQL Server, so there may be a more SQL Server like way to do this, but you get the idea :-)