Rewriting the query

I have a table with the following sample data. The table actually contains more than 10 million rows.

tableid	Id	type
1	1	su1
2	2	su1
3	2	su2
4	3	su3
5	4	su1

I have to get a count of all the ids that only have type su1. If the id has su1 but also another type then it should not be counted. This is the query I came up with.

Select Count(*) From (
Select id
From table t
Where exists (select null from table t1 where t.id = t1.id and t1.type = 'su1')
Group by id
Having Count(*) = 1) a

tableid is the primary key. Id has a non-clustered index on it. Are there any other ways of writing this query?

CodePudding user response：

Given this table and sample data:

CREATE TABLE dbo.[table]
(
  tableid int, 
  Id      int, 
  type    char(3), 
  INDEX   IX_table CLUSTERED (Id, type)
);

INSERT dbo.[table](tableid, Id, type) VALUES
(1, 1,  'su1'),
(2, 2,  'su1'),
(3, 2,  'su2'),
(4, 3,  'su3'),
(5, 4,  'su1');

One way would be:

;WITH agg AS
(
  SELECT tableid, Id, type, 
    mint = MIN(Type) OVER (PARTITION BY Id),
    maxt = MAX(Type) OVER (PARTITION BY Id)
  FROM dbo.[table]
)
SELECT tableid, Id, type 
  FROM agg 
  WHERE mint = maxt AND mint = 'su1';

If your clustered index is on (Id, type), this allows for a single clustered index scan.

The plan is a little messy, though, with some spools that we might not want. How about David's suggestion (assuming you're on SQL Server 2017 or better):

SELECT tableid = MIN(tableid), Id
  FROM dbo.[table]
  GROUP BY Id 
  HAVING STRING_AGG(type, ',') = 'su1';

Oh yeah, that's much better.

Examples on db<>fiddle
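
Both queries above return the qualifying rows, while the question asks for a count. A minimal sketch of one way to get the count directly, assuming the dbo.[table] definition and sample rows above: it uses MIN and MAX as ordinary grouped aggregates (a variation on the first query, not part of the original answer), so it does not depend on STRING_AGG. The alias only_su1 and the column name ids_with_only_su1 are illustrative.

```sql
-- Count the Ids whose every row has type 'su1'.
-- If the smallest and largest type for an Id are both 'su1',
-- then no other type exists for that Id.
SELECT COUNT(*) AS ids_with_only_su1
FROM
(
  SELECT Id
  FROM dbo.[table]
  GROUP BY Id
  HAVING MIN(type) = 'su1' AND MAX(type) = 'su1'
) AS only_su1;
```

With the five sample rows, only Ids 1 and 4 qualify, so this returns 2.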

CodePudding user response：

I'm not entirely sure why you have Having Count(*) = 1 as it doesn't seem to be reflected in the requirements.

But this query is much better written as follows:

SELECT COUNT(*)
FROM (
    SELECT id
    FROM [table] t
    GROUP BY id
    HAVING COUNT(CASE WHEN t.type <> 'su1' THEN 1 END) = 0
) t;

And for that, you would need the following index:

[table] (id) INCLUDE (type)
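
Spelled out as T-SQL, that index might look like the sketch below. The index name IX_table_id_type is a placeholder. Since the question says Id already has a non-clustered index, it may be simpler to recreate that existing index with the INCLUDE column than to add a second one.

```sql
-- Covering index: seeks and scans by id can read type from the index
-- itself, without lookups into the base table.
CREATE NONCLUSTERED INDEX IX_table_id_type
  ON [table] (id)
  INCLUDE (type);
```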

CodePudding user response：

Maybe I'm missing something, but can't you do a COUNT DISTINCT after filtering out every id that has any type other than 'su1'? In that case we'd just have:

WITH tbl (tableid, id, type) AS (
    SELECT * FROM (VALUES (1,1,'su1'), (2,2,'su1'), (3,2,'su2'), (4,3,'su3'), (5,4,'su1')) AS v (tableid, id, type)
)
SELECT COUNT(DISTINCT id) FROM tbl
WHERE id NOT IN (SELECT id FROM tbl WHERE type != 'su1')
-- 2

In the SQL Fiddle, I removed the COUNT DISTINCT so that the individual results are visible and easier to check.
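
One caveat with NOT IN: if the subquery can ever return a NULL id, the predicate is never true for any row and the count comes back as 0. A NOT EXISTS version sidesteps that. The following is a sketch against the same sample rows, written with SQL Server's VALUES syntax; the aliases v, t and x are illustrative.

```sql
WITH tbl (tableid, id, type) AS (
    SELECT *
    FROM (VALUES (1,1,'su1'), (2,2,'su1'), (3,2,'su2'), (4,3,'su3'), (5,4,'su1'))
         AS v (tableid, id, type)
)
-- Keep only ids for which no row with a different type exists.
SELECT COUNT(DISTINCT t.id)
FROM tbl AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM tbl AS x
    WHERE x.id = t.id
      AND x.type <> 'su1'
);
```

On the sample data this also returns 2 (ids 1 and 4).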