SQL Find duplicated names and count it for each status-CodePudding

I found 85 duplicated names in my base with

SELECT Name, COUNT(*) 
FROM Tab 
GROUP BY Name
HAVING COUNT(*) > 1
ORDER BY 2 DESC, 1;

with the result set:

Name    COUNT(*)
a   28
b   12
c   10
d   8
e   5
f   4
g   3
h   3
i   2
...

My concern is to sort these duplicated names (303 entries in total) per status

SELECT Status, COUNT(*) FROM Tab
WHERE Name IN (SELECT Name FROM Tab GROUP BY Name HAVING COUNT(*) > 1)
GROUP BY Status
ORDER by Name;

give me this result:

Status  COUNT(*)
Ended   38
Deleted 21
InUse   244

Now I would like a combinaison of both queries meaning to show the duplicated names counted by status.

Name	Ended	Deleted	InUse
a	6	2	20
b	0	0	12
c	0	8	2
d	6	1	1
e	4	0	1
f	0	3	1
g	1	1	1
h	1	2	0
i	1	0	1

If I can have an extra column as Total will be great also but my main goal is to build a query for this result set

CodePudding user response：

An example of Nathan_Sav's suggestion -

SELECT
    Name,
    SUM(Status = 'Ended') AS Ended,
    SUM(Status = 'Deleted') AS Deleted,
    SUM(Status = 'InUse') AS InUse,
    COUNT(*) AS Total
FROM Tab 
GROUP BY Name 
HAVING COUNT(*) > 1
ORDER BY InUse DESC

CodePudding user response：

You can achieve this by combining the two queries you provided and using a subquery to join the Tab table with a subquery that retrieves the duplicate names, and then group by both the Name and Status columns.

SELECT 
  sub.Name, 
  SUM(CASE WHEN sub.Status = 'Ended' THEN sub.count END) AS Ended,
  SUM(CASE WHEN sub.Status = 'Deleted' THEN sub.count END) AS Deleted,
  SUM(CASE WHEN sub.Status = 'InUse' THEN sub.count END) AS InUse,
  SUM(sub.count) as Total
FROM (
  SELECT 
    Tab.Name, 
    Tab.Status,
    COUNT(*) as count
  FROM Tab
  WHERE Tab.Name IN (
    SELECT Name FROM Tab GROUP BY Name HAVING COUNT(*) > 1
  )
  GROUP BY Tab.Name, Tab.Status
) as sub
GROUP BY sub.Name
ORDER BY sub.Name

The first subquery will select all the rows that have a duplicate name, and group them by Name, Status and count them. The outer query will then take that subquery and group it by name and status, using the SUM function with a case statement to sum up the count of each status. The last column is the sum of the count column from the first subquery

Please note that, this query will work only if you have a SQL version that supports subqueries in the FROM clause

CodePudding user response：

you can use Window function For Calculation Count and Create To Result with Pivot

This Code work to use sql server

DECLARE @cols AS NVARCHAR(MAX),@scols  AS NVARCHAR(MAX),
    @query  AS NVARCHAR(MAX)
select @cols = STUFF((SELECT distinct ','   QUOTENAME([Status])  from Tab
                 FOR XML PATH(''), TYPE  ).value('.', 'NVARCHAR(MAX)')   ,1,1,'' )
 
 
select @scols = STUFF((SELECT distinct ',ISNULL('   QUOTENAME([Status])  ',''0'') as '   QUOTENAME([Status])
                 from Tab  FOR XML PATH(''), TYPE  ).value('.', 'NVARCHAR(MAX)')  ,1,1,'')
 

 set @query = 'SELECT   Name,' @scols ' from 
             (
               
                select Name,[Status],CounStatus,Total
                from (
                select *,count(*) over (partition by Name,[Status]) as CounStatus
                ,count(*) over (partition by Name ) as Total
                ,ROW_NUMBER() over (partition by Name,[Status] order by [Status]) as rw
                from  Tab
                )d
                where  rw=1

            ) x
            pivot 
            (
               sum( CounStatus)  for [Status] in ('   @cols   ')
            ) p 
            
            '

execute(@query)

You can to insert the basic data with the following codes

create table Tab(Name nvarchar(100),Status nvarchar(100))
insert into tab (Name,Status)
select 'a' as Name,'Ended' as Status
go 6

insert into tab (Name,Status)
select 'a','Deleted'
go 2

insert into tab (Name,Status)
select 'a','InUse'
go 20

insert into tab (Name,Status)
select 'b','InUse'
go 12


insert into tab (Name,Status)
select 'c','Deleted'
go 8

insert into tab (Name,Status)
select 'c','InUse'
go 2

select Name,[Status],CounStatus,Total
from (
select *,count(*) over (partition by Name,[Status]) as CounStatus
,count(*) over (partition by Name ) as Total
,ROW_NUMBER() over (partition by Name,[Status] order by [Status]) as rw
from  Tab
)d
where  rw=1