Finding created on dates for duplicates in SQL

I have one table of contact records, and I'm trying to get the count of duplicate records that were created on each date. I don't want to include the original instance in the count. I'm using SQL Server. Here's an example table:

| email         | created_on |    
| ------------- | ---------- | 
| [email protected] | 08-16-22   |    
| [email protected] | 08-16-22   |
| [email protected] | 08-16-22   |    
| [email protected] | 07-12-22   | 
| [email protected] | 07-12-22   |    
| [email protected] | 06-08-22   |    
| [email protected] | 06-08-22   |    
| [email protected] | 04-21-22   |

And I'm expecting this result:

| created_on | dupe_count | 
| ---------- | ---------- | 
| 08-16-22   | 3          | 
| 07-12-22   | 2          | 
| 06-08-22   | 0          | 
| 04-21-22   | 0          |

Answer:

I created a derived table that numbers each email's rows by creation date with ROW_NUMBER(). Then I aggregate over it per date and ignore each email's first occurrence (row number 1). This works for the sample data.

Entire code:

```sql
CREATE TABLE #Temp
(
    email       varchar(50),
    dateCreated date
);

-- ISO 'yyyymmdd' literals do not depend on the session's DATEFORMAT setting
INSERT INTO #Temp (email, dateCreated) VALUES
('[email protected]', '20220816'),
('[email protected]', '20220816'),
('[email protected]', '20220816'),
('[email protected]', '20220712'),
('[email protected]', '20220712'),
('[email protected]', '20220608'),
('[email protected]', '20220608'),
('[email protected]', '20220421');

-- r = 1 marks each email's earliest row; every later row counts as a duplicate
SELECT dateCreated,
       SUM(CASE WHEN r = 1 THEN 0 ELSE 1 END) AS duplicates
FROM
(
    SELECT email, dateCreated,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY dateCreated) AS r
    FROM #Temp
) b
GROUP BY dateCreated
ORDER BY dateCreated;

DROP TABLE #Temp;
```

Output:

| datecreated | duplicates |
| ----------- | ---------- |
| 2022-04-21  | 0          |
| 2022-06-08  | 0          |
| 2022-07-12  | 2          |
| 2022-08-16  | 3          |
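To see which rows ROW_NUMBER() treats as originals, here is a minimal sketch of the derived table on its own. The three addresses (a@example.com, b@example.com, c@example.com), the CTE name contacts and the temp table #Numbered are all hypothetical; the addresses were chosen so that counting the r > 1 rows per date reproduces the expected output in the question. SELECT ... INTO is used only so the numbered rows can be inspected afterwards.

```sql
-- Hypothetical sample rows (made-up addresses, not the asker's data)
WITH contacts (email, created_on) AS
(
    SELECT v.email, CAST(v.created_on AS date)
    FROM (VALUES
        ('a@example.com', '20220421'),
        ('b@example.com', '20220608'),
        ('c@example.com', '20220608'),
        ('a@example.com', '20220712'),
        ('b@example.com', '20220712'),
        ('a@example.com', '20220816'),
        ('b@example.com', '20220816'),
        ('c@example.com', '20220816')
    ) AS v (email, created_on)
)
-- r = 1 marks each address's earliest row; rows with r > 1 are the duplicates
SELECT email,
       created_on,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_on) AS r
INTO #Numbered
FROM contacts;

SELECT email, created_on, r
FROM #Numbered
ORDER BY email, r;
```

Each address gets r = 1 on its earliest date, so grouping by date and counting the rows with r > 1 gives 0, 0, 2 and 3 for 2022-04-21, 2022-06-08, 2022-07-12 and 2022-08-16.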

Answer:

You could count, for each date, the distinct email addresses that occur after their first occurrence:

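```sql
-- d1 is the first date on which each email appears
with d as (
    select *, min(created_on) over (partition by email) as d1
    from t
)
-- count each email at most once per date, and only on dates after its first one
select created_on,
       count(distinct iif(created_on > d1, email, null)) as dupe_count
from d
group by created_on;
```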
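A self-contained way to try this query is sketched below. It assumes the same hypothetical addresses as before and replaces the answer's table t with a made-up contacts CTE; the temp table #AfterFirst is also hypothetical and exists only so the result can be checked afterwards.

```sql
-- Hypothetical sample rows; contacts stands in for the answer's table t
WITH contacts (email, created_on) AS
(
    SELECT v.email, CAST(v.created_on AS date)
    FROM (VALUES
        ('a@example.com', '20220421'),
        ('b@example.com', '20220608'),
        ('c@example.com', '20220608'),
        ('a@example.com', '20220712'),
        ('b@example.com', '20220712'),
        ('a@example.com', '20220816'),
        ('b@example.com', '20220816'),
        ('c@example.com', '20220816')
    ) AS v (email, created_on)
),
d AS
(
    -- d1 = the first date on which each address appears
    SELECT email, created_on,
           MIN(created_on) OVER (PARTITION BY email) AS d1
    FROM contacts
)
-- IIF yields NULL on an address's first date, and COUNT(DISTINCT ...) skips NULLs
SELECT created_on,
       COUNT(DISTINCT IIF(created_on > d1, email, NULL)) AS dupe_count
INTO #AfterFirst
FROM d
GROUP BY created_on;

SELECT created_on, dupe_count
FROM #AfterFirst
ORDER BY created_on;
```

One design difference from the ROW_NUMBER() answer: COUNT(DISTINCT ...) counts an address at most once per date, and an address that appears twice on its first date is not counted at all, whereas ROW_NUMBER() would give that second row r = 2 and count it.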


Answer:

You can take the difference between the total count of emails on each day and the count of distinct emails on that day:

```sql
-- cte: the answerer's CTE holding the sample rows (email, created_on), not shown
select created_on,
       count(email) - count(distinct email) as dupe_count
from cte
group by created_on;
```
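This per-day difference only counts addresses repeated within the same date. A minimal sketch with the same hypothetical addresses (where every repeat falls on a later date than the original) shows that it returns 0 for every date; the CTE name contacts and temp table #SameDay are made up for illustration.

```sql
-- Hypothetical sample rows: no address repeats within a single date
WITH contacts (email, created_on) AS
(
    SELECT v.email, CAST(v.created_on AS date)
    FROM (VALUES
        ('a@example.com', '20220421'),
        ('b@example.com', '20220608'),
        ('c@example.com', '20220608'),
        ('a@example.com', '20220712'),
        ('b@example.com', '20220712'),
        ('a@example.com', '20220816'),
        ('b@example.com', '20220816'),
        ('c@example.com', '20220816')
    ) AS v (email, created_on)
)
-- Rows minus distinct addresses, per date: ignores repeats across dates
SELECT created_on,
       COUNT(email) - COUNT(DISTINCT email) AS dupe_count
INTO #SameDay
FROM contacts
GROUP BY created_on;

SELECT created_on, dupe_count
FROM #SameDay
ORDER BY created_on;
```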

It seems I misunderstood your request: you also want to take earlier created_on dates into account:

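```sql
-- cte: the answerer's CTE holding the sample rows (email, created_on), not shown
with ct as (
    select created_on,
           -- keep the email only if the same email already exists on an earlier date
           case when exists (select 1
                             from cte t2
                             where t2.email = t1.email
                               and t2.created_on < t1.created_on)
                then email end as c
    from cte t1
)
select created_on,
       count(distinct c) as dupe_count   -- NULLs (first occurrences) are ignored
from ct
group by created_on
order by created_on;
```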

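In Oracle it also seems possible to do this in a single query. SQL Server, however, rejects a subquery inside an aggregate function (error 130), so on SQL Server use the two-step version above: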

```sql
select created_on,
       count(distinct case when (select count(*)
                                 from cte t2
                                 where t1.email = t2.email
                                   and t1.created_on > t2.created_on) > 0
                           then email end) as dupe_count
from cte t1
group by created_on
order by created_on;
```