I have one table of contact records and I'm trying to get the count of duplicate records that were created on each date. I'm not looking to include the original instance in the count. I'm using SQL Server. Here's an example table:

| email | created_on |
| ------------- | ---------- |
| [email protected] | 08-16-22 |
| [email protected] | 08-16-22 |
| [email protected] | 08-16-22 |
| [email protected] | 07-12-22 |
| [email protected] | 07-12-22 |
| [email protected] | 06-08-22 |
| [email protected] | 06-08-22 |
| [email protected] | 04-21-22 |

And I'm expecting this result:

| created_on | dupe_count |
| ---------- | ---------- |
| 08-16-22 | 3 |
| 07-12-22 | 2 |
| 06-08-22 | 0 |
| 04-21-22 | 0 |
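
To make the counting rule concrete, here is a minimal Python sketch (not SQL) of one way to read the expected table. It walks the rows in date order and counts a row as a duplicate only when its email has already been seen, so the original instance is skipped. The real addresses are redacted above, so the emails below are hypothetical values chosen to reproduce the expected counts. Treating a second row on the same date as a duplicate is an assumption that the sample does not test.

```python
from collections import Counter

# Hypothetical rows: the real addresses are redacted in the question, so these
# made-up emails were chosen to reproduce the expected counts. Dates are ISO strings.
ROWS = [
    ("a@example.com", "2022-08-16"),
    ("b@example.com", "2022-08-16"),
    ("c@example.com", "2022-08-16"),
    ("a@example.com", "2022-07-12"),
    ("b@example.com", "2022-07-12"),
    ("b@example.com", "2022-06-08"),
    ("c@example.com", "2022-06-08"),
    ("a@example.com", "2022-04-21"),
]


def dupe_counts(rows):
    """Per date, count rows whose email already occurred in an earlier row."""
    seen = set()
    counts = Counter()
    # ISO dates sort chronologically as plain strings.
    for email, created_on in sorted(rows, key=lambda row: row[1]):
        counts[created_on] += 0  # keep dates that have no duplicates
        if email in seen:
            counts[created_on] += 1  # a repeat of an earlier instance
        else:
            seen.add(email)  # the original instance is not counted
    return dict(counts)


if __name__ == "__main__":
    for day, n in sorted(dupe_counts(ROWS).items(), reverse=True):
        print(day, n)
```

Run as a script, it prints the dates newest first, matching the expected table: 2022-08-16 3, 2022-07-12 2, 2022-06-08 0, 2022-04-21 0.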

Answer:
I built a derived table that numbers each email's rows by creation date (ROW_NUMBER partitioned by email). Then I query it and ignore each email's first occurrence (row number 1), so only the later rows are counted as duplicates. This works correctly for the sample data.
Full script:

```sql
Create table #Temp
(
    email varchar(50),
    dateCreated date
);

-- ISO 'yyyy-mm-dd' literals convert to date regardless of the session's DATEFORMAT/language
insert into #Temp
(email, dateCreated) values
('[email protected]', '2022-08-16'),
('[email protected]', '2022-08-16'),
('[email protected]', '2022-08-16'),
('[email protected]', '2022-07-12'),
('[email protected]', '2022-07-12'),
('[email protected]', '2022-06-08'),
('[email protected]', '2022-06-08'),
('[email protected]', '2022-04-21');

-- r numbers each email's rows by date: r = 1 is the original, anything higher is a duplicate
select dateCreated, sum(case when r = 1 then 0 else 1 end) as duplicates
from
(
    Select email, dateCreated, ROW_NUMBER() over(partition by email
        order by dateCreated) as r from #Temp
) b
group by dateCreated;

drop table #Temp;
```
Output:

| datecreated | duplicates |
| ----------- | ---------- |
| 2022-04-21 | 0 |
| 2022-06-08 | 0 |
| 2022-07-12 | 2 |
| 2022-08-16 | 3 |

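
The ROW_NUMBER query can be sanity-checked without a SQL Server instance. This is a sketch using Python's built-in sqlite3 module as a stand-in (SQLite 3.25 or later supports ROW_NUMBER()). The table name contacts and the example.com addresses are hypothetical, because the real addresses are redacted, and the column is called created_on as in the question.

```python
import sqlite3

# Hypothetical rows standing in for the redacted sample (ISO date strings).
ROWS = [
    ("a@example.com", "2022-08-16"),
    ("b@example.com", "2022-08-16"),
    ("c@example.com", "2022-08-16"),
    ("a@example.com", "2022-07-12"),
    ("b@example.com", "2022-07-12"),
    ("b@example.com", "2022-06-08"),
    ("c@example.com", "2022-06-08"),
    ("a@example.com", "2022-04-21"),
]

# The answer's query, using the question's column name created_on.
QUERY = """
SELECT created_on,
       SUM(CASE WHEN r = 1 THEN 0 ELSE 1 END) AS duplicates
FROM (
    SELECT email, created_on,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_on) AS r
    FROM contacts
) AS b
GROUP BY created_on
ORDER BY created_on
"""


def run(rows):
    """Load rows into an in-memory table named contacts (hypothetical) and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE contacts (email TEXT, created_on TEXT)")
        con.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```

On these rows it returns the same counts as the output above: 0, 0, 2 and 3 for 2022-04-21, 2022-06-08, 2022-07-12 and 2022-08-16.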

Answer:
You could count the distinct email addresses that occur after each address's first occurrence (t is the contacts table):
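```sql
-- d1 = the first date on which each email appears
with d as (
    select *, Min(created_on) over(partition by email) d1
    from t
)
-- each email counts at most once per date, and only on dates after its first one
select created_on, Count(distinct Iif(created_on > d1, email, null)) Dupe_count
from d
group by created_on;
```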
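
A portable sketch of the MIN() OVER approach, using Python's sqlite3 as a stand-in for SQL Server with hypothetical emails (the real ones are redacted). SQL Server's Iif(created_on > d1, email, null) is written as the equivalent CASE WHEN ... THEN email END, since older SQLite builds lack IIF; the table keeps the answer's name t.

```python
import sqlite3

# Hypothetical rows (the real addresses are redacted); ISO date strings.
ROWS = [
    ("a@example.com", "2022-08-16"),
    ("b@example.com", "2022-08-16"),
    ("c@example.com", "2022-08-16"),
    ("a@example.com", "2022-07-12"),
    ("b@example.com", "2022-07-12"),
    ("b@example.com", "2022-06-08"),
    ("c@example.com", "2022-06-08"),
    ("a@example.com", "2022-04-21"),
]

# The answer's query, with CASE WHEN in place of SQL Server's IIF.
QUERY = """
WITH d AS (
    -- d1 = the first date on which each email appears
    SELECT *, MIN(created_on) OVER (PARTITION BY email) AS d1
    FROM t
)
SELECT created_on,
       -- each email counts once per date, and only on dates after its first one
       COUNT(DISTINCT CASE WHEN created_on > d1 THEN email END) AS dupe_count
FROM d
GROUP BY created_on
ORDER BY created_on
"""


def run(rows):
    """Load rows into an in-memory table t and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (email TEXT, created_on TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```

On the sample it gives the expected counts: 2 for 2022-07-12, 3 for 2022-08-16 and 0 for the other dates.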

Answer:
You can calculate, for each day, the difference between the total count of emails and the count of unique emails (cte is assumed to be your contacts data):
```sql
select created_on,
       count(email) - count(distinct email) as dupe_count
from cte
group by created_on
```
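
This per-day difference only sees duplicates within a single date. A sketch with Python's sqlite3 (hypothetical emails; the table is named cte to match the query) shows that it returns 0 for every date on the sample, because no email repeats within a day there, and 1 when a hypothetical address is inserted twice on the same day.

```python
import sqlite3

# Hypothetical stand-in for the redacted sample (ISO date strings).
ROWS = [
    ("a@example.com", "2022-08-16"),
    ("b@example.com", "2022-08-16"),
    ("c@example.com", "2022-08-16"),
    ("a@example.com", "2022-07-12"),
    ("b@example.com", "2022-07-12"),
    ("b@example.com", "2022-06-08"),
    ("c@example.com", "2022-06-08"),
    ("a@example.com", "2022-04-21"),
]

# Hypothetical rows where x@example.com repeats within 2022-01-01
# and again on a later day.
SAME_DAY_ROWS = [
    ("x@example.com", "2022-01-01"),
    ("x@example.com", "2022-01-01"),
    ("x@example.com", "2022-01-02"),
    ("y@example.com", "2022-01-02"),
]

QUERY = """
SELECT created_on,
       COUNT(email) - COUNT(DISTINCT email) AS dupe_count
FROM cte
GROUP BY created_on
ORDER BY created_on
"""


def run(rows):
    """Load rows into an in-memory table named cte and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cte (email TEXT, created_on TEXT)")
        con.executemany("INSERT INTO cte VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(ROWS))
    print(run(SAME_DAY_ROWS))
```

Note that x@example.com on 2022-01-02 is not counted even though it appeared the day before, which is why this version does not produce the expected table.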
It seems I misunderstood your request: you wanted to consider earlier created_on dates too:
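```sql
-- cte stands for the contacts data (assumed to be a table here)
with ct as (
    select created_on,
           -- c is the email when the same email already appears on an earlier date, otherwise NULL
           (select case when (select count(*)
                              from cte t2
                              where t1.email = t2.email and t1.created_on > t2.created_on
                             ) > 0 then email end) as c
    from cte t1
)
select created_on,
       count(distinct c) as dupe_count  -- NULLs are ignored, so first occurrences add nothing
from ct
group by created_on
order by 1
```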
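
A sketch that runs this query (with the missing WITH in place) through Python's sqlite3, with hypothetical emails loaded into a table named cte. The inner scalar subquery yields the email only when an earlier-dated row with the same email exists; the sketch qualifies it as t1.email, which is what the unqualified email resolves to. COUNT(DISTINCT c) then counts each repeated email once per date.

```python
import sqlite3

# Hypothetical stand-in for the redacted sample (ISO date strings).
ROWS = [
    ("a@example.com", "2022-08-16"),
    ("b@example.com", "2022-08-16"),
    ("c@example.com", "2022-08-16"),
    ("a@example.com", "2022-07-12"),
    ("b@example.com", "2022-07-12"),
    ("b@example.com", "2022-06-08"),
    ("c@example.com", "2022-06-08"),
    ("a@example.com", "2022-04-21"),
]

QUERY = """
WITH ct AS (
    SELECT created_on,
           -- c = the email if the same email exists on a strictly earlier date
           (SELECT CASE WHEN (SELECT COUNT(*)
                              FROM cte t2
                              WHERE t1.email = t2.email
                                AND t1.created_on > t2.created_on) > 0
                        THEN t1.email END) AS c
    FROM cte t1
)
SELECT created_on, COUNT(DISTINCT c) AS dupe_count
FROM ct
GROUP BY created_on
ORDER BY 1
"""


def run(rows):
    """Load rows into an in-memory table named cte and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cte (email TEXT, created_on TEXT)")
        con.executemany("INSERT INTO cte VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run(ROWS):
        print(row)
```

On the sample this gives 2 for 2022-07-12, 3 for 2022-08-16 and 0 for the two earlier dates.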
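It seems that in Oracle it is also possible to do the aggregation in a single query. SQL Server rejects this form, because it does not allow a subquery inside an aggregate function: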
```sql
select created_on,
       count(distinct case when (select count(*)
                                 from cte t2
                                 where t1.email = t2.email and t1.created_on > t2.created_on
                                ) > 0 then email end) as c
from cte t1
group by created_on
order by 1
```
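
The answers agree on the sample but not in general. This sketch uses Python's sqlite3 again, adding two more hypothetical rows: a@example.com a second time on 2022-08-16 and d@example.com twice on 2022-04-21. It contrasts the ROW_NUMBER approach, which counts every row after an email's first row, with the distinct-email approach (the MIN() OVER answer), which counts each email at most once per date and never on its first date. The correlated-subquery version behaves like the distinct one on these rows.

```python
import sqlite3

# Hypothetical sample plus two made-up edge cases:
#   a@example.com appears twice on 2022-08-16,
#   d@example.com appears twice on its first date, 2022-04-21.
ROWS = [
    ("a@example.com", "2022-08-16"),
    ("b@example.com", "2022-08-16"),
    ("c@example.com", "2022-08-16"),
    ("a@example.com", "2022-07-12"),
    ("b@example.com", "2022-07-12"),
    ("b@example.com", "2022-06-08"),
    ("c@example.com", "2022-06-08"),
    ("a@example.com", "2022-04-21"),
    ("a@example.com", "2022-08-16"),
    ("d@example.com", "2022-04-21"),
    ("d@example.com", "2022-04-21"),
]

# Every row after an email's first row counts as a duplicate.
ROW_NUMBER_QUERY = """
SELECT created_on, SUM(CASE WHEN r = 1 THEN 0 ELSE 1 END) AS dupe_count
FROM (
    SELECT created_on,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_on) AS r
    FROM contacts
) AS b
GROUP BY created_on
ORDER BY created_on
"""

# Each email counts at most once per date, and only after its first date.
DISTINCT_QUERY = """
WITH d AS (
    SELECT *, MIN(created_on) OVER (PARTITION BY email) AS d1
    FROM contacts
)
SELECT created_on,
       COUNT(DISTINCT CASE WHEN created_on > d1 THEN email END) AS dupe_count
FROM d
GROUP BY created_on
ORDER BY created_on
"""


def run(query, rows):
    """Load rows into an in-memory table named contacts (hypothetical) and run query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE contacts (email TEXT, created_on TEXT)")
        con.executemany("INSERT INTO contacts VALUES (?, ?)", rows)
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print("row_number:", run(ROW_NUMBER_QUERY, ROWS))
    print("distinct:  ", run(DISTINCT_QUERY, ROWS))
```

The ROW_NUMBER version returns 1 for 2022-04-21 and 4 for 2022-08-16, while the distinct version returns 0 and 3 (both give 2 for 2022-07-12). Which one is right depends on whether repeated rows on the same date should count, which the question's sample does not settle.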