I have a table in SQL Server called Data
:
ColA | ColB | ColC |
---|---|---|
John | Smith | 1234 |
David | Smith | 3456 |
Scott | Clark | 198 |
Brian | Williams | 155 |
Jeremy | Williams | 765 |
I need rows from Data
, but ONLY the rows where the values in ColB
appears more than once.
For example, I would want my query to return rows 1 and 2 since 'Smith' is repeated in ColB
, and rows 4 and 5 to return since 'Williams' is repeated in ColB
. Row 3 would NOT be returned since 'Clark' only appears once.
How would I go about writing this query? I am not too sure where to start. I have a feeling I will be using COUNT
but not too sure of what else.
CodePudding user response:
Just another option is using WITH TIES
in concert with sign()
and the window function sum() over()
Example
Select top 1 with ties *
From Data
Order by sign(sum(1) over (partition by ColB)-1) desc
Results
ColA ColB ColC
John Smith 1234
David Smith 3456
Brian Williams 155
Jeremy Williams 76
CodePudding user response:
The key here is to create a subquery that returns all ColB
that appear more than once using HAVING
after the group by, then selecting only the rows that those values appear in
with cte as (
select ColB as [duplicate]
from Data
group by ColB
having count(*) > 1
)
select * from Data where ColB in ( select duplicate from cte )
CodePudding user response:
You may use the EXISTS
operator with correlated subquery as the following:
SELECT ColA, ColB, ColC
FROM data T
WHERE EXISTS(SELECT 1 FROM data D
WHERE D.ColB = T.ColB AND
(D.ColA <> T.ColA OR D.ColC <> T.ColC)
)
See a demo.
CodePudding user response:
WITH cte AS (
SELECT
colb,
COUNT(*) AS dup
FROM
sales
GROUP BY
colb
)
SELECT
cola
FROM
data
INNER JOIN cte ON data.colb = cte.colb
WHERE
cte.colb > 1;