Finding rows that only have duplicate values in a specific column-CodePudding

I have a table in SQL Server called Data:

ColA	ColB	ColC
John	Smith	1234
David	Smith	3456
Scott	Clark	198
Brian	Williams	155
Jeremy	Williams	765

I need rows from Data, but ONLY the rows where the values in ColB appears more than once.

For example, I would want my query to return rows 1 and 2 since 'Smith' is repeated in ColB, and rows 4 and 5 to return since 'Williams' is repeated in ColB. Row 3 would NOT be returned since 'Clark' only appears once.

How would I go about writing this query? I am not too sure where to start. I have a feeling I will be using COUNT but not too sure of what else.

CodePudding user response：

Just another option is using WITH TIES in concert with sign() and the window function sum() over()

Example

Select top 1 with ties * 
 From  Data
 Order by sign(sum(1) over (partition by ColB)-1) desc

Results

ColA    ColB        ColC
John    Smith       1234
David   Smith       3456
Brian   Williams    155
Jeremy  Williams    76

CodePudding user response：

The key here is to create a subquery that returns all ColB that appear more than once using HAVING after the group by, then selecting only the rows that those values appear in

with cte as ( 
 select ColB as [duplicate]
 from Data
 group by ColB
 having count(*) > 1
)

select * from Data where ColB in ( select duplicate from cte )

CodePudding user response：

You may use the EXISTS operator with correlated subquery as the following:

SELECT ColA, ColB, ColC
FROM data T
WHERE EXISTS(SELECT 1 FROM data D 
             WHERE D.ColB = T.ColB AND 
                   (D.ColA <> T.ColA OR D.ColC <> T.ColC)
            )

See a demo.

CodePudding user response：

WITH cte AS (
    SELECT
        colb,
        COUNT(*) AS dup
    FROM
        sales
    GROUP BY
        colb
)
SELECT
    cola
FROM
         data
    INNER JOIN cte ON data.colb = cte.colb
WHERE
    cte.colb > 1;