I have a table:
personId | Date | location |
---|---|---|
abc123 | 15-09-2022 | London |
abc123 | 15-09-2022 | Nottingham |
efg321 | 12-09-2022 | Leeds |
abc123 | 13-09-2022 | Birmingham |
I want to select and return the duplicate rows based on the Date and location columns. For example, in the table above, personId 'abc123' is present at both 'London' and 'Nottingham' on the same date, so I would like to return these rows.
I have tried this query:
SELECT personId, Date FROM sampleTable GROUP BY personId, Date HAVING COUNT(*) > 1
But it only returns the grouped personId and Date values. I want the rows with all three columns. Expected result:
personId | Date | location |
---|---|---|
abc123 | 15-09-2022 | London |
abc123 | 15-09-2022 | Nottingham |
Can anyone please help me with this? Thanks
CodePudding user response:
Try something like this:
    SELECT
        sampleTable.*
    FROM
        sampleTable
    INNER JOIN -- acts as a filter here
    (
        SELECT
            personId,
            Date
        FROM
            sampleTable
        GROUP BY
            personId,
            Date
        HAVING
            COUNT(*) > 1
    ) problemTable
        ON sampleTable.personId = problemTable.personId
        AND sampleTable.Date = problemTable.Date
    ORDER BY
        sampleTable.personId,
        sampleTable.Date,
        sampleTable.location
    ;
The derived table problemTable computes the personId/Date combos that have multiple sampleTable rows. INNER JOINing sampleTable with problemTable, by the nature of an INNER JOIN, returns an abridged version of sampleTable: one that contains only the combos also found in problemTable, and those are the ones you care about!
Using INNER JOIN as a filter mechanism is a common theme in SQL, so keep it in the back of your mind.
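If you want to try the join-as-filter idea without setting up a database, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes Python's built-in sqlite3 module and an in-memory database; the helper names make_sample_db and duplicates_via_join are made up for this illustration, and the dates are stored as plain text.

```python
import sqlite3

# Sample rows copied from the question (dates kept as plain text).
SAMPLE_ROWS = [
    ("abc123", "15-09-2022", "London"),
    ("abc123", "15-09-2022", "Nottingham"),
    ("efg321", "12-09-2022", "Leeds"),
    ("abc123", "13-09-2022", "Birmingham"),
]

# Same shape as the query above: the derived table lists the
# personId/Date combos that occur more than once, and the INNER JOIN
# keeps only the sampleTable rows that match one of those combos.
JOIN_FILTER_QUERY = """
SELECT s.personId, s.Date, s.location
FROM sampleTable AS s
INNER JOIN (
    SELECT personId, Date
    FROM sampleTable
    GROUP BY personId, Date
    HAVING COUNT(*) > 1
) AS problemTable
    ON s.personId = problemTable.personId
   AND s.Date = problemTable.Date
ORDER BY s.personId, s.Date, s.location
"""


def make_sample_db():
    """Create an in-memory SQLite database holding the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sampleTable (personId TEXT, Date TEXT, location TEXT)"
    )
    conn.executemany("INSERT INTO sampleTable VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def duplicates_via_join(conn):
    """Return every row whose personId/Date combo appears more than once."""
    return conn.execute(JOIN_FILTER_QUERY).fetchall()


if __name__ == "__main__":
    for row in duplicates_via_join(make_sample_db()):
        print(row)
```

On the sample data this should return only the London and Nottingham rows for abc123. Other database engines may need small syntax changes, so treat it as a way to experiment rather than a drop-in script.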
CodePudding user response:
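It's pretty easy using window functions.
The inner query returns the same table with an extra column, dup_count, holding the number of rows that share the same personid and datecol. The outer query then keeps only the rows where that count is greater than 1. Table and column names are lowercased here, and the date column is called datecol.
Inner query result:
personid | datecol | location | dup_count |
---|---|---|---|
abc123 | 13-09-2022 | Birmingham | 1 |
abc123 | 15-09-2022 | London | 2 |
abc123 | 15-09-2022 | Nottingham | 2 |
efg321 | 12-09-2022 | Leeds | 1 |
Final result:
personid | datecol | location | dup_count |
---|---|---|---|
abc123 | 15-09-2022 | London | 2 |
abc123 | 15-09-2022 | Nottingham | 2 |
SQL:
    WITH temp AS (
        SELECT
            personid,
            datecol,
            location,
            -- number of rows sharing this personid/datecol combo
            COUNT(personid) OVER (PARTITION BY personid, datecol) AS dup_count
        FROM sampletable
    )
    SELECT *
    FROM temp
    WHERE dup_count > 1
    ;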
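To see the window-function approach run end to end, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. It assumes an SQLite build with window-function support (3.25 or newer). The helper names make_window_db and duplicates_via_window are invented for this example, and the counter column is called dup_count to stay clear of reserved words such as CHECK.

```python
import sqlite3

# Sample rows copied from the question (dates kept as plain text).
SAMPLE_ROWS = [
    ("abc123", "15-09-2022", "London"),
    ("abc123", "15-09-2022", "Nottingham"),
    ("efg321", "12-09-2022", "Leeds"),
    ("abc123", "13-09-2022", "Birmingham"),
]

# The CTE tags every row with the size of its personid/datecol group;
# the outer SELECT keeps only rows whose group has more than one member.
WINDOW_QUERY = """
WITH counted AS (
    SELECT
        personid,
        datecol,
        location,
        COUNT(*) OVER (PARTITION BY personid, datecol) AS dup_count
    FROM sampletable
)
SELECT personid, datecol, location, dup_count
FROM counted
WHERE dup_count > 1
ORDER BY personid, datecol, location
"""


def make_window_db():
    """Create an in-memory SQLite database holding the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sampletable (personid TEXT, datecol TEXT, location TEXT)"
    )
    conn.executemany("INSERT INTO sampletable VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def duplicates_via_window(conn):
    """Return the duplicated rows together with their group size."""
    return conn.execute(WINDOW_QUERY).fetchall()


if __name__ == "__main__":
    for row in duplicates_via_window(make_window_db()):
        print(row)
```

Compared with the join version, this reads the table once and also gives you the group size for each row, which can be handy if you later want to filter on something like "three or more locations on the same day".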