I have table with 22 columns and multiple rows.
I want to get a new table that will contain only rows for which the values in the columns V1, V3, V4, V13 appear twice in the table in correlation with each other (these repetitions)
Works in Microsoft SQL Server Management Studio.
I've tried something like this
SELECT
V1, V3, V4, V13
FROM
table
GROUP BY
V1, V3, V4, V13
HAVING
(COUNT (*) = 2)
CodePudding user response:
You can do it by first counting how many values you have for each "V1", "V3", "V4", "V13" fields with the COUNT
window function. Then join back with your original table, though keeping those rows which have count bigger than 1.
WITH cte AS (
SELECT <your_table_identifier_field(s)>,
COUNT(*) OVER(PARTITION BY V1, V3, V4, V13) as cnt
FROM table
)
SELECT *
FROM tab
INNER JOIN cte
ON tab.<your_table_identifier_field(s)> = cte.<your_table_identifier_field(s)>
AND cte.cnt > 1
EDIT: What if there's no specific row identifier? You're forced to match all column values then.
WITH cte AS (
SELECT *,
COUNT(*) OVER(PARTITION BY V1, V3, V4, V13) as cnt
FROM table
)
SELECT *
FROM tab
INNER JOIN cte
ON tab.<field1> = cte.<field1>
AND tab.<field2> = cte.<field2>
AND ...
AND cte.cnt > 1
CodePudding user response:
Why over-complicate things? Just take your original query, put it in a CTE and then join the CTE to the original table.
with cte as (
SELECT V1, V3, V4, V13 FROM dbo.table
GROUP BY V1, V3, V4, V13
HAVING COUNT (*) > 1
)
select t1.*
from dbo.table as t1
inner join cte on t1.V1 = cte.V1 and t1.V2 = cte.V2 ...
order by ...
;
I've made some changes to incorporate best practices. Added the schema name (assumed dbo) to your table name, statement terminator, ORDER BY clause (because it usually matters). I changed the HAVING clause since you asked for duplicates which means the count is > 1. Using count = 2 means you are limiting the result to those rows where there are exactly two duplicates. That is a very unusual requirement but change it if needed. Normally I would not use "*" as the column list as it is generally far better to explicitly include the columns you need. I leave that to you as well.
Finally, consider changing your schema to prevent duplicates. That will avoid a lot of future work.