Filter rows by count of two column values in SQL-CodePudding

In SQL Server, I have the following table:

CREATE TABLE x
(
    ColumnA CHAR NOT NULL
  , ColumnB CHAR NOT NULL
  , ColumnC INT NOT NULL
  , CONSTRAINT CompositeKey PRIMARY KEY (ColumnA, ColumnB)
);

With the following data:

INSERT INTO x
VALUES
 ('A', 'X', 2)
,('A', 'Y', 1)
,('B', 'X', 4)
,('B', 'Z', 2)
,('C', 'X', 9)
,('C', 'Y', 3)
,('C', 'P', 2)
,('D', 'X', 6)
,('D', 'Y', 4)
,('E', 'P', 2)

I am writing a query with the following 2 predicates:

Count(ColumnA) > 1
Count(ColumnB) > 2

MY QUERY

SELECT *
FROM
(
    SELECT *
          ,COUNT(*) OVER (PARTITION BY [ColumnA]) AS columnA_cnt
          ,COUNT(*) OVER (PARTITION BY [ColumnB]) AS columnB_cnt
    FROM @DataSource
) DS
WHERE columnA_cnt >= 2
  AND columnB_cnt >= 3

EXPECTED RESULT

A	B	C
A	X	2
A	Y	1
C	X	9
C	Y	3
D	X	6
D	Y	4

ACTUAL RESULT

A	B	C
A	X	2
A	Y	1
B	X	4
C	X	9
C	Y	3
D	X	6
D	Y	4

CodePudding user response：

This query meet the requirement with your expected result

CREATE TABLE x
(
    ColumnA CHAR NOT NULL
  , ColumnB CHAR NOT NULL
  , ColumnC INT NOT NULL
  , CONSTRAINT CompositeKey PRIMARY KEY (ColumnA, ColumnB)
);

INSERT INTO x
VALUES
 ('A', 'X', 2)
,('A', 'Y', 1)
,('B', 'X', 4)
,('B', 'Z', 2)
,('C', 'X', 9)
,('C', 'Y', 3)
,('C', 'P', 2)
,('D', 'X', 6)
,('D', 'Y', 4)
,('E', 'P', 2)

with cte as (
select * from x
where ColumnA in (select ColumnA from x group by ColumnA having count(ColumnA) >1)
and ColumnB in (select ColumnB from x group by ColumnB having count(ColumnA) >2)
)
select * from cte
where ColumnA in (select ColumnA from cte group by ColumnA having count(ColumnA) >1)
and ColumnB in (select ColumnB from cte group by ColumnB having count(ColumnA) >2)

CodePudding user response：

Will this work for you (with a second condition columnA > 1, columnB > 2) ?:

select 
  ColumnA
  ,ColumnB
  ,ColumnC
from (
  select 
    ColumnA
    ,ColumnB
    ,ColumnC
    ,COUNT(ColumnA) OVER (PARTITION BY ColumnA) AS columnA_cnt
    ,COUNT(ColumnB) OVER (PARTITION BY ColumnB) AS columnB_cnt
  from (
      SELECT *
            ,COUNT(ColumnA) OVER (PARTITION BY ColumnA) AS columnA_cnt
            ,COUNT(ColumnB) OVER (PARTITION BY ColumnB) AS columnB_cnt
      FROM x
  ) ds where columnA_cnt >= 2 and columnB_cnt >= 3
) ds2 where columnA_cnt > 1 and columnB_cnt > 2
order by columnA, columnB;

CodePudding user response：

(ColumnA, ColumnB) is the primary key. Hence

COUNT([ColumnA]) OVER (PARTITION BY [ColumnB]) AS columnA_cnt

is the same as

COUNT(*) OVER (PARTITION BY [ColumnB]) AS columnA_cnt

You look at a row's ColumnB and count how many times it occurs in the table. So a better name than columnA_cnt would probably be columnB_cnt ;-)

SELECT *
FROM
(
    SELECT *
          ,COUNT(*) OVER (PARTITION BY [ColumnA]) AS columnA_cnt
          ,COUNT(*) OVER (PARTITION BY [ColumnB]) AS columnB_cnt
    FROM @DataSource
) DS
WHERE columnA_cnt >= 2
  AND columnB_cnt >= 3