SQL add two row counts based on a column and filter afterwards-CodePudding

I am using SQL Server and I have the following data, where "condition" is the condition of "ID1" and is either 0 or 1. Obviously my real example is a lot more complex, I am joining two tables in each statement with another WHERE condition (I need to match export date with sold date) etc., but hopefully this simplified example works.

ID1	ID2	condition	col1
123	1	1	456
123	2	1	5445
456	3	0	5646
456	3	0	1524
789	1	1	145

col1 on which I want to perform aggregate functions (AVG, MIN, MAX)

And a table of ID1s I want to filter, the ID1s not present in this table I want to discard

ID1	condition
123	1
456	0
789	1

And I want to do the following:

for each ID2, I want to have the count of rows grouped by ID2 (how many times the ID2 shows in the whole dataset), the count of rows grouped by ID2 but only when ID1's 'condition' column is equal to 1 (how many times ID2 shows in the dataset where "condition = 1" is applied), and then filtered so that only ID1s in the list of wanted ID1s appear.

The result of my query should be the following:

ID2	Count of all rows	count of rows with condition = 1
1	2	2
2	1	1
3	2	0

Currently, I solved this using a table from which I calculate the row count for all and then I filter the ID1s and then join the tables to get the result:

WITH CountOfAllRows AS 
(
    SELECT
        (ID2),
        COUNT(ID2) AS count_of_all_rows
    FROM
        MyTable
    GROUP BY
        ID2
)
SELECT
    ID2,
    MIN(col1) AS col1_min,
    COUNT(ID2) AS count_of_rows_with_condition,
    AVG(CountOfAllRows.count_of_all_rows) AS count_of_all_rows
FROM 
    MyTable
JOIN 
    CountOfAllRows ON MyTable.ID2 = CountOfAllRows.ID2
WHERE 
    MyTable.ID1 IN (SELECT [ID1]
                    FROM ListOfID1sIWant
                    WHERE condition = 1)
GROUP BY 
    MyTable.ID2

But it seems rather clunky, despite the fact it works. I need to use AVG() to get the count_of_all_rows (an average of the same set of numbers is that number) because it is a GROUP BY statement, and I also need to have a separate table using WITH.

I tried window functions (OVER, PARTITION BY) but it seems like I either found the best solution or I am not experienced enough to come up with a better one.

CodePudding user response：

Use this:

WITH CountOfAllRows AS 
(
    SELECT
        ID2,
        COUNT(ID2) AS count_of_all_rows
    FROM
        MyTable
    GROUP BY
        ID2
)
SELECT
    CountOfAllRows.ID2,
    MIN(col1) AS col1_min,
    COUNT(ID2) AS count_of_rows_with_condition,
    AVG(CountOfAllRows.count_of_all_rows) AS count_of_all_rows_2
FROM 
    MyTable
JOIN 
    CountOfAllRows ON MyTable.ID2 = CountOfAllRows.ID2
WHERE 
    MyTable.ID1 IN (SELECT [ID1]
                    FROM ListOfID1sIWant
                    WHERE condition = 1)
GROUP BY 
    MyTable.ID2

CodePudding user response：

I suppose that you need a query like this

SELECT
    t.id2,
    COUNT(*) AS count_id2_all_rows,
    COUNT(CASE WHEN t.condition = 1 THEN 1 END) AS count_condition_1,
    MIN(t.col1) AS col1_min,
    MAX(t.col1) AS col1_max
FROM ListOfID1sIWant l
JOIN mytable t ON l.id1 = t.id1
GROUP BY t.id2

Output

id2	count_id2_all_rows	count_condition_1	col1_min	col1_max
1	2	2	145	456
2	1	1	5445	5445
3	2	0	1524	5646

dbfiddle

CodePudding user response：

Try the following:

Select M.ID2, 
       COUNT(*) AS [Count of all rows],
       COUNT(Case When D.condition = 1 Then 1 End) As [count of rows with condition = 1],
       MAX(M.col1) as [col1_MAX],
       MIN(M.col1) as [col1_MIN]
From
MyTable M
Join
ListOfID1sIWant D
On M.ID1 = D.ID1
Group By M.ID2

See a demo from db<>fiddle.