I am using SQL Server and I have the following data, where "condition" is the condition of "ID1" and is either 0 or 1. Obviously my real example is a lot more complex, I am joining two tables in each statement with another WHERE condition (I need to match export date with sold date) etc., but hopefully this simplified example works.
ID1 | ID2 | condition | col1 |
---|---|---|---|
123 | 1 | 1 | 456 |
123 | 2 | 1 | 5445 |
456 | 3 | 0 | 5646 |
456 | 3 | 0 | 1524 |
789 | 1 | 1 | 145 |
col1 on which I want to perform aggregate functions (AVG, MIN, MAX)
And a table of ID1s I want to filter, the ID1s not present in this table I want to discard
ID1 | condition |
---|---|
123 | 1 |
456 | 0 |
789 | 1 |
And I want to do the following:
for each ID2, I want to have the count of rows grouped by ID2 (how many times the ID2 shows in the whole dataset), the count of rows grouped by ID2 but only when ID1's 'condition' column is equal to 1 (how many times ID2 shows in the dataset where "condition = 1" is applied), and then filtered so that only ID1s in the list of wanted ID1s appear.
The result of my query should be the following:
ID2 | Count of all rows | count of rows with condition = 1 | col1_MAX | col1_MIN | ... |
---|---|---|---|---|---|
1 | 2 | 2 | |||
2 | 1 | 1 | |||
3 | 2 | 0 |
Currently, I solved this using a table from which I calculate the row count for all and then I filter the ID1s and then join the tables to get the result:
WITH CountOfAllRows AS
(
SELECT
(ID2),
COUNT(ID2) AS count_of_all_rows
FROM
MyTable
GROUP BY
ID2
)
SELECT
ID2,
MIN(col1) AS col1_min,
COUNT(ID2) AS count_of_rows_with_condition,
AVG(CountOfAllRows.count_of_all_rows) AS count_of_all_rows
FROM
MyTable
JOIN
CountOfAllRows ON MyTable.ID2 = CountOfAllRows.ID2
WHERE
MyTable.ID1 IN (SELECT [ID1]
FROM ListOfID1sIWant
WHERE condition = 1)
GROUP BY
MyTable.ID2
But it seems rather clunky, despite the fact it works. I need to use AVG() to get the count_of_all_rows (an average of the same set of numbers is that number) because it is a GROUP BY statement, and I also need to have a separate table using WITH.
I tried window functions (OVER, PARTITION BY) but it seems like I either found the best solution or I am not experienced enough to come up with a better one.
CodePudding user response:
Use this:
WITH CountOfAllRows AS
(
SELECT
ID2,
COUNT(ID2) AS count_of_all_rows
FROM
MyTable
GROUP BY
ID2
)
SELECT
CountOfAllRows.ID2,
MIN(col1) AS col1_min,
COUNT(ID2) AS count_of_rows_with_condition,
AVG(CountOfAllRows.count_of_all_rows) AS count_of_all_rows_2
FROM
MyTable
JOIN
CountOfAllRows ON MyTable.ID2 = CountOfAllRows.ID2
WHERE
MyTable.ID1 IN (SELECT [ID1]
FROM ListOfID1sIWant
WHERE condition = 1)
GROUP BY
MyTable.ID2
CodePudding user response:
I suppose that you need a query like this
SELECT
t.id2,
COUNT(*) AS count_id2_all_rows,
COUNT(CASE WHEN t.condition = 1 THEN 1 END) AS count_condition_1,
MIN(t.col1) AS col1_min,
MAX(t.col1) AS col1_max
FROM ListOfID1sIWant l
JOIN mytable t ON l.id1 = t.id1
GROUP BY t.id2
Output
id2 | count_id2_all_rows | count_condition_1 | col1_min | col1_max |
---|---|---|---|---|
1 | 2 | 2 | 145 | 456 |
2 | 1 | 1 | 5445 | 5445 |
3 | 2 | 0 | 1524 | 5646 |
CodePudding user response:
Try the following:
Select M.ID2,
COUNT(*) AS [Count of all rows],
COUNT(Case When D.condition = 1 Then 1 End) As [count of rows with condition = 1],
MAX(M.col1) as [col1_MAX],
MIN(M.col1) as [col1_MIN]
From
MyTable M
Join
ListOfID1sIWant D
On M.ID1 = D.ID1
Group By M.ID2
See a demo from db<>fiddle.