DB fiddle: https://dbfiddle.uk/QLKIuW1G
For a given table, I have grouped by two columns and calculated the average of a third column. However, I would like to only get rows that have a value for all possible values in the category column (these are 1
and 2
). How can this result be achieved?
Example table:
| name | category | number |
| -------- | -------- | ------ |
| jack | 1 | 12.30 |
| jack | 1 | 12.50 |
| jack | 2 | 13.35 |
| jack | 2 | 13.35 |
| jack | 2 | 13.35 |
| james | 1 | 18.76 |
| james | 1 | 20.38 |
| kate | 1 | 22.14 |
| kate | 1 | 22.18 |
| kate | 2 | 21.80 |
| kate | 2 | 22.00 |
Current status:
SELECT name, category, AVG(number) AS average_number
FROM dummy_table
GROUP BY name, category
-- Gives:
| name | category | average_number |
| -------- | -------- | -------------- |
| jack | 1 | 12.40 |
| jack | 2 | 13.35 |
| james | 1 | 19.57 |
| kate | 1 | 22.16 |
| kate | 2 | 21.90 |
Desired result:
-- `James` is dropped because he only has values for category 1, and not category 2.
| name | category | average_number |
| -------- | -------- | -------------- |
| jack | 1 | 12.40 |
| jack | 2 | 13.35 |
| kate | 1 | 22.16 |
| kate | 2 | 21.90 |
DDL (from fiddle, different values):
Create Table Emps(person_name VARCHAR(50), category INT, number DECIMAL);
Insert Into Emps Values
('steve',1,15.30),
('steve',1,14.30),
('steve',2,14.30),
('steve',2,13.30),
('john',1,18.76),
('john',1,17.36),
('john',1,18.42),
('angela',1,23.84),
('angela',1,24.48),
('angela',2,22.00),
('angela',2,22.94);
CodePudding user response:
My first naive solution would be to include a WHERE-clause that checks if there are records in the table for the same name
but with different category
values. That can be done with a subquery.
Something like this:
SELECT name, category, AVG(number) AS average_number
FROM dummy_table
WHERE name IN (SELECT name
FROM dummy_table
WHERE category IN (1, 2)
GROUP BY name
HAVING COUNT(DISTINCT category) = 2)
GROUP BY name, category
I included an additional check (WHERE-clause) in the subquery to only regard categories 1 and 2. You could omit that inner WHERE-clause if there are only 2 categories.
Perhaps there are more elegant solutions, so I will keep an eye on the other answers in this post. ;)
CodePudding user response:
Try to count distinct categories in a subquery and filter names by it:
SELECT name, category, AVG(number) AS average_number
FROM dummy_table
WHERE name IN (SELECT name FROM dummy_table GROUP BY name HAVING COUNT(DISTINCT category) = 2)
GROUP BY name, category