Home > OS >  SQL: filter "after" aggregation
SQL: filter "after" aggregation

Time:11-17

I have a set of normalized tables with one-to-many and many-to-many relationships.

For most queries I am really interested in a highly denormalized view which shows:

  1. Entity-per-row for one specific table
  2. Aggregated information about related entities from other tables

Here is a DB fiddle, or consider the following DDL:

CREATE TABLE people(
    name TEXT PRIMARY KEY
);
CREATE TABLE fruit(
    name TEXT PRIMARY KEY
);
CREATE TABLE likes(
    person TEXT REFERENCES people (name),
    fruit TEXT REFERENCES fruit (name),
    PRIMARY KEY (person, fruit)
);

INSERT INTO people VALUES ('Joe'), ('Brandon'), ('Miranda');

INSERT INTO fruit VALUES ('Banana'), ('Apple'), ('Orange'), ('Strawberry');

INSERT INTO likes VALUES ('Joe', 'Banana'), ('Joe', 'Apple'), ('Joe', 'Orange');
INSERT INTO likes VALUES ('Brandon', 'Apple');
INSERT INTO likes VALUES ('Miranda', 'Apple'), ('Miranda', 'Banana'), ('Miranda', 'Orange'), ('Miranda', 'Strawberry');

I want to show my users a filtered list of who likes which fruit so I construct the following query:

SELECT person, string_agg(DISTINCT fruit, ', ' ORDER BY fruit) AS fruit
FROM likes
GROUP BY person;

This shows me a table with each person's name and a list of fruit they like. So far, so good. But now if I filter the list:

SELECT person, string_agg(DISTINCT fruit, ', ' ORDER BY fruit) AS fruit
FROM likes
WHERE fruit='Orange'
GROUP BY person;

I get a list of people who like oranges, without the information about what other fruit those same people like.

Now, I understand why this happens: WHERE takes precedence over SELECT. What I would like to know is, is there a better way? How can I subset by fruit='Orange', but still get a list of all fruit that a person likes in one row?

Do I have to use a CTE or an INTERSECT? The below gets me what I want, but I am afraid that it can get costly with multiple tables and multiple joins in each subquery because my real problem is obviously much larger, and with more tables like that:

WITH select_people AS (
    SELECT person
    FROM likes
    WHERE fruit = 'Orange'
)
SELECT person, string_agg(DISTINCT fruit, ', ' ORDER BY fruit) AS fruit
FROM likes
WHERE person in (select person from select_people)
GROUP BY person;

Thanks!

CodePudding user response:

You can filter on the results of aggregates by using HAVING, or by wrapping the query into a subquery and then using WHERE in the outer query. The problem with HAVING is that you can't refer to the aggregate by alias, so you have to repeat the expression, which is ugly and fragile as they can get out of sync, so I would usually use the 2nd method instead. But I changed your string_agg to an array_agg, as there is no clean way to do the contains expression @> on the string_agg.

select * from (
  SELECT person, array_agg(DISTINCT fruit ORDER BY fruit) AS fruit
  FROM likes
  GROUP BY person
) foo
WHERE fruit @> ARRAY['Orange']

If few people like Orange and you have the right indexes, then the self-join method already given should be faster than this, but I think this method is more flexible and easier to understand.

CodePudding user response:

You could self join likes with an inner join to limit this to people that have Orange records:

SELECT likes.person, string_agg(DISTINCT likes.fruit, ', ' ORDER BY likes.fruit) AS fruit
FROM likes
inner join likes orange_filter
  on likes.person = orange_filter.person
  and orange_filter.fruit = 'Orange'
GROUP BY likes.person;

Fiddle Link

  • Related