Is it possible to group sql join results by a common column? Please see the description-CodePudding

I'm super rusty with databases and haven't got the chance to work too much with them in the past 6-8 years, so there are some things that are uncertain for me.

Let's assume I have a simple bi-directional one-to-many situation:

Parent
------
id

and

Child
-----
id
age
parent_id

and I want to perform a query which will return all the parents together with the matching children that follow some rules.

For example let's assume we have

Parent: [Child (age 5), Child (age 8), Child (age 10)]
Parent: [Child (age 8), Child (age 15)]
Parent: [Child (age 20)]

and I want to obtain the parents together with their children filtered for example by age (less than 10).

This situations screams a join with a where statement - something like:

SELECT Parent.id, Child.id, Child.age
    FROM Parent LEFT JOIN Child ON Parent.id = Child.parent_id 
    WHERE Child.age < 10;

And this will work fine and return 3 entries - two rows for parent1 with the children that match the rule and one for parent two and its children.

However, my question is - can I group it somehow based on Parent Id? Can I use LIMIT for limiting the number of Parents? (For example in this case limit 2 will return all results - because there are two parents that match the criteria)

Background context - I'm trying to implement this with Java, Hibernate and a Postgres db. I'm using OneToMany/ManyToOne relationships and CriteriaQuery in order to interogate the DB.

CodePudding user response：

Yes, you can use a GROUP BY clause on a SQL join result the same as on a select from a single table. However, as in all uses of GROUP BY, every column in the output that is not also in the group clause needs to be the result of some aggregate function. For example, you could write

SELECT Parent.id, min(child_age) 
    FROM Parent LEFT JOIN Child ON Parent.id = Child.parent_id 
    WHERE Child.age < 10
    GROUP BY Parent.id; -- Add LIMIT 2 here if you want

To get each parent with a young child and the age of their youngest child.

If all you want is an arbitrary value from the child columns, you can instead use DISTINCT ON(Parent.Id). For example,

SELECT DISTINCT ON (Parent.id) parent.id, child.id as child_id, child.age as child_age
    FROM Parent LEFT JOIN Child ON Parent.id = Child.parent_id 
    Limit 2;

Will give you a single row per parent id with some child id and child age for that parent.

If you don't actually need the child columns to be returned, just used in the join, then just don't include them in the select.

CodePudding user response：

If I understand you, you want to limit two parents, but this could mean more than two rows because a parent can have more than one child under the age of 10? If so, the count analytic function can start numbering parents, and you can set the limit by taking the resulting count <= 2:

with qualifying_parents as (
  select
    c.id, c.age, c.parent_id,
    count (parent_id) over (partition by parent_id) as cnt
  from
    child c
    join parent p on c.parent_id = p.id
  where
    c.age < 10
)
select id as child_id, age, parent_id
from qualifying_parents
where cnt <= 2

This would give you three rows from your example but only two distinct parent ids.

Also, in this example, the parent table is unnecessary, but I assume it has more attributes than just ID that you intend to use in your query.

Alternatively, you can leverage Postgres` most excellent array functionality and then invoke a simple limit clause at the end of your query:

select
  parent_id, array_agg(age) as children_ages
from child c
where age < 10
group by parent_id
limit 2

(I omitted parent there, but you can put it back obviously).