PostgreSQL how to GROUP BY single field from returned table-CodePudding

So I have complicated query, to simplify let it be like

SELECT
    t.*,
    SUM(a.hours) AS spent_hours
FROM (
    SELECT
        person.id,
        person.name,
        person.age,
        SUM(contacts.id) AS contact_count 
    FROM
        person
        JOIN contacts ON contacts.person_id = person.id
) AS t
JOIN activities AS a ON a.person_id = t.id
GROUP BY t.id

Such query works fine in MySQL, but Postgres needs to know that GROUP BY field is unique, and despite it actually is, in this case I need to GROUP BY all returned fields from returned t table. I can do that, but I don't believe that will work efficiently with big data. I can't JOIN with activities directly in first query, as person can have several contacts which will lead query counting hours of activity several time for every joined contact.

Is there a Postgres way to make this query work? Maybe force to treat Postgres t.id as unique or some other solution that will make same in Postgres way?

CodePudding user response：

This query will not work on both database system, there is an aggregate function in the inner query but you are not grouping it(unless you use window functions). Of course there is a special case for MySQL, you can use it with disabling "sql_mode=only_full_group_by". So, MySQL allows this usage because of it' s database engine parameter, but you cannot do that in PostgreSQL.

CodePudding user response：

I knew MySQL allowed indeterminate grouping, but I honestly never knew how it implemented it... it always seemed imprecise to me, conceptually.

So depending on what that means (I'm too lazy to look it up), you might need one of two possible solutions, or maybe a third.

If you intent is to see all rows (perform the aggregate function but not consolidate/group rows), then you want a windowing function, invoked by partition by. Here is a really dumbed down version in your query:

SELECT
  t.*,
  SUM (a.hours) over (partition by t.id) AS spent_hours
FROM t
JOIN activities AS a ON a.person_id = t.id

This means you want all records in table t, not one record per t.id. But each row will also contain a sum of the hours for all values that value of id.

For example the sum column would look like this:

Name    Hours   Sum Hours
-----   -----   ---------
Smith      20       120
Jones      30        30
Smith     100       120

Whereas a group by would have had Smith once and could not have displayed the hours column in detail.

If you really did only want one row per t.id, then Postgres will require you to tell it how to determine which row. In the example above for Smith, do you want to see the 20 or the 100?
There is another possibility, but I think I'll let you reply first. My gut tells me option 1 is what you're after and you want the analytic function.