Optimal SQL to perform multiple aggregate functions with different group by fields


I am working on simplifying a complex query, and I feel that solving this part is the key.

I have the following table

id  city     item
1   chicago  1
2   chicago  2
3   chicago  1
4   cedar    2
5   cedar    1
6   cedar    2
7   detroit  1
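
For anyone who wants to reproduce this, here is a rough setup sketch (the column types are my assumption; the real table also has column3 and column4, which I have left out):

-- Simplified version of the table used in the queries below (assumed types)
CREATE TABLE records (
    id   integer PRIMARY KEY,
    city text,
    item integer
);

INSERT INTO records (id, city, item) VALUES
    (1, 'chicago', 1),
    (2, 'chicago', 2),
    (3, 'chicago', 1),
    (4, 'cedar',   2),
    (5, 'cedar',   1),
    (6, 'cedar',   2),
    (7, 'detroit', 1);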

For each unique city-item pair, I am trying to find the ratio of the number of rows grouped by city and item to the number of rows grouped by item alone.

So I would like output something like this:

City     Item  groupCityItemCount  groupItemCount  Ratio
chicago  1     2                   4               2/4
chicago  2     1                   3               1/3
cedar    1     1                   4               1/4
cedar    2     2                   3               2/3
detroit  1     1                   4               1/4

This is my current solution, but it's too slow.

SELECT city, item,
       (COUNT(*) / (SELECT COUNT(*) FROM records t2 WHERE t1.item = t2.item)) AS pen_ratio
FROM records t1
GROUP BY city, item

I also replaced the WHERE with a GROUP BY and HAVING, but that is also slow:

SELECT city, item,
       (COUNT(*) / (SELECT COUNT(*) FROM records t2 GROUP BY item HAVING t1.item = t2.item)) AS pen_ratio
FROM records t1
GROUP BY city, item

(Note: I have removed column3 and column4 from the solution to keep the code short)

(Edit: Typo as pointed out by xQbert and MatBailie)

CodePudding user response:

Is it slow because the subquery in the SELECT list is evaluated separately for each row? It may be operating as a correlated subquery.
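
A quick way to confirm is to look at the query plan. This is only a sketch, assuming PostgreSQL as in the fiddle below; a SubPlan node that runs once per group would point at correlated evaluation:

-- Inspect the plan of the original query for per-group SubPlan execution
EXPLAIN ANALYZE
SELECT city, item,
       (COUNT(*) / (SELECT COUNT(*) FROM records t2 WHERE t1.item = t2.item)) AS pen_ratio
FROM records t1
GROUP BY city, item;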

If that's the case, it might be faster to get the item counts out of a join and go from there:

SELECT city, t1.item, (COUNT(t1.item) / MAX(t2.it_count)) AS pen_ratio
FROM records t1
JOIN (SELECT item, COUNT(item) AS it_count
      FROM records
      GROUP BY item) t2
  ON t2.item = t1.item
GROUP BY city, t1.item

I have fixed some errors and included a fiddle based on xQbert's starting point. I had to CAST to float in the fiddle, but depending on your datatypes you may not need the CAST and can use the query above as-is.

I believe this follows the intent of your original query.

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d77a715175159304b9192a16ad903347
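
For reference, a sketch along the lines of the CAST used in the fiddle (the exact cast spelling is PostgreSQL-flavoured and may differ in your database):

-- Same join, but casting to float so the division is not truncated
SELECT city, t1.item,
       (CAST(COUNT(t1.item) AS float) / MAX(t2.it_count)) AS pen_ratio
FROM records t1
JOIN (SELECT item, COUNT(item) AS it_count
      FROM records
      GROUP BY item) t2
  ON t2.item = t1.item
GROUP BY city, t1.item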

CodePudding user response:

You can approach it in two parts.

First, aggregate to the level you're interested in, as normal.

Then, use analytical functions to work out subtotals across your partitions (item, in your case).

WITH
  aggregate AS
(
  SELECT
    city,
    item,
    COUNT(*) AS row_count
  FROM
    records
  GROUP BY
    city,
    item
)
SELECT
  city,
  item,
  row_count                               AS groupCityItemCount,
  SUM(row_count) OVER (PARTITION BY item) AS groupItemCount
FROM
  aggregate

Fiddle borrowed from xQbert
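
If you also need the Ratio column from the desired output, one way is to divide the two counts in the same query. This is a sketch; the * 1.0 is just an assumed way to force non-integer division:

WITH
  aggregate AS
(
  SELECT
    city,
    item,
    COUNT(*) AS row_count
  FROM
    records
  GROUP BY
    city,
    item
)
SELECT
  city,
  item,
  row_count                                                  AS groupCityItemCount,
  SUM(row_count) OVER (PARTITION BY item)                    AS groupItemCount,
  row_count * 1.0 / SUM(row_count) OVER (PARTITION BY item)  AS ratio
FROM
  aggregate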
