To simplify a complex query I am working on, I feel like solving this is key.
I have the following table
id | city | Item |
---|---|---|
1 | chicago | 1 |
2 | chicago | 2 |
3 | chicago | 1 |
4 | cedar | 2 |
5 | cedar | 1 |
6 | cedar | 2 |
7 | detroit | 1 |
I am trying to find the ratio of number of rows grouped by city and item to the number of rows grouped by just the items for each and every unique city-item pair.
So I would like something like this
City | Item | groupCityItemCount | groupItemCount | Ratio |
---|---|---|---|---|
chicago | 1 | 2 | 4 | 2/4 |
chicago | 2 | 1 | 3 | 1/3 |
cedar | 1 | 1 | 4 | 1/4 |
cedar | 2 | 2 | 3 | 2/3 |
detroit | 1 | 1 | 4 | 1/4 |
This is my current solution but its too slow.
Select city, item, (count(*) / (select count(*) from records t2 where t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item
Also replaced where with groupBy and having but that is also slow.
Select city, item, (count(*) / (select count(*) from records t2 group by item having t1.item=t2.item)) AS pen_ratio
From records t1
Group By city, item
(Note: I have removed column3 and column4 from the solution for smaller code)
(Edit: Typo as pointed out by xQbert and MatBailie)
CodePudding user response:
Is it slow because it's evaluating each row separately with the subquery in the select statement? It may be operating as a correlated subquery.
If that's the case it might be faster if you get the values out of a join and go from there -
Select city, t1.item, (COUNT(t1.item) / MAX(t2.it_count)) AS pen_ratio
from records t1
JOIN (SELECT item, count(item) AS it_count
FROM records
group by item) t2
ON t2.item = t1.item
GROUP BY city, t1.item
Updated some errors and included the fiddle based off the starting point from xQbert. I had to CAST as float in the fiddle, but you may not need to CAST and use the above query in yours depending on datatypes.
I believe this follows the intent of your original query.
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=d77a715175159304b9192a16ad903347
CodePudding user response:
You can approach it in two parts.
First, aggregate to the level you're interested in, as normal.
Then, use analytical functions to work out subtotals across your partitions (item, in your case).
WITH
aggregate AS
(
SELECT
city,
item,
COUNT(*) AS row_count
FROM
records
GROUP BY
city,
item
)
SELECT
city,
item,
row_count AS groupCityItemCount,
SUM(row_count) OVER (PARTITION BY item) AS groupItemCount
FROM
aggregate
Fiddle borrowed from xQbert