I have a grouped result/table:
tenant|city|count|
1 |A |36 |
2 |A |50 |
1 |B |3 |
1 |C |6 |
2 |C |2 |
1 |D |1 |
2 |D |2 |
Sum of count is 100.
As you can see a city has multiple tenants. If the sum of the count of a city is less than 5% of the total count then that city count should be added to another group named by the 'other' identifier while maintaining the tenant dimension. Resultant data should be.
tenant|city |count|
1 |A |36 |
2 |A |50 |
1 |C |6 |
2 |C |2 |
1 |other |4 | --> Addition of count of B city and count of D city for tenant 1
2 |other |2 | --> count of D city for tenant 2
I want to produce the same result for two databases PostgreSQL and Clickhouse. Any ideas on how to do this? Even if I will have the query to produce this result in either of the DB, I think it should not be difficult to create the query for other DB too. So answer for either database is acceptable.
CodePudding user response:
You can do:
select tenant, grp as city, sum(cnt) as count
from (
select *,
case when sum(cnt) over(partition by city) >= 5
then city else 'Other' end as grp
from t
) x
group by grp, tenant
order by grp, tenant
Result:
tenant city count
------- ------ -----
1 A 36
2 A 50
1 C 6
2 C 2
1 Other 4
2 Other 2
See example at DB Fiddle.
CodePudding user response:
Using a CTE, this query has to scan the table only once:
WITH tc AS (
SELECT tenant, city, sum(count)::int AS total
, sum(sum(count)) OVER (PARTITION BY city) AS city_count
FROM tbl
GROUP BY 1, 2
)
SELECT tenant, city, total
FROM tc
WHERE city_count >= 5
UNION ALL
SELECT tenant, 'other', sum(total)
FROM tc
WHERE city_count < 5
GROUP BY 1
db<>fiddle here
Not sure whether ClickHouse supports window functions over aggregate functions like Postgres does. See:
Their SQL reference does not address that explicitly. I does say this, though:
expressions involving window functions, e.g. (count(*) over ()) / 2)
not supported, wrap in a subquery (feature request)