Home > Enterprise >  How to assign groups to another group in sql?
How to assign groups to another group in sql?

Time:04-06

I have a grouped result/table:

tenant|city|count|
1     |A   |36   |
2     |A   |50   |
1     |B   |3    |
1     |C   |6    |
2     |C   |2    |
1     |D   |1    |
2     |D   |2    |

Sum of count is 100.

As you can see a city has multiple tenants. If the sum of the count of a city is less than 5% of the total count then that city count should be added to another group named by the 'other' identifier while maintaining the tenant dimension. Resultant data should be.

tenant|city    |count|
1     |A       |36   |
2     |A       |50   |
1     |C       |6    |
2     |C       |2    |
1     |other   |4    |  --> Addition of count of B city and count of D city for tenant 1
2     |other   |2    |  --> count of D city for tenant 2

I want to produce the same result for two databases PostgreSQL and Clickhouse. Any ideas on how to do this? Even if I will have the query to produce this result in either of the DB, I think it should not be difficult to create the query for other DB too. So answer for either database is acceptable.

CodePudding user response:

You can do:

select tenant, grp as city, sum(cnt) as count
from (
  select *,
    case when sum(cnt) over(partition by city) >= 5 
         then city else 'Other' end as grp
  from t
) x
group by grp, tenant
order by grp, tenant

Result:

 tenant  city   count 
 ------- ------ ----- 
 1       A      36    
 2       A      50    
 1       C      6     
 2       C      2     
 1       Other  4     
 2       Other  2     

See example at DB Fiddle.

CodePudding user response:

Using a CTE, this query has to scan the table only once:

WITH tc AS (
   SELECT tenant, city, sum(count)::int AS total
        , sum(sum(count)) OVER (PARTITION BY city) AS city_count
   FROM   tbl
   GROUP  BY 1, 2
   )
SELECT tenant, city, total
FROM   tc
WHERE  city_count >= 5

UNION ALL
SELECT tenant, 'other', sum(total)
FROM   tc
WHERE  city_count < 5
GROUP  BY 1

db<>fiddle here

Not sure whether ClickHouse supports window functions over aggregate functions like Postgres does. See:

Their SQL reference does not address that explicitly. I does say this, though:

expressions involving window functions, e.g. (count(*) over ()) / 2)
not supported, wrap in a subquery (feature request)

  • Related