Home > database >  Count the number of occurences in each bucket Redshift SQL
Count the number of occurences in each bucket Redshift SQL

Time:11-10

This might be difficult to explain. But Im trying to write a redshift sql query where I have want the count of organizations that fall into different market buckets. There are 50 markets. For example company x can be only be found in 1 market and company y can be found in 3 markets. I want to preface that I have over 10,000 companies to fit into these buckets. So ideally it would be more like, hypothetically 500 companies are found in 3 markets or 7 companies are found in 50 markets.

The table would like

Market Bucket Org Count
1 Markets 3
2 Markets 1
3 Markets 0

select count(distinct case when enterprise_account = true and (market_name then organization_id end) as "1 Market" from organization_facts

I was trying to formulate the query from above but I got confused on how to effectively formulate the query

Organization Facts

Market Name Org ID Org Name
New York 15683 Company x
Orlando 38478 Company y
Twin Cities 2738 Company z
Twin Cities 15683 Company x
Detroit 99 Company xy

CodePudding user response:

You would need a sub-query that retrieves the number of markets per company, and an outer query that summarises into a count of markets.

Something like:

with markets as (
  select
    org_name,
    count(distinct market_name) as market_count
  from organization_facts
)

select
  market_count,
  count(*) as org_count
from markets
group by market_count
order by market_count

CodePudding user response:

If I follow you correctly, you can do this with two levels of aggregation. Assuming that org_id represents a company in your dataset:

select cnt_markets, count(*) cnt_org_id
from (select count(*) cnt_markets from organization_facts group by org_id) t
group by cnt_markets

The subquery counts the number of markets per company. I assumed no duplicate (ord_id, market_name) tuples in the table ; if that's not the case, then you need count(distinct market_name) instead of count(*) in that spot.

Then, the outer query just counts how many times each market count occurs in the subquery, which yields the result that you want.

Note that I left apart the enterprise_account column ,that appears in your query but not in your data.

  • Related