This might be difficult to explain. But Im trying to write a redshift sql query where I have want the count of organizations that fall into different market buckets. There are 50 markets. For example company x can be only be found in 1 market and company y can be found in 3 markets. I want to preface that I have over 10,000 companies to fit into these buckets. So ideally it would be more like, hypothetically 500 companies are found in 3 markets or 7 companies are found in 50 markets.
The table would like
Market Bucket | Org Count |
---|---|
1 Markets | 3 |
2 Markets | 1 |
3 Markets | 0 |
select count(distinct case when enterprise_account = true and (market_name then organization_id end) as "1 Market" from organization_facts
I was trying to formulate the query from above but I got confused on how to effectively formulate the query
Organization Facts
Market Name | Org ID | Org Name |
---|---|---|
New York | 15683 | Company x |
Orlando | 38478 | Company y |
Twin Cities | 2738 | Company z |
Twin Cities | 15683 | Company x |
Detroit | 99 | Company xy |
CodePudding user response:
You would need a sub-query that retrieves the number of markets per company, and an outer query that summarises into a count of markets.
Something like:
with markets as (
select
org_name,
count(distinct market_name) as market_count
from organization_facts
)
select
market_count,
count(*) as org_count
from markets
group by market_count
order by market_count
CodePudding user response:
If I follow you correctly, you can do this with two levels of aggregation. Assuming that org_id
represents a company in your dataset:
select cnt_markets, count(*) cnt_org_id
from (select count(*) cnt_markets from organization_facts group by org_id) t
group by cnt_markets
The subquery counts the number of markets per company. I assumed no duplicate (ord_id, market_name)
tuples in the table ; if that's not the case, then you need count(distinct market_name)
instead of count(*)
in that spot.
Then, the outer query just counts how many times each market count occurs in the subquery, which yields the result that you want.
Note that I left apart the enterprise_account
column ,that appears in your query but not in your data.