Say I have data like so
group, city, user,
nyc_frisbee, nyc, joe,
nyc_frisbee, nyc, ben,
nyc_frisbee, sf, tim,
sf_cooking, sf, tim,
sf_cooking, atl, jon,
My goal is to get
group, top_city, prob
nyc_firsbee, nyc, .66
sf_cooking, sf, .5
sf_ccoking, atl, .5
Edge case: In this table below, nyc_frisbee shows up only once because there is a single top city. However, sf_cooking shows up twice because sf and atl are tied.
How can I accomplish this in SQL?
What I have so far:
SELECT group, city, top/total AS prob
FROM (
SELECT
group,
city,
freq,
MAX(freq) OVER(PARTITION BY group) AS top,
SUM(freq) OVER(PARTITION BY group) AS total
FROM
(SELECT group, city, COUNT(city) AS freq
FROM mytable
GROUP BY group, city) inner_query
) outer_query
WHERE outer.freq = outer.top
This doesn't behave the way that I intended...
group, top_city, prob
nyc_firsbee, nyc, .66
nyc_firsbee, sf, .66
sf_cooking, sf, .5
sf_ccoking, atl, .5
So I'm getting one row per each unique group/city combination with the same prob value.
CodePudding user response:
How about something like:
WITH x as (
SELECT group, city, CAST(COUNT(*) AS FLOAT)/SUM(COUNT(*)) OVER(PARTITION BY group) AS prob
FROM mytable
GROUP BY group, city
)
SELECT x.*
FROM
x
INNER JOIN
(SELECT group, MAX(prob) maxprob FROM x GROUP BY group) y
ON
x.group = y.group AND
x.prob = y.maxprob
We boil our group/cities down to
group, city, prob,
nyc_frisbee, nyc, 0.66,
nyc_frisbee, sf, 0.33,
sf_cooking, sf, 0.5,
sf_cooking, atl, 0.5,
and alias as X, then we join it to a grouping of itself on the group, max(prob)
. Because sf_cooking, 0.5
is the max, the join matches twice, preserving the tie, but nyc's max is 0.66 which matches once, excluding the 0.33
CodePudding user response:
See if the following works for you. First calculating the probablity, then rank the groups ordered by the probablity, finally getting distinct rows.
with x as (
select *,
Count(*) over(partition by group, City) * 1.0 / Count(*) over(partition by group) prob
from t
), r as (
select *, dense_rank() over(partition by group order by prob desc) rn
from x
)
select distinct group, city, prob
from r
where rn=1