I'm trying to rank my CTE table (a) output based on reviews and partition by two columns (risk_full_nm and exam_year_nb).
Below is my code:
\`WITH a AS
(
select rr.risk_full_nm ,e.exam_year_nb,COUNT(DISTINCT e.exam_dmn_xtrnl_id) as "reviews"
from examwrksp.dsh_exam_base e
left join star.firm f ON f.firm_id = e.crd_nb --firm
left join examwrksp.dsh_mlstn_event_dates d ON d.exam_id = e.exam_id --exam milestones
left join examwrksp.dsh_scope_risk_cntnt rr on rr.exam_id = e.exam_id
where e.exam_year_nb \>= '2018' and e.exam_ctgry_cd = 'CYCLE' and e.exam_st_ds in ('Open','Closed') and e.dstrt_nm != 'MAP Group' and e.exam_type_ds not like 'Funding%'
and e.exam_dmn_prmry_fl = 'Y' and f.main_ofc = 'Y' and d.crd_nb is null
and rr.scope_actv_st is not null and rr.scope_actv_st = 'ACTIVE' and rr.unit_type_label_tx not in ('Discovery Review','Risk Identification Review')
GROUP BY rr.risk_full_nm ,e.exam_year_nb
)
select a.risk_full_nm,a.exam_year_nb,a.reviews,
RANK () OVER(PARTITION BY a.risk_full_nm,a.exam_year_nb ORDER BY a.reviews desc)
from a
order by reviews desc\`
I'd expect my rank column to be ranking my records but I'm getting "1" in every single row. What am I missing?
CodePudding user response:
I analyzed your queries. You have some incorrect logic codes in your queries. I explain to you:
- Window aggregate functions every time are calculated for each grouped fields. But, grouping fields get from after
over (partition by
statement. In your query, you have grouped two fields,a.risk_full_nm
anda.exam_year_nb
. After grouping these fields all rows of resulting data are different because you have not to duplicated (same) rows. So,RANK()
function will be calculated again each for all rows. For best understanding, I explain to you using sample queries.
For example:
with test_data as materialized
(
select 1 as id, 'user1' as username, 'admin' as type_of, '2022-01-14'::date as login_date
union all
select 2 as id, 'user2' as username, 'user' as type_of, '2022-01-06'::date as login_date
union all
select 3 as id, 'user1' as username, 'admin' as type_of, '2022-01-29'::date as login_date
union all
select 4 as id, 'user3' as username, 'user' as type_of, '2022-02-11'::date as login_date
union all
select 5 as id, 'user2' as username, 'user' as type_of, '2022-01-16'::date as login_date
union all
select 6 as id, 'user2' as username, 'user' as type_of, '2022-01-17'::date as login_date
union all
select 7 as id, 'user2' as username, 'user' as type_of, '2022-01-18'::date as login_date
)
select
id,
username,
type_of,
count(*) over (partition by id, username, type_of) as login_count
from
test_data;
-- Result of this query
id username type_of login_count
-----------------------------------------
1 user1 admin 1
2 user2 user 1
3 user1 admin 1
4 user3 user 1
5 user2 user 1
6 user2 user 1
7 user2 user 1
In this query, I wanted to get counts of logins for each user. This is incorrect syntax. So, I used the unique primary key field 'id' for grouping and my rows will be are different. So for each of these rows function count() gets only 1 value. When we remove the 'id' field from the groping field lists, we returned the correct data and count() calculated correctly. For example:
select
id,
username,
type_of,
count(*) over (partition by username, type_of) as login_count
from
test_data
-- Result this query:
id username type_of login_count
----------------------------------------
3 user1 admin 2
1 user1 admin 2
7 user2 user 4
2 user2 user 4
5 user2 user 4
6 user2 user 4
4 user3 user 1