Home > OS >  RANK function doesn't work on CTE table in posgresql?
RANK function doesn't work on CTE table in posgresql?

Time:03-30

I'm trying to rank my CTE table (a) output based on reviews and partition by two columns (risk_full_nm and exam_year_nb).

Below is my code:

\`WITH a AS
(
select rr.risk_full_nm ,e.exam_year_nb,COUNT(DISTINCT e.exam_dmn_xtrnl_id) as "reviews"
from examwrksp.dsh_exam_base e
left join star.firm f ON f.firm_id = e.crd_nb --firm
left join examwrksp.dsh_mlstn_event_dates d ON d.exam_id  = e.exam_id --exam milestones
left join examwrksp.dsh_scope_risk_cntnt rr on rr.exam_id = e.exam_id
where e.exam_year_nb \>= '2018' and e.exam_ctgry_cd = 'CYCLE' and e.exam_st_ds in ('Open','Closed') and e.dstrt_nm != 'MAP Group' and e.exam_type_ds not like 'Funding%'
and e.exam_dmn_prmry_fl = 'Y' and f.main_ofc = 'Y' and d.crd_nb is null
and rr.scope_actv_st is not null and rr.scope_actv_st = 'ACTIVE' and rr.unit_type_label_tx not in ('Discovery Review','Risk Identification Review')
GROUP BY rr.risk_full_nm ,e.exam_year_nb
)
select a.risk_full_nm,a.exam_year_nb,a.reviews,
RANK () OVER(PARTITION BY a.risk_full_nm,a.exam_year_nb ORDER BY a.reviews desc)
from a
order by reviews desc\`

enter image description here

I'd expect my rank column to be ranking my records but I'm getting "1" in every single row. What am I missing?

CodePudding user response:

I analyzed your queries. You have some incorrect logic codes in your queries. I explain to you:

  1. Window aggregate functions every time are calculated for each grouped fields. But, grouping fields get from after over (partition by statement. In your query, you have grouped two fields, a.risk_full_nm and a.exam_year_nb. After grouping these fields all rows of resulting data are different because you have not to duplicated (same) rows. So, RANK() function will be calculated again each for all rows. For best understanding, I explain to you using sample queries.

For example:

with test_data as materialized 
(
    select 1 as id, 'user1' as username, 'admin' as type_of, '2022-01-14'::date as login_date 
    union all 
    select 2 as id, 'user2' as username, 'user' as type_of, '2022-01-06'::date as login_date
    union all 
    select 3 as id, 'user1' as username, 'admin' as type_of, '2022-01-29'::date as login_date
    union all 
    select 4 as id, 'user3' as username, 'user' as type_of, '2022-02-11'::date as login_date
    union all 
    select 5 as id, 'user2' as username, 'user' as type_of, '2022-01-16'::date as login_date
    union all 
    select 6 as id, 'user2' as username, 'user' as type_of, '2022-01-17'::date as login_date
    union all 
    select 7 as id, 'user2' as username, 'user' as type_of, '2022-01-18'::date as login_date
)
select 
    id, 
    username, 
    type_of,  
    count(*) over (partition by id, username, type_of) as login_count
from 
    test_data;

-- Result of this query 
id  username    type_of     login_count
-----------------------------------------
1   user1       admin       1
2   user2       user        1
3   user1       admin       1
4   user3       user        1
5   user2       user        1
6   user2       user        1
7   user2       user        1

In this query, I wanted to get counts of logins for each user. This is incorrect syntax. So, I used the unique primary key field 'id' for grouping and my rows will be are different. So for each of these rows function count() gets only 1 value. When we remove the 'id' field from the groping field lists, we returned the correct data and count() calculated correctly. For example:

select 
    id, 
    username, 
    type_of,  
    count(*) over (partition by username, type_of) as login_count
from 
    test_data

-- Result this query: 

id  username    type_of     login_count
----------------------------------------
3   user1       admin       2
1   user1       admin       2
7   user2       user        4
2   user2       user        4
5   user2       user        4
6   user2       user        4
4   user3       user        1
  • Related