how to count distinct values in a window function-CodePudding

EDIT i added another ID column to show that i cant just do a simple group by and why needs to be some kind of window function.

I need to count the number of currencies per tx_id.

I can do this using the code below but it feels too complicated.
I think it should be possible in a single window function but I cant nail the syntax

-- test data
WITH cte AS (
    SELECT * FROM (
        VALUES
            (1,123, 'GBP'), -- 2 ccys
            (2,123, 'USD'),
            (3,123, 'USD'),
            (4,124, 'GBP'), -- 1 ccys
            (5,124, 'GBP'),
            (6,125, 'EUR'), -- 3 ccys
            (7,125, 'EUR'),
            (8,125, 'JPY'),
            (9,125, 'USD'),
            (10,125, 'EUR')
) AS a (id, tx_id, ccy)
    )

,ccy_count as (
    select id, tx_id, ccy,
        dense_rank() over  (PARTITION BY group_id ORDER BY ccy ) as dense_rank_ccy
    from cte
)
    
select id, 
        tx_id, 
        ccy, 
        max(dense_rank_ccy) over  (PARTITION BY group_id  ) as ccy_count
    from ccy_count
 order by tx_id, ccy

CodePudding user response：

If you just want to report the distinct number of currencies for each transaction, then you should be aggregating, rather than using a window function:

SELECT tx_id, COUNT(DISTINCT ccy) AS cnt
FROM cte
GROUP BY tx_id;

CodePudding user response：

You could use count(distinct ccy) in a scalar subquery as the following:

WITH cte AS (
    SELECT * FROM (
        VALUES
            (1,123, 'GBP'), -- 2 ccys
            (2,123, 'USD'),
            (3,123, 'USD'),
            (4,124, 'GBP'), -- 1 ccys
            (5,124, 'GBP'),
            (6,125, 'EUR'), -- 3 ccys
            (7,125, 'EUR'),
            (8,125, 'JPY'),
            (9,125, 'USD'),
            (10,125, 'EUR')
  ) AS a (id, tx_id, ccy)
)
select id, tx_id, ccy,
    (
     select count(distinct D.ccy)
     from cte D 
     where D.tx_id = T.tx_id
    ) distinct_ccy
from cte T

Or you could use the dense_rank function as the following:

select id, tx_id, ccy,
  dense_rank() over (partition by tx_id order by ccy)   
  dense_rank() over (partition by tx_id order by ccy desc) - 1 as distinct_ccy
from cte T
order by id

See demo