COUNT DISTINCT WINDOW FUNCTION with conditions SQL Oracle-CodePudding

I have a data table as below:

 
CustomerID | Contract_id | Value_date
----------- ------------- -----------
A          | 1234        | 01-JUL-20  
A          | 7896        | 20-DEC-20  
C          | 6578        | 01-JUN-20  
C          | 8990        | 20-OCT-20  
C          | 4789        | 20-DEC-21
B          | 3457        | 09-SEP-21

I tried to count distinct number of Contract_id before each Value_date for each unique CustomerID.

The desired result is:

CustomerID | Value_date  | Count_contract
----------- ------------- -----------
A          | 01-JUL-20   | 0
A          | 20-DEC-20   | 1
C          | 01-JUN-20   | 0
C          | 20-OCT-20   | 1
C          | 20-DEC-21   | 2
B          | 09-SEP-21   | 0

Could anyone suggest how I could count distinct in this case?

I tried the window functions:

select distinct CustomerID,
Value_date,
count(distinct Contract_id) over (partition by CustomerID order by Value_date rows unbounded preceding) count_contract
from tbl_cus_contract;

but it didn't work with the error:

SQL Error: ORA-30487: ORDER BY not allowed here
30487. 00000 -  "ORDER BY not allowed here"
*Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY
*Action:

THANK YOU ALL FELLOWS FOR THE ANSWERS!!! I found a nicer way to count distinct as several different Contract_id of the same CustomerID have same value_date.

I was using rank()

Here is my solution: https://dbfiddle.uk/?rdbms=oracle_21&fiddle=c145c0432e003c1be52dbe0685c8d259

CodePudding user response：

We can try to use ROW_NUMBER window function and subtracted with 1

SELECT t1.CustomerID,
       t1.Value_date,
       ROW_NUMBER() OVER(PARTITION BY t1.CustomerID ORDER BY t1.Value_date) - 1
FROM tbl_cus_contract t1

CodePudding user response：

Your expected result does not appear to require counting distinct contract_id's and if that is true then you could just omit the use of distinct within the count and your query would work.

SELECT t1.CustomerID
    , t1.Value_date
    , COUNT(*) OVER (PARTITION BY t1.CustomerID ORDER BY t1.Value_date) - 1
FROM tbl_cus_contract t1
ORDER BY t1.CustomerID
    , t1.Value_date

CodePudding user response：

You can check the first occurrence of the contract_id for a given customer. Then Count only the first occurrence.

WITH cte
     AS (SELECT customerid,value_date,contract_id,
                CASE WHEN Row_number() OVER( PARTITION BY customerid, contract_id
                           ORDER BY value_date ) = 1 THEN 1
                END AS first_occurrence_of_contract
         FROM   tbl_cus_contract
)
SELECT customerid,value_date,
       Count(first_occurrence_of_contract)
         OVER (PARTITION BY customerid 
              ORDER BY value_date ROWS unbounded preceding) - 1 AS count_contract
FROM   cte

db<>fiddle: Try here

CodePudding user response：

This may be done with a model clause:

select *
from t
model
  /*The same as in analytic function*/
  partition by (customer_id)
  /*This would be used as ordering criteria later*/
  dimension by (dt)
  measures (
    /*Need to specify all the columns
      not in partition or dimension*/
    contract_id,
    0 as cnt
  )
  rules (
    cnt[any] =
      /*dt <= cv(dt) is all dates before
      the current value of date.
      The same as window specification*/
      count(distinct contract_id)[dt <= cv(dt)]
  )

CUSTOMER_ID | DT        | CONTRACT_ID | CNT
:---------- | :-------- | ----------: | --:
A           | 01-JUL-20 |        1234 |   1
A           | 20-DEC-20 |        7896 |   2
C           | 01-JUN-20 |        6578 |   1
C           | 20-OCT-20 |        8990 |   2
C           | 20-DEC-21 |        4789 |   3
C           | 23-DEC-21 |        4789 |   3
B           | 09-SEP-21 |        3457 |   1

db<>fiddle here

UPD: In case of multiple contract_id per date you may add unique single reference, so model will not comply on duplicated dimension values. See updated code and a result:

select *
from t
model
  /*The same as in analytic function*/
  partition by (customer_id)
  /*This would be used as ordering criteria later*/
  dimension by (dt)
  measures (
    /*Need to specify all the columns
      not in partition or dimension*/
    contract_id,
    0 as cnt
  )
  unique single reference
  rules (
    cnt[any] =
      /*dt <= cv(dt) is all dates before
      the current value of date.
      The same as window specification*/
      count(distinct contract_id)[dt <= cv(dt)]
  )

CUSTOMER_ID | DT        | CONTRACT_ID | CNT
:---------- | :-------- | ----------: | --:
A           | 01-JUL-20 |        1234 |   1
A           | 20-DEC-20 |        7896 |   2
C           | 01-JUN-20 |        6578 |   1
C           | 20-OCT-20 |        8990 |   3
C           | 20-OCT-20 |        9999 |   3
C           | 20-DEC-21 |        4789 |   4
C           | 23-DEC-21 |        4789 |   4
B           | 09-SEP-21 |        3457 |   1

db<>fiddle here