Home > front end >  Get the distinct rows and count of distinct rows where one column value type follows the other
Get the distinct rows and count of distinct rows where one column value type follows the other

Time:04-15

I have the following table:

order_id | created_dt | type         |
123      | 2000-01-01 | refund       |
123      | 2000-01-01 | cancellation |
124      | 2000-01-01 | refund       |
125      | 2000-01-02 | refund       |
125      | 2000-01-02 | cancellation |

I need to find the distinct orders and the count of distinct orders where a refund is followed by a cancellation. Can this be done in one query or will separate queries be easier?

Maybe LAG and LEAD functions could help but I don't know how to use them so some explanation would be of help. Thanks!

This is what I've tried but it doesn't work since the sub-query returns more than one row.

select ORDER_ID, TYPE,
       (select next_type
           from (
                    select lead(type) over (order by ORDER_ID) as next_type
                    from table_name
                    where type = 'cancellation'
                    ))
from table_name

My desired result is:

order_id | type         |
123      | refund       |   
123      | cancellation |
125      | refund       |   
125      | cancellation |   

and a way to count the number of these orders.

CodePudding user response:

You could find distinct orders that are refunded followed by a cancellation fairly simply using a subselect:

select distinct order_id from <table> t1 where type = 'refund' and (select count(*) from <table> t2 where t2.order_id = t1.order_id and t2.type = 'cancellation';

It's trivial from there to get the count, and depending on context I would generally recommend either counting the results (if you are in programming language where you can count the results), or using a count(order_id) if not.

CodePudding user response:

One problem with your query is that you select rows where type = 'cancellation' and then you look with LEAD at the next row. You won't find refunds thus, but only cancellations.

Another is that your OVER clause is wrong. You want the next type for the same order_id (PARTITION BY order_id) according to the order by date (ORDER BY created_dt).

Well, and then you don't even look for refunds in your query.

This should work for you:

select
  order_id, created_dt, type
from
(
  select
    order_id, created_dt, type,
    lag(type) over (partition by order_id order by created_dt) as prev_type,
    lead(type) over (partition by order_id order by created_dt) as next_type
  from mytable
) with_prev_and_next
where type = 'refund' and next_type = 'cancellation'
   or type = 'cancellation' and prev_type = 'refund'
order by order_id, created_dt;

This gets you all refunds and cancellations where the former is directly followed by the latter, regardless of whether there are other types in the table. If there can be other types and the cancellation doesn't have to follow the refund directly, then add a where clause to the inner query in order to look at refunds and cancellations only.

As to counting these orders:

select count(distinct order_id) from ( <above query> ) q;
  • Related