Home > Net >  Get the last state of record if it follow correct sequence
Get the last state of record if it follow correct sequence

Time:01-28

In a data table, data is stored based on the production state and shipment of a product. This means product states are

product_created
packed
shipped
delivered

The packed and shipped states are inserted from a legacy system where these states can come late, even after the box has been delivered status. I want to take the last state of the product if the product journey follows the correct sequence (product_created -> packed -> shipped -> delivered)

It should also support duplicate records, for example, product_created -> packed -> packed -> packed -> shipped -> shipped -> delivered. In this case, the sequence is correct, so it should be taken into consideration.

input table

PRODUCT_ID  STATE       INSERTION_TIME
1   product_created 2023-01-10 07:00:00
1   product_created 2023-01-10 09:00:00
1   packed          2023-01-11 01:00:00
1   packed          2023-01-11 02:00:00
1   packed          2023-01-11 09:00:00
1   shipped         2023-01-12 01:00:00
1   delivered       2023-01-12 02:00:00
2   product_created 2023-01-10 07:00:44
2   packed          2023-01-11 01:00:00
2   delivered       2023-01-11 09:00:00
2   shipped         2023-01-12 02:00:00
3   product_created 2023-01-10 07:00:00
3   packed          2023-01-11 01:00:00
3   product_created 2023-01-11 09:00:00
3   packed          2023-01-11 09:00:00
3   shipped         2023-01-12 01:00:00
3   delivered       2023-01-12 02:00:00

Output

PRODUCT_ID  STATE   INSERTION_TIME
1         delivered 2023-01-12 02:00:00

Above result does not content PRODUCT_ID =2 and 3 because it contains incorrect sequence.

My query gets the final state of a product, but I don't know how to exclude products with incorrect sequences

SELECT * FROM datatable
QUALIFY ROW_NUMBER() OVER ( PARTITION BY PRODUCT_ID ORDER BY INSERTION_TIME DESC) = 1

This query only gives the last state of each product but it may contains some product which have incorrect sequence.

CodePudding user response:

SQL allows to detect patterns across multiple rows using enter image description here

MATCH_RECOGNIZE db<>fiddle demo

CodePudding user response:

Assuming there can only be 4 distinct patterns for correct sequences, we can start by taking care of the duplicated states by keeping the most recent entry per product_id and state.

In the final select we can pick the last state per product_id on the condition that all its states when aggregated and ordered by insertion_time match one of the valid sequences.

with cte as
(select *
 from dataTable
 qualify row_number() over (partition by product_id, state order by insertion_time desc)=1)

select * 
from cte
qualify row_number() over (partition by product_id order by insertion_time desc) = 1 and
        listagg(state,',') within group (order by insertion_time asc) over (partition by product_id) in 
            ('product_created,packed,shipped,delivered',
             'product_created,packed,shipped',
             'product_created,packed',
             'product_created')
  • Related