Home > Blockchain >  How to perform a group by and then do order by on records belonging to same group?
How to perform a group by and then do order by on records belonging to same group?

Time:02-21

I have a table that has ORDER_LINE_ID as primary key. 2 other columns of concern are STATUS_ID and STATUS_TS. Over the course of time, the STATUS_ID and STATUS_TS change for the same ORDER_LINE_ID and it is all collected up.

Now, I have multiple records with same ORDER_LINE_ID having different STATUS_ID & STATUS_TS. My goal is to "group by" based on ORDER_LINE_ID and then, perform an "Order by" on STATUS_TS, then collect the STATUS_ID, OL_ID and STATUS_TS corresponding to the most recent value of STATUS_TS using a first() function (to select the top record in the group by) or some other means.

Mind that there is a 'HAVING' clause in the code following the "group by" condition to exclude few STATUS_IDs.

Attaching last part of code for reference:

Select

  COLR.ORDER_DT ORDER_DT,
  COLR.ORDER_LINE_ID ORDER_LINE_ID,
  COLR.ORDER_HEADER_KEY ORDER_HEADER_KEY,
  COLR.ORDER_LINE_KEY ORDER_LINE_KEY,
  MIN(CASE WHEN COLR.STATUS_QTY > 0 THEN COLR.STATUS_ID END)  LINE_STATUS, 
  --Max(CASE WHEN COLR.STATUS_QTY > 0 THEN COLR.STATUS_QTY END) UNIT,

  SUBSTRING(MIN(CONCAT(LPAD( CASE WHEN COLR.STATUS_QTY > 0 THEN COLR.STATUS_ID END , 11, '0'), COLR.STATUS_QTY)), 12) AS UNIT,

  SUBSTRING(MAX(CONCAT(LPAD( CASE WHEN COLR.STATUS_QTY > 0 THEN COLR.STATUS_ID END , 11, '0'), COLR.STATUS_DESC)), 12) AS LINE_DESCRIPTION,

  COLR.ITEM_KEY ITEM_KEY,
  COLR.PRODUCT_LINE PRODUCT_LINE,
  COLR.SHIP_NODE_CD SHIP_NODE_CD,
  COLR.RECEIVING_NODE_CD RECEIVING_NODE_CD,
  COLR.LINE_TOTAL_AMT LINE_TOTAL_AMT,                                   --Nile bpk
  COLR.ADDITIONAL_LINE_TYPE_CD ADDITIONAL_LINE_TYPE_CD,                 --Nile bpk
  COLR.RETURN_ACTION_CD RETURN_ACTION_CD,                               --Nile bpk
  COLR.ORDER_QTY ORDER_QTY,                                             --Nile bpk
  COLR.RETURN_REASON_CD RETURN_REASON_CD,                               --Nile bpk
  COLR.RETURN_SUB_REASON_CD RETURN_SUB_REASON_CD,                       --Nile bpk
  COLR.RETURN_REASON_DESC RETURN_REASON_DESC,                           --Nile bpk
  COLR.RETURN_ACTION RETURN_ACTION,                                     --Nile bpk
  MIN(COLR.STATUS_TS) STATUS_TS,                                        --Nile bpk

  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3700' THEN (COLR.INSERT_TS) ELSE NULL END) ORDER_SHIPPED_DT,
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3700.8000' THEN (COLR.INSERT_TS) ELSE NULL END) STORE_RECEIVED_DT,
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3700.9000' THEN (COLR.INSERT_TS) ELSE NULL END) CUSTOMER_PICKED_UP_DATE,
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) in ('9000','3700.01.545','3200.525','9000.300','3200.520','3700.01.540','1100.525') OR (TRIM(COLR.STATUS_ID)) >= '9000'  THEN (COLR.INSERT_TS) ELSE NULL END) CANCELLED_DATE,
  
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3700.9000' THEN (COLR.INSERT_TS) ELSE NULL END) CUSTOMER_PICKUP_TS,            --Nile bpk
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3200.050' THEN (COLR.INSERT_TS) ELSE NULL END) ORDER_DROP_TS,                  --Nile bpk
  
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) in ('3700.01','3700.01.01','3700.02') THEN (COLR.INSERT_TS) ELSE NULL END) RETURN_DATE,

  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) in ('1100','1100.200','1100.525','1300','1310') AND COLR.STATUS_QTY > 0 THEN (COLR.STATUS_QTY) ELSE NULL END) PENDING_OMS_QTY,

  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) in ('1500','1500.100','1500.101','3200','3200.050','3200.100','3200.200','3200.500','3200.520') AND COLR.STATUS_QTY > 0 THEN (COLR.STATUS_QTY) ELSE NULL END) PENDING_SHIPMENT_QTY,
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) in ('3700','3700.00.03','3700.01.03','3700.01.540','3700.500','3700.7777') AND COLR.STATUS_QTY > 0 THEN (COLR.STATUS_QTY) ELSE NULL END) IN_TRANSIT_QTY,

  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3700.8000' AND COLR.STATUS_QTY > 0 THEN (COLR.STATUS_QTY) ELSE NULL END) AWAITING_PICKUP_QTY,
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) ='3700.9000' AND COLR.STATUS_QTY > 0 THEN (COLR.STATUS_QTY) ELSE NULL END) CUSTOMER_PICKED_UP_QTY,
  MIN(CASE WHEN  (TRIM(COLR.STATUS_ID)) in ('1300','1310','1500','1500.100') AND COLR.STATUS_QTY > 0 THEN (COLR.STATUS_QTY) ELSE NULL END) BO_STATUS_QTY --Added for BOSTS story by NILE team

FROM
    (select * from ((select * from TEMP_SALES_ORDER_DATA)UNION(select * from TEMP_RET_ORDER_DATA)) where STATUS_QTY >0
      ) COLR

GROUP BY
 COLR.ORDER_DT,
 COLR.ORDER_LINE_ID,
 COLR.ORDER_LINE_KEY,
 COLR.PRODUCT_LINE,
 COLR.ITEM_KEY,
 COLR.ORDER_HEADER_KEY,
 COLR.SHIP_NODE_CD,
 COLR.RECEIVING_NODE_CD,
 COLR.LINE_TOTAL_AMT,                                         
 COLR.ADDITIONAL_LINE_TYPE_CD,                                
 COLR.RETURN_ACTION_CD,                                       
 COLR.ORDER_QTY,                                              
 COLR.RETURN_REASON_CD,                                       
 COLR.RETURN_SUB_REASON_CD,                                   
 COLR.RETURN_REASON_DESC,                                     
 COLR.RETURN_ACTION                                           


HAVING
LINE_STATUS not in ('3700.01.545','3200.525','9000.300','3200.520','3700.01.540','1100.525')  --Cancelled
AND LINE_STATUS <= '9000' -- Cancelled
AND LINE_STATUS NOT IN ('3700.01.01','3700.02') 

I feel that using an order by in a group by could negate the group by operation. Need some guidance here.

CodePudding user response:

I interpret your question to be something like this:

How do I get the most recent status of each order line?

Depending on what version of MySQL you have available you may use row_number() over() to achieve this. (in MySQL v8 or later.) Note that you don't specify what timestamp column is available to determine "the most recent" but I have assumed ORDER_DT can be used for this. If not substitute the correct field's name where I have used ORDER_DT.

Example query:

SELECT
      ORDER_LINE_ID
    , STATUS_ID
    , OL_ID
    , STATUS_TS
FROM (
    SELECT *
        , ROW_NUMBER() OVER (
            PARTITION BY ORDER_LINE_ID ORDER BY ORDER_DT DESC
            ) AS rn
    FROM (
        SELECT *
        FROM TEMP_SALES_ORDER_DATA
        
        UNION ALL
        
        SELECT *
        FROM TEMP_RET_ORDER_DATA
        )
    WHERE STATUS_QTY > 0
    ) COLR
WHERE colr.rn = 1

What the above row_number() function will return is controlled by the over() clause as follows:

  • PARTITION BY ORDER_LINE_ID is similar to grouping, but unlike group by, all rows are returned. When data is partitioned, row numbering (re)commences at 1 for each change of value in ORDER_LINE_ID
  • ORDER BY ORDER_DT DESC is vital, here the dates are arranged in descending order and the first row with "the most recent date" is given a row number of 1. Any other rows within the same partition will have a row number incremented by 1, so only one row within any partition can have a value of 1.
  • Note it is necessary to formulate the row numbers in a derived table first so that you can subsequently filter out all unwanted rows via the column alias you use e.g. WHERE colr.rn = 1

notes:

  • I would use UNION ALL rather than UNION as there is no advantage to removing duplicate rows as we will only return 1 row for each order line id.
  • to achieve "the oldest" instead of "most recent" then change the order to ASC (ascending) insted of DESC (descending)
  • it can be useful or necessary to have more columns in the order by portion of the over clause e.g. perhaps an ID column may help decide which row is returned if multiple rows have the same date or timestamp.

Also worthy of note is that you can return the whole unchanged row using this technique which is quite unlike use of group by where the data being returned could be sourced from different rows (e.g. the minimum and maximum values of any column are very likely to be stored in different rows).

Refer to "window functions"

  •  Tags:  
  • sql
  • Related