Home > Software design >  Get first and last Order and the highest value Item in each order for each Customer, all of which ar
Get first and last Order and the highest value Item in each order for each Customer, all of which ar

Time:09-21

I need to find the first and last Order for each Customer by OrderDate, and the name and SKU of the item with the highest business volume in each of those orders. For reference, the Customer table has >150k records, and Orders and OrderDetails (these are the Items) a lot more.

Note: Both Orders and their respective items should be in the same row as the Customer

Orders

OrderID   OrderDate     CustomerID   BusinessVolumeTotal   Subtotal
13212     '2021-09-06'  512312       500.00                25.60

OrderDetails

OrderID   ItemCode   ItemDescription   BusinessVolume
13212     'SKW-BS'   'Some item'       450.00

In my first query I attempted to stick to joining instead of subqueries, which resulted in this

select distinct(c.CustomerID), c.FirstName   ' '   c.LastName as Name,
    cs.CustomerStatusDescription as Status,
    ct.CustomerTypeDescription as Type, pv.Volume80 as G3,
    fo.OrderID,fo.OrderDate,fo.SubTotal,fo.Country, fod.ItemCode, fod.ItemDescription, fopt.PriceTypeID,
    lo.OrderID,lo.OrderDate,lo.SubTotal,lo.Country, lod.ItemCode, lod.ItemDescription, lopt.PriceTypeID
from Customers c
    left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
    left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
    left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
    left join Orders fo on fo.CustomerID = c.CustomerID -- First Order
    left join Orders lo on lo.CustomerID = c.CustomerID -- Last Order
    left join OrderDetails fod on fod.OrderID = fo.OrderID
    left join OrderDetails lod on lod.OrderID = lo.OrderID
    left join PriceTypes fopt on fo.PriceTypeID = fopt.PriceTypeID
    left join PriceTypes lopt on lo.PriceTypeID = lopt.PriceTypeID
where c.CustomerStatusID in (1,2)
  and c.CustomerTypeID in (2,3)
  and pv.PeriodTypeID = 2
  /* First Order */
  and fo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate )
  and fod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = fo.OrderID order by BusinessVolume)
  /* Last Order */
  and lo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate desc)
  and lod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = lo.OrderID order by BusinessVolume desc)
  and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)

But this ends up executing for about 6-7 minutes. From the explain plan, it looks like most of it is taken up by a Key Lookup for Orders based on OrderStatusID >= 7.

So I tried to use window functions to achieve the same:

select distinct(c.CustomerID), c.FirstName   ' '   c.LastName as Name, cs.CustomerStatusDescription as Status,
   ct.CustomerTypeDescription as Type,  pv.Volume80 as G3, 
   fal.*
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
left join(
    select
        CustomerID,
        max(case when MinDate = 1 then OrderID end)                 FirstOrderID,
        max(case when MinDate = 1 then OrderDate end)               FirstOrderDate,
        max(case when MinDate = 1 then BusinessVolumeTotal end)     FirstBVTotal,
        max(case when MinDate = 1 then PriceTypeDescription end)    FirstPriceType,
        max(case when MinDate = 1 then ItemCode end)                FirstItemCode,
        max(case when MinDate = 1 then ItemDescription end)         FirstItemDescription,
        max(case when MaxDate = 1 then OrderID end)                 LastOrderID,
        max(case when MaxDate = 1 then OrderDate end)               LastOrderDate,
        max(case when MaxDate = 1 then BusinessVolumeTotal end)     LastBVTotal,
        max(case when MaxDate = 1 then PriceTypeDescription end)    LastPriceType,
        max(case when MaxDate = 1 then ItemCode end)                LastItemCode,
        max(case when MaxDate = 1 then ItemDescription end)         LastItemDescription
    from
        (
            select distinct o.CustomerID,
            o.OrderID,
            o.OrderDate,
            o.BusinessVolumeTotal,
            PT.PriceTypeDescription,
            RANK() over (partition by o.CustomerID order by OrderDate) as MinDate,
            RANK() over (partition by o.CustomerID order by OrderDate desc) as MaxDate,
            FIRST_VALUE(ItemCode) over (partition by od.OrderID order by BusinessVolume desc) as ItemCode,
            FIRST_VALUE(ItemDescription) over (partition by od.OrderID order by BusinessVolume desc) as ItemDescription
            from Orders o
                     left join OrderDetails od on od.OrderID = o.OrderID
                     left join PriceTypes PT on o.PriceTypeID = PT.PriceTypeID
            where o.OrderStatusID >= 7
        ) fal
    group by CustomerID
) fal on c.CustomerId = fal.CustomerID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3)
and pv.PeriodTypeID = 2
    /* CurrentG3 */
and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)

Alas, this ended up executing even longer. I need a way to optimize this if possible.

Secondary query

I do also need a count and sum of volume per Order in the last 3, 6 and 12 months. I currently do this programatically as secondary queries after the original returns a result, and I forward the CustomerIDs, like this:

select count(OrderID) as Cnt, sum(BusinessVolumeTotal) as Bv, CustomerID
from Orders where OrderStatusID > 6 and OrderTypeID in (1,4,8,11)
and OrderDate >= @timeAgo and CustomerID in @ids group by CustomerID

Times 3, because 3, 6 and 12 months. Ideally, I would also like to make this part of the original but I don't really have a good idea on how to do it, especially with how convoluted the joining is with the orders.

So ideally I'd end up with a result table like this

CustomerID  Name       CustomerStatus  CustomerType  FirstOrderID  FirstOrderDate  FirstBVTotal FirstItemCode  FirstItemDesc  FirstPriceType  LastOrderID  LastOrderDate  LastBVTotal  LastItemCode  LastItemDesc  LastPriceType  ThreeMonthCount  ThreeMonthTotal  SixMonthCount SixMonthTotal  TwelveMonthCount  TwelveMonthTotal
512312     'Jane Doe'  'Active'        'Retail'      13212         '2020-06-06'    50.00        'Item1'        'Item 1 desc'  'Retail'        14321        '2021-09-01'   200.00       'Item2'       'Item 2 desc' 'Retail'       45               4305.00          76            8545.60        183               21542.95

Any help and advice on how to optimize or reduce the query, as well as anything you believe Im doing wrong would be GREATLY appreciated.

P.S. I don't know if the title is fitting and if I'd be able to change it later, it's been a while since I've used SO to ask a question.

UPDATE

Actual Execution Plan for Query 1:
https://www.brentozar.com/pastetheplan/?id=SJd56RSmK

Actual Execution Plan for Query 2:
https://www.brentozar.com/pastetheplan/?id=BJ7QHk87Y

CodePudding user response:

I think you need to keep in mind two main points with this type of query:

  • The key to good performance with window functions is to not introduce an unnecessary sort. So while you can use ROW_NUMBER to get the first order in either direction, you should not use another opposing ROW_NUMBER to get the last. Rather use LEAD to check if the next row exists, thereby telling you if this is the last row. You can then use conditional aggregation.
  • There are generally two ways to calculate first/last: a row-numbering solution, as above, or an APPLY, which picks out the exact one you need.
    I think that for the OrderDetails we should use an apply, because there are only two orders per customer that we need to find. This does need good indexing, so if OrderDetails is not well indexed, then you may want to switch to a row-numbering solution for this also.
select
    c.CustomerID,
    c.FirstName   ' '   c.LastName as Name,
    cs.CustomerStatusDescription as Status,
    ct.CustomerTypeDescription as Type,
    pv.Volume80 as G3,
    o.FirstOrderID,
    o.FirstOrderDate,
    o.FirstSubTotal,
    o.FirstCountry,
    fod.ItemCode as FirstItemCode,
    fod.ItemDescription as FirstItemDescription,
    fopt.PriceTypeDescription as FirstPriceTypeDescription,
    o.LastOrderID,
    o.LastOrderDate,
    o.LastSubTotal,
    o.LastCountry,
    lod.ItemCode as LastItemCode,
    lod.ItemDescription as LastItemDescription,
    lopt.PriceTypeDescription as LastPriceTypeDescription 
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
  and pv.PeriodTypeID = 2
  and pv.PeriodID = (
    select top 1 PeriodID
    from Periods p
    where p.PeriodTypeID = 2
      and p.StartDate <= @now
      and p.EndDate >= @now
  )
left join (
    select
      o.CustomerID,
      min(case when rn = 1 then OrderID end) as FirstOrderId,
      min(case when rn = 1 then OrderDate end) as FirstOrderDate,
      min(case when rn = 1 then SubTotal end) as FirstSubTotal,
      min(case when rn = 1 then Country end) as FirstCountry,
      min(case when nx is null then OrderID end) as LastOrderId,
      min(case when nx is null then OrderDate end) as LastOrderDate,
      min(case when nx is null then SubTotal end) as LastSubTotal,
      min(case when nx is null then Country end) as LastCountry,
      count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
      sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
      count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
      sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
      count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
      sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
    from (
        select *,
            ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
            LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
        from Orders o
        where o.OrderStatusID >= 7
          and o.OrderTypeID in (1,4,8,11)
          and o.OrderDate >= @timeAgo
    ) o
    group by o.CustomerID
) o on o.CustomerID = c.CustomerID
outer apply (
    select top 1
      od.ItemCode,
      od.ItemDescription
    from OrderDetails od
    order by od.BusinessVolume desc
    where od.OrderID = o.FirstOrderId
) fod
outer apply (
    select top 1
      od.ItemCode,
      od.ItemDescription
    from OrderDetails od
    order by od.BusinessVolume desc
    where od.OrderID = o.LastOrderId
) lod
left join PriceTypes fopt on fopt.PriceTypeID = o.FirstPriceTypeID 
left join PriceTypes lopt on lopt.PriceTypeID = o.LastPriceTypeID 
where c.CustomerStatusID in (1,2)
  and c.CustomerTypeID in (2,3);

I'm also going to give you a row-numbering version, as judging by your execution plan, it may actually be better. You need to try both

select
    c.CustomerID,
    c.FirstName   ' '   c.LastName as Name,
    cs.CustomerStatusDescription as Status,
    ct.CustomerTypeDescription as Type,
    pv.Volume80 as G3,
    o.FirstOrderID,
    o.FirstOrderDate,
    o.FirstSubTotal,
    o.FirstCountry,
    o.FirstItemCode,
    o.FirstItemDescription,
    o.FirstPriceTypeDescription,
    o.LastOrderID,
    o.LastOrderDate,
    o.LastSubTotal,
    o.LastCountry,
    o.LastItemCode,
    o.LastItemDescription,
    o.LastPriceTypeDescription 
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
  and pv.PeriodTypeID = 2
  and pv.PeriodID = (
    select top 1 PeriodID
    from Periods p
    where p.PeriodTypeID = 2
      and p.StartDate <= @now
      and p.EndDate >= @now
  )
left join (
    select
      o.CustomerID,
      min(case when rn = 1 then o.OrderID end) as FirstOrderId,
      min(case when rn = 1 then o.OrderDate end) as FirstOrderDate,
      min(case when rn = 1 then o.SubTotal end) as FirstSubTotal,
      min(case when rn = 1 then o.Country end) as FirstCountry,
      min(case when rn = 1 then od.ItemCode end) as FirstItemCode,
      min(case when rn = 1 then od.ItemDescription end) as FirstItemDescription,
      min(case when rn = 1 then opt.PriceTypeDescription end) as FirstPriceTypeDescription,
      min(case when nx is null then o.OrderID end) as LastOrderId,
      min(case when nx is null then o.OrderDate end) as LastOrderDate,
      min(case when nx is null then o.SubTotal end) as LastSubTotal,
      min(case when nx is null then o.Country end) as LastCountry,
      min(case when nx is null then od.ItemCode end) as LastItemCode,
      min(case when nx is null then od.ItemDescription end) as LastItemDescription,
      min(case when nx is null then opt.PriceTypeDescription end) as LastPriceTypeDescription,
      count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
      sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
      count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
      sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
      count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
      sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
    from (
        select *,
            ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
            LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
        from Orders o
        where o.OrderStatusID >= 7
          and o.OrderTypeID in (1,4,8,11)
          and o.OrderDate >= @timeAgo
    ) o
    left join PriceTypes opt on opt.PriceTypeID = o.PriceTypeID 
    join (
        select *,
          ROW_NUMBER() over (partition by od.OrderID order by od.BusinessVolume desc) as rn
        from OrderDetails od
    ) od on od.OrderID = o.OrderId
    where rn = 1 or nx is null
) o on o.CustomerID = c.CustomerID
where c.CustomerStatusID in (1,2)
  and c.CustomerTypeID in (2,3);

Good indexing is essential to good performance. I would expect roughly the following indexes on your tables, either clustered or non-clustered (clustered indexed INCLUDE every other column automatically), you can obviously add other INCLUDE columns if needed:

Customers (CustomerID) INCLUDE (FirstName, LastName)

CustomerTypes (CustomerTypeID) INCLUDE (CustomerTypeDescription)

CustomerStatuses (CustomerStatusID) INCLUDE (CustomerTypeDescription)

PeriodVolumes (CustomerID) INCLUDE (Volume80)

Periods (PeriodTypeID, StartDate, PeriodID) INCLUDE (EndDate)  -- can swap Start and End

Orders (CustomerID, OrderDate) INCLUDE (OrderStatusID, SubTotal, Country, BusinessVolumeTotal)

OrderDetails (OrderID, BusinessVolume) INCLUDE (ItemCode ItemDescription)

PriceTypes (PriceTypeID) INCLUDE (PriceTypeDescription)

You should think carefully about INNER vs LEFT joins, because the optimizer can more easily move around an INNER join.

Note also, that DISTINCT is not a function, it is calculated over an entire set of columns. Generally, one can assume that if a DISTINCT is in the query then the joins have not been thought through properly.

  • Related