Home > Software engineering >  Identify an associated main meal for choice items
Identify an associated main meal for choice items

Time:01-31

I have a MS SQL Server database for a restaurant. There is a transactional table that shows the meals ordered and, where applicable, the optional (choice) items ordered with each meal e.g. "with chips", "with beans".

I am trying to determine, for each choice item, what the main meal was that it was ordered with.

How can I write a query in such a way that it doesn't matter how many choice items there are for a particular main meal, it will always find the appropriate main meal? We can assume that each choice item belongs to the same main meal until a new main meal appears.

The expected outcome might look something like this (first and last columns created by the query, middle columns are present in the table): -

RowWithinTxn TransactionID LineNumber CourseCategory ItemName AssociatedMainMeal
1 123 123456 Main Steak NULL
2 123 123457 Choice Chips Steak
3 123 123458 Choice Beans Steak
1 124 124567 Main Fish NULL
2 124 124568 Choice Mushy Peas Fish

As it's possible to have more than one choice item with the same main meal, something like LAST_VALUE() or LAG() (where we lag by 1), won't always give the correct answers (for example: -)

CASE WHEN CourseCategory = 'Choice Items' THEN
    LAST_VALUE(ItemName)
    OVER(PARTITION BY TransactionID
        ORDER BY LineNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
    ELSE NULL END AS AssociatedMainMeal

OR

CASE WHEN CourseCategory = 'Choice Items' THEN
    LAG(ItemName,1,0)
    OVER(PARTITION BY TransactionID
        ORDER BY LineNumber)
    ELSE NULL END AS AssociatedMainMeal

I suspect ROW_NUMBER() might be useful somewhere, i.e.: ROW_NUMBER() OVER(PARTITION BY TransactionID ORDER BY LineNumber) AS RowWithinTxn, as this would chop the table up into separate TransactionIDs

Many Thanks!

CodePudding user response:

You want to use conditional aggregation here:

MAX(CASE CourseCategory WHEN 'Main' THEN ItemName END) OVER (PARTITION BY TransactionID)

CodePudding user response:

Assuming that you could have many main meals per transaction, then effectively need a LAG function that can ignore certain rows. There is an IGNORE NULLS option in some RDMBS, where you could use something like:

LAG(CASE WHEN CourseCategoruy = 'Main' THEN ItemName END) IGNORE NULLS
    OVER(PARTITION BY TransactionID ORDER BY Linenumber)

Unfortunately SQL Server does not support this, but Itzik Ben-Gan came up with a nice workaround using windowed functions. Essentially if you combine your sorting column and your data column into a single binary value, you can use MAX in a similar way to LAG since the max binary value will always be the previous row since the sorting column is part of it.

It does however make for quite painful reading because you have to convert two columns to binary and combine them:

CONVERT(BINARY(4), t.Linenumber)   
 CONVERT(BINARY(50), CASE WHEN t.CourseCategory = 'Main' THEN t.ItemName END)

Then get the max of them

MAX(<result of above expression>) OVER(PARTITION BY t.TransactionID ORDER BY t.LineNumber)

Then remove the first 4 characters (from linenumber) and convert back to varchar:

CONVERT(VARCHAR(50), SUBSTRING(<result of above expression>, 5, 50)

Then finally, you only want to show this value for non main meals

CASE WHEN t.CourseCategory <> 'Main' THEN <result of above expression> END

Putting this all together, along with some sample data you end up with:

DROP TABLE IF EXISTS #T;
CREATE TABLE #T
(
    RowWithinTxn INT,
    TransactionID INT,
    LineNumber INT,
    CourseCategory VARCHAR(6),
    ItemName VARCHAR(10)
);

INSERT INTO #T
(
    RowWithinTxn,
    TransactionID,
    LineNumber,
    CourseCategory,
    ItemName
)
VALUES
    (1, 123, 123456, 'Main', 'Steak'),
    (2, 123, 123457, 'Choice', 'Chips'),
    (3, 123, 123458, 'Choice', 'Beans'),
    (1, 124, 124567, 'Main', 'Fish'),
    (2, 124, 124568, 'Choice', 'Mushy Peas');


SELECT  t.RowWithinTxn, 
        t.TransactionID,
        t.LineNumber, 
        t.CourseCategory, 
        t.ItemName, 
        AssociatedMainMeal = CASE WHEN t.CourseCategory <> 'Main' THEN 
                                CONVERT(VARCHAR(50), SUBSTRING(MAX(CONVERT(BINARY(4), t.Linenumber)   CONVERT(BINARY(50), CASE WHEN t.CourseCategory = 'Main' THEN t.ItemName END))
                                                                    OVER(PARTITION BY t.TransactionID ORDER BY t.LineNumber), 5, 50)) 
                            END
FROM    #T AS t;
  • Related