I have a MS SQL Server database for a restaurant. There is a transactional table that shows the meals ordered and, where applicable, the optional (choice) items ordered with each meal e.g. "with chips", "with beans".
I am trying to determine, for each choice item, what the main meal was that it was ordered with.
How can I write a query in such a way that it doesn't matter how many choice items there are for a particular main meal, it will always find the appropriate main meal? We can assume that each choice item belongs to the same main meal until a new main meal appears.
The expected outcome might look something like this (first and last columns created by the query, middle columns are present in the table): -
RowWithinTxn | TransactionID | LineNumber | CourseCategory | ItemName | AssociatedMainMeal |
---|---|---|---|---|---|
1 | 123 | 123456 | Main | Steak | NULL |
2 | 123 | 123457 | Choice | Chips | Steak |
3 | 123 | 123458 | Choice | Beans | Steak |
1 | 124 | 124567 | Main | Fish | NULL |
2 | 124 | 124568 | Choice | Mushy Peas | Fish |
As it's possible to have more than one choice item with the same main meal, something like LAST_VALUE() or LAG() (where we lag by 1), won't always give the correct answers (for example: -)
CASE WHEN CourseCategory = 'Choice Items' THEN
LAST_VALUE(ItemName)
OVER(PARTITION BY TransactionID
ORDER BY LineNumber ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
ELSE NULL END AS AssociatedMainMeal
OR
CASE WHEN CourseCategory = 'Choice Items' THEN
LAG(ItemName,1,0)
OVER(PARTITION BY TransactionID
ORDER BY LineNumber)
ELSE NULL END AS AssociatedMainMeal
I suspect ROW_NUMBER() might be useful somewhere, i.e.: ROW_NUMBER() OVER(PARTITION BY TransactionID ORDER BY LineNumber) AS RowWithinTxn, as this would chop the table up into separate TransactionIDs
Many Thanks!
CodePudding user response:
You want to use conditional aggregation here:
MAX(CASE CourseCategory WHEN 'Main' THEN ItemName END) OVER (PARTITION BY TransactionID)
CodePudding user response:
Assuming that you could have many main meals per transaction, then effectively need a LAG
function that can ignore certain rows. There is an IGNORE NULLS
option in some RDMBS, where you could use something like:
LAG(CASE WHEN CourseCategoruy = 'Main' THEN ItemName END) IGNORE NULLS
OVER(PARTITION BY TransactionID ORDER BY Linenumber)
Unfortunately SQL Server does not support this, but Itzik Ben-Gan came up with a nice workaround using windowed functions. Essentially if you combine your sorting column and your data column into a single binary value, you can use MAX
in a similar way to LAG
since the max binary value will always be the previous row since the sorting column is part of it.
It does however make for quite painful reading because you have to convert two columns to binary and combine them:
CONVERT(BINARY(4), t.Linenumber)
CONVERT(BINARY(50), CASE WHEN t.CourseCategory = 'Main' THEN t.ItemName END)
Then get the max of them
MAX(<result of above expression>) OVER(PARTITION BY t.TransactionID ORDER BY t.LineNumber)
Then remove the first 4 characters (from linenumber) and convert back to varchar:
CONVERT(VARCHAR(50), SUBSTRING(<result of above expression>, 5, 50)
Then finally, you only want to show this value for non main meals
CASE WHEN t.CourseCategory <> 'Main' THEN <result of above expression> END
Putting this all together, along with some sample data you end up with:
DROP TABLE IF EXISTS #T;
CREATE TABLE #T
(
RowWithinTxn INT,
TransactionID INT,
LineNumber INT,
CourseCategory VARCHAR(6),
ItemName VARCHAR(10)
);
INSERT INTO #T
(
RowWithinTxn,
TransactionID,
LineNumber,
CourseCategory,
ItemName
)
VALUES
(1, 123, 123456, 'Main', 'Steak'),
(2, 123, 123457, 'Choice', 'Chips'),
(3, 123, 123458, 'Choice', 'Beans'),
(1, 124, 124567, 'Main', 'Fish'),
(2, 124, 124568, 'Choice', 'Mushy Peas');
SELECT t.RowWithinTxn,
t.TransactionID,
t.LineNumber,
t.CourseCategory,
t.ItemName,
AssociatedMainMeal = CASE WHEN t.CourseCategory <> 'Main' THEN
CONVERT(VARCHAR(50), SUBSTRING(MAX(CONVERT(BINARY(4), t.Linenumber) CONVERT(BINARY(50), CASE WHEN t.CourseCategory = 'Main' THEN t.ItemName END))
OVER(PARTITION BY t.TransactionID ORDER BY t.LineNumber), 5, 50))
END
FROM #T AS t;