I have an order line table that looks like this:
ID | Order ID | Product Reference | Variant |
---|---|---|---|
1 | 1 | Banana | Green |
2 | 1 | Banana | Yellow |
3 | 2 | Apple | Green |
4 | 2 | Banana | Brown |
5 | 3 | Apple | Red |
6 | 3 | Apple | Yellow |
7 | 4 | Apple | Yellow |
8 | 4 | Banana | Green |
9 | 4 | Banana | Yellow |
10 | 4 | Pear | Green |
11 | 4 | Pear | Green |
12 | 4 | Pear | Green |
I want to know how often people place an order with a combination of different fruit products. I want to know the orderId
for that situation and which productReference
was combined in the orders.
I only care about the product, not the variant.
I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:
Order ID | Product |
---|---|
2 | Banana |
2 | Apple |
4 | Banana |
4 | Apple |
4 | Pear |
I just need data output of the combination Banana Apple
and Banana Apple Pear
happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple
, Banana
or Pear
products, but that assumption needs to be verified.
Problem
I kind of get stuck after the first step.
select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference
This outputs:
Order ID | Product Reference | amount |
---|---|---|
1 | Banana | 2 |
2 | Apple | 1 |
2 | Banana | 1 |
3 | Apple | 2 |
4 | Apple | 1 |
4 | Banana | 2 |
4 | Pear | 3 |
I just don't know how to move on from this step to get the data I want.
CodePudding user response:
You can use a window count() over()
select *
from
(
select orderId, productReference, count(*) as amount
, count(productReference) over(partition by orderId) np
from OrderLines
group by orderId, productReference
) t
where np > 1
CodePudding user response:
You need only the rows where an Order_Id has different products; you can do this many ways.
One way is to aggregate and filter to only rows where the min product <> the max product, then use a correlation to find matching orders:
select distinct t.Order_ID, t.Product_Reference
from t
where exists (
select *
from t t2
where t2.Order_ID = t.Order_ID
group by order_id
having Min(Product_Reference) != Max(Product_Reference)
);
See this demo fiddle
CodePudding user response:
You could use STRING_AGG: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16
Here's an example:
SELECT orderID, STRING_AGG(productReference, ' ') products
FROM
(
SELECT DISTINCT orderID, productReference
FROM orderLines
) order_products
GROUP BY orderID
For each order ID, this pulls out the distinct products, then the STRING_AGG combines them into one field.
Output
orderID | products |
---|---|
1 | Banana |
2 | Apple Banana |
3 | Apple |
4 | Apple Banana Pear |
SQL fiddle example: http://sqlfiddle.com/#!18/8a677/6