Home > Mobile >  Select rows where the combination of two columns is unique and we only display rows where the first
Select rows where the combination of two columns is unique and we only display rows where the first

Time:01-07

I have an order line table that looks like this:

ID Order ID Product Reference Variant
1 1 Banana Green
2 1 Banana Yellow
3 2 Apple Green
4 2 Banana Brown
5 3 Apple Red
6 3 Apple Yellow
7 4 Apple Yellow
8 4 Banana Green
9 4 Banana Yellow
10 4 Pear Green
11 4 Pear Green
12 4 Pear Green

I want to know how often people place an order with a combination of different fruit products. I want to know the orderId for that situation and which productReference was combined in the orders.

I only care about the product, not the variant.

I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:

Order ID Product
2 Banana
2 Apple
4 Banana
4 Apple
4 Pear

I just need data output of the combination Banana Apple and Banana Apple Pear happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple, Banana or Pear products, but that assumption needs to be verified.

Problem

I kind of get stuck after the first step.

select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference

This outputs:

Order ID Product Reference amount
1 Banana 2
2 Apple 1
2 Banana 1
3 Apple 2
4 Apple 1
4 Banana 2
4 Pear 3

I just don't know how to move on from this step to get the data I want.

CodePudding user response:

You can use a window count() over()

select * 
from
(   
    select orderId, productReference, count(*) as amount
       , count(productReference) over(partition by orderId) np
    from OrderLines
    group by orderId, productReference    
) t
where np > 1

CodePudding user response:

You need only the rows where an Order_Id has different products; you can do this many ways.

One way is to aggregate and filter to only rows where the min product <> the max product, then use a correlation to find matching orders:

select distinct t.Order_ID, t.Product_Reference
from t
where exists (
    select *
    from t t2
    where t2.Order_ID = t.Order_ID
    group by order_id
    having Min(Product_Reference) != Max(Product_Reference)
);

See this demo fiddle

CodePudding user response:

You could use STRING_AGG: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql?view=sql-server-ver16

Here's an example:

SELECT orderID, STRING_AGG(productReference, ' ') products
FROM  
    ( 
      SELECT DISTINCT orderID, productReference
      FROM orderLines        
    ) order_products
GROUP BY orderID

For each order ID, this pulls out the distinct products, then the STRING_AGG combines them into one field.

Output

orderID products
1 Banana
2 Apple Banana
3 Apple
4 Apple Banana Pear

SQL fiddle example: http://sqlfiddle.com/#!18/8a677/6

  • Related