I have a data set like below:
transaction_id | store_id | product_id | product_name |
---|---|---|---|
1 | 100 | p001 | product_1 |
1 | 100 | p002 | product_2 |
1 | 100 | p003 | product_3 |
4 | 100 | p002 | product_2 |
4 | 100 | p003 | product_3 |
5 | 100 | p002 | product_2 |
5 | 100 | p003 | product_3 |
7 | 100 | p001 | product_1 |
7 | 100 | p003 | product_3 |
8 | 101 | p002 | product_2 |
8 | 101 | p003 | product_3 |
9 | 101 | p001 | product_1 |
9 | 101 | p002 | product_2 |
2 | 101 | p001 | product_1 |
3 | 101 | p002 | product_2 |
3 | 101 | p003 | product_3 |
6 | 101 | p001 | product_1 |
6 | 101 | p002 | product_2 |
I am trying to find a query to give output as below.
store | freq_prod_ids | count_of_transactions |
---|---|---|
100 | p002, p003 | 3 |
100 | p001, p003 | 2 |
101 | p001, p002 | 2 |
101 | p002, p003 | 2 |
This essentially should give the top 2 [N=2] frequently bought product combinations in a single transaction for each store.
Please help to have an SQL query to get this response.
CodePudding user response:
You may try the following which performs a self-join before aggregating based on store and product pairs. The row_number
is used to retrieve the top 2 product pairs per store.
SELECT
store_id, freq_prod_ids,count_of_transactions
FROM (
SELECT
t1.store_id,
CONCAT(t1.product_id,', ',t2.product_id) as freq_prod_ids,
COUNT(1) as count_of_transactions,
ROW_NUMBER() OVER (PARTITION BY t1.store_id ORDER BY COUNT(1) DESC) as rn
FROM my_table t1
INNER JOIN my_table t2 on t1.store_id = t2.store_id and
t1.product_id < t2.product_id and
t1.transaction_id = t2.transaction_id
GROUP BY t1.store_id,CONCAT(t1.product_id,', ',t2.product_id)
) t3 WHERE rn <=2
Let me know if this works for you.