I have an already simplified table of transactions with the customers and the unique items they purchased. Example:
| Customer email | Item |
| ---------------- | ---------------- |
| First | row |
| [email protected] | 111 |
| [email protected] | 112 |
| [email protected] | 113 |
| [email protected] | 111 |
| [email protected] | 112 |
| [email protected] | 110 |
| [email protected] | 111 |
| [email protected] | 113 |
I want to get a list of popular pairs (combinations) within one client with a number of occurrences:
| item1 | item2 | number of occurrences|
| --------| ----- | -------------------- |
| '111' | '112' | 2 |
| '111' | '113' | 2 |
| '112' | '113' | 1 |
| '110' | '111' | 1 |
| '110' | '113' | 1 |
Is it possible to achieve using SQL? Or I should use something else.
Many thanks for your help in advance.
CodePudding user response:
Consider below query:
WITH purchased AS (
SELECT customer_email, ARRAY_AGG(DISTINCT item ORDER BY item) items
FROM sample GROUP BY 1
),
associations AS (
SELECT customer_email, STRUCT(first, second) AS item_pair
FROM purchased,
UNNEST(items) first WITH OFFSET o1
JOIN UNNEST(items) second WITH OFFSET o2 ON o1 < o2
)
SELECT FORMAT('%t', item_pair) item_pair, COUNT(1) cnt
FROM associations
GROUP BY 1 ORDER BY 2 DESC;
output will be:
with sample:
CREATE TEMP TABLE sample AS
SELECT '[email protected]' customer_email, '111' item UNION ALL
SELECT '[email protected]', '112' UNION ALL
SELECT '[email protected]', '113' UNION ALL
SELECT '[email protected]', '111' UNION ALL
SELECT '[email protected]', '112' UNION ALL
SELECT '[email protected]', '110' UNION ALL
SELECT '[email protected]', '111' UNION ALL
SELECT '[email protected]', '113';