Home > Software engineering >  Find popular pairs from table with transcations
Find popular pairs from table with transcations

Time:06-29

I have an already simplified table of transactions with the customers and the unique items they purchased. Example:

| Customer email   | Item             |
| ---------------- | ---------------- |
| First            | row              |
| [email protected]    | 111              |
| [email protected]    | 112              |
| [email protected]    | 113              |
| [email protected]      | 111              |
| [email protected]      | 112              |
| [email protected]        | 110              |
| [email protected]        | 111              | 
| [email protected]        | 113              |

I want to get a list of popular pairs (combinations) within one client with a number of occurrences:

| item1   | item2 | number of occurrences|              
| --------| ----- | -------------------- |
| '111'   | '112' | 2                    |
| '111'   | '113' | 2                    |
| '112'   | '113' | 1                    |
| '110'   | '111' | 1                    |
| '110'   | '113' | 1                    |

Is it possible to achieve using SQL? Or I should use something else.

Many thanks for your help in advance.

CodePudding user response:

Consider below query:

WITH purchased AS (
  SELECT customer_email, ARRAY_AGG(DISTINCT item ORDER BY item) items
    FROM sample GROUP BY 1
),
associations AS (
  SELECT customer_email, STRUCT(first, second) AS item_pair
    FROM purchased,
         UNNEST(items) first WITH OFFSET o1
    JOIN UNNEST(items) second WITH OFFSET o2 ON o1 < o2
)
SELECT FORMAT('%t', item_pair) item_pair, COUNT(1) cnt
  FROM associations
 GROUP BY 1 ORDER BY 2 DESC;

output will be:

enter image description here

with sample:

CREATE TEMP TABLE sample AS
SELECT '[email protected]' customer_email, '111' item UNION ALL
SELECT '[email protected]', '112' UNION ALL
SELECT '[email protected]', '113' UNION ALL
SELECT '[email protected]', '111' UNION ALL
SELECT '[email protected]', '112' UNION ALL
SELECT '[email protected]', '110' UNION ALL
SELECT '[email protected]', '111' UNION ALL
SELECT '[email protected]', '113';
  • Related