How to find the approriate SQL syntax for Snowflake environment?-CodePudding

Here is how my data is configured in my order table:

I have 3 fields: ORDER_ID, that is the unique field per ORDER (only appears once in the whole table), USER_KEY, that is the unique field per BUYER (buyers can place multiple orders and can appear multiple times in the order table), and FRUIT, that identifies the item that has been bought per order.

I need to identify the buyers (USER_KEY), that have purchases multiples of the same products.

I have the following query to identify these buyers:

      select 
        t.user_key from temp t 
inner join (
        SELECT user_key,
        Count(order_id) as [Minimum of 2 Count] from temp group by user_key, fruit 
    ) it on t.user_key = it.user_key and it.[Minimum of 2 Count] > 1
    group by t.user_key;

This query however does not work in SNOWFLAKE. Does anyone know how I can tweak the syntax of this query to work in Snowflake?

Thank you!

CodePudding user response：

The identifiers are quoted with " - [] are probably Sybase or T-SQL syntax:

select  t.user_key from temp t 
inner join (
        SELECT user_key,
        Count(order_id) as "Minimum of 2 Count" from temp group by user_key, fruit 
    ) it on t.user_key = it.user_key and it."Minimum of 2 Count" > 1
    group by t.user_key;

Anyway the query coud be rewritten:

SELECT DISTINCT user_key
FROM temp t
QUALIFY COUNT(order_id) OVER(PARTITION BY user_key, fruit) > 1;

or:

SELECT DISTINCT user_key
FROM temp 
GROUP BY user_key, fruit
HAVING COUNT(ORDER_ID) > 1;

CodePudding user response：

First get all your results where user and fruit are the same and count those results.

You then filter the result in an outer query:

SELECT 
    user_key
FROM 
(
   SELECT
      t1.user_key as user_key,
      count(t1.user_key) as num_user
   FROM
      temp t1
      temp t2
   WHERE
      t1.user_key = t2.user_key
      AND t1.fruit = t2.fruit
   GROUP BY
      t1.user_key
)
WHERE
  num_user > 2

CodePudding user response：

We can use either windows function or group based on what columns you are looking:

 select user_key, fruit, count(1) from Test1
group by user_key, fruit
having count(1)>1;