Home > Blockchain >  SQL - Identify unique person in a customer table with several customer ids, phone numbers and email
SQL - Identify unique person in a customer table with several customer ids, phone numbers and email

Time:08-26

I have a customer table with a unique customers id, a customer phone number and a customer email. It could happen, that a unique person can have several customer id's, but the same email and/or the same phone number. Just like this:

| customer id  | customer email  | customer phone number  | person id  |
|:-------------|:----------------|:-----------------------|:-----------|
| 1            |[email protected]  | 0111 111111            | 1          |
| 2            |[email protected]  | 0999 999999            | 1          |
| 3            |[email protected]   | 0111 111111            | 1          | 
| 4            |[email protected]   | 0555 555555            | 1          |
| 5            |[email protected]   | 0555 555555            | 1          |
| 6            |[email protected] | 0333 333333            | 6          |
| 7            |[email protected] | 0333 333333            | 6          |

I would like to check with a SQL query in Snowflake, are there any mathes between email and phone number and create a new unique person id (e.g. customer ids 1 to 5 seem to be the same unique person because of the email and phone number and the customer id 6 and 7 are one unique person, because the phone numbers are the same). The smallest customer id of this unique person should always be taken as the person id.

How can I identify this way unique persons with a SQL query?

Thank you so much for your help & best Sophie

CodePudding user response:

This is a bit ugly, and may be improved with a window function or recursive CTE, but this is quick and dirty and generates the results you posted. It requires two passes over the data because your result set shows a transitive relationship on the data.

-- Test table
    insert into
        customer (id, email, phone, person_id)
    values
        (1, '[email protected]', '0111 111111', null),(2, '[email protected]', '0999 999999', null),
        (3, '[email protected]', '0111 111111', null),(4, '[email protected]', '0555 555555', null),
        (5, '[email protected]', '0555 555555', null),(6, '[email protected]', '0333 333333', null),(7, '[email protected]', '0333 333333', null);
 
-- CTE to reflect anchor query and 2 passes to produce the results
   
    with cte as (select
        A.id,
        A.email,
        A.phone,
        min(B.id) as min_id
    from
        customer A,
        customer B
    where
        (A.email = B.email)
        or (A.phone = B.phone)
    group by
        A.id,
        A.email,
        A.phone),
    t1 as (select email, case when min(id) < min(min_id) then min(id) else min(min_id) end min_id from cte
        group by email),
    t2 as (select phone, case when min(id) < min(min_id) then min(id) else min(min_id) end min_id from cte
        group by phone),
    t3 as     
        (select cte.id, cte.email, cte.phone, case when min(t1.min_id) < min(t2.min_id) then min(t1.min_id) else min(t2.min_id) end min_id
        from cte, t1, t2
        where (cte.email = t1.email) and (cte.phone = t2.phone)
        group by 1,2,3),
    t4 as (select email, case when min(id) < min(min_id) then min(id) else min(min_id) end min_id from t3
        group by email),
    t5 as (select phone, case when min(id) < min(min_id) then min(id) else min(min_id) end min_id from t3
        group by phone)
        
    select t3.id, t3.email, t3.phone, case when min(t4.min_id < t3.min_id) then min(t4.min_id) when min(t5.min_id < t3.min_id) then min(t5.min_id) else min(t3.min_id) end person_id
    from t3, t4, t5
    where (t4.email = t3.email) and (t5.phone = t3.phone)
    group by 1,2,3
        order by t3.id;

-- Results
ID  EMAIL               PHONE       PERSON_ID
1   [email protected]     0111 111111 1
2   [email protected]     0999 999999 1
3   [email protected]      0111 111111 1
4   [email protected]      0555 555555 1
5   [email protected]      0555 555555 1
6   [email protected]    0333 333333 6
7   [email protected]    0333 333333 6
  • Related