Home > database >  SQL giving me different results
SQL giving me different results

Time:02-10

If I run this two queries separately and combined the output in excel it gives me the correct results

select
    DISTINCT userid
  from
    table1
  where
    event like '%signup%'
    and date(root_tstamp) between '2022-02-01'
    and '2022-02-07'
    and userid is not null and userid <> ''),

and

    select
        DISTINCT user_id,
        geo_country
      from
        table2
      where
        geo_country is not null
        and geo_country <> ''
        and user_id is not null
        and user_id <> ''
        and date(etl_tstsmp) between '2022-02-01'
        and '2022-02-07'

But if I ran it on as a single query with CTE it gives me the wrong results as following query

with base1 as ( select
    DISTINCT userid
  from
    table1
  where
    event like '%signups%'
    and date(root_tstamp) between '2022-02-01'
    and '2022-02-07'
    and userid is not null and userid <> ''),
base2 as (
  select
    DISTINCT user_id,
    geo_country
  from
    table2
  where
    geo_country is not null
    and geo_country <> ''
    and user_id is not null
    and user_id <> ''
    and date(etl_tstamp) between '2022-02-01'
    and '2022-02-07'
)
select
base2.geo_country,
base1.userid
from base1
inner join base2 on base1.userid = base2.user_id

any suggestion

CodePudding user response:

Based on your comments I suspect you need to change your base2 query to an aggregation, something like

select user_id, max(geo_country) geo_country
  from table2
  where
    geo_country is not null
    and geo_country <> ''
    and user_id is not null
    and user_id <> ''
    and date(etl_tstamp) between '2022-02-01' and '2022-02-07'
  group by user_id

There is also no need for joining two separate CTEs, you can simply use the above query and implement an exists for your first query, correlating on user_id, ie,

select user_id, max(geo_country) geo_country
from table2 t2
where <criteria>
and exists (
  select * from table1 t1
  where t1.user_id=t2.user_id and <base1 criteria>
)
group by user_id
  • Related