If I run this two queries separately and combined the output in excel it gives me the correct results
select
DISTINCT userid
from
table1
where
event like '%signup%'
and date(root_tstamp) between '2022-02-01'
and '2022-02-07'
and userid is not null and userid <> ''),
and
select
DISTINCT user_id,
geo_country
from
table2
where
geo_country is not null
and geo_country <> ''
and user_id is not null
and user_id <> ''
and date(etl_tstsmp) between '2022-02-01'
and '2022-02-07'
But if I ran it on as a single query with CTE it gives me the wrong results as following query
with base1 as ( select
DISTINCT userid
from
table1
where
event like '%signups%'
and date(root_tstamp) between '2022-02-01'
and '2022-02-07'
and userid is not null and userid <> ''),
base2 as (
select
DISTINCT user_id,
geo_country
from
table2
where
geo_country is not null
and geo_country <> ''
and user_id is not null
and user_id <> ''
and date(etl_tstamp) between '2022-02-01'
and '2022-02-07'
)
select
base2.geo_country,
base1.userid
from base1
inner join base2 on base1.userid = base2.user_id
any suggestion
CodePudding user response:
Based on your comments I suspect you need to change your base2 query to an aggregation, something like
select user_id, max(geo_country) geo_country
from table2
where
geo_country is not null
and geo_country <> ''
and user_id is not null
and user_id <> ''
and date(etl_tstamp) between '2022-02-01' and '2022-02-07'
group by user_id
There is also no need for joining two separate CTEs, you can simply use the above query and implement an exists for your first query, correlating on user_id, ie,
select user_id, max(geo_country) geo_country
from table2 t2
where <criteria>
and exists (
select * from table1 t1
where t1.user_id=t2.user_id and <base1 criteria>
)
group by user_id