I have a question and wonder if anyone can help. I'm trying to return engagements of different data sources (please see below to see how I generated values under data source on BQ), please see the screenshot as an example. For data source 1234 I need to return everything, however for data source 5678 I want the query to only return data after 5/22/22 (see "after" as the expected outputs). Is there a way to do so? Many thanks!
Here is how I got the data source from BQ:
DECLARE data_set_list ARRAY<STRING>
DEFAULT ['1234','5678'];
CREATE OR REPLACE TABLE `XXX.XXX.719` (Data_source STRING, Date DATE, device STRING);
FOR data_set IN (SELECT * FROM UNNEST(data_set_list) as id) DO
EXECUTE IMMEDIATE
FORMAT(
"""
INSERT INTO `XXX.719`
CodePudding user response:
Can you try this query (I ran this query in BigQuery and sampleData as sample table):
Emulated Table:
WITH sampleData AS
(SELECT '1234' as dataSource, date('2022-05-21') as Date, 'mobile' as device UNION ALL
SELECT '1234', date('2022-05-22'), 'desktop' UNION ALL
SELECT '5678', date('2022-05-22'), 'mobile' UNION ALL
SELECT '5678', date('2022-05-23'), 'mobile' )
Query:
select * from sampleData where dataSource = "1234"
UNION ALL
select * from sampleData where Date > date('2022-05-22') and dataSource = "5678" ;
With the output below:
For reference you can view query syntax for BigQuery here.
CodePudding user response:
I believe you are trying to find: per Datasource, unique device on latest date. If so, this query might help.
Sample data and codes given below. Sample data is your raw table and second block is actual code to get the desired result.
WITH sampleData AS
(SELECT '1234' as DataSource, date('2022-05-21') as Date, 'mobile' as device
UNION ALL
SELECT '1234', date('2022-05-22'), 'desktop' UNION ALL
SELECT '5678', date('2022-05-22'), 'mobile' UNION ALL
SELECT '5678', date('2022-05-23'), 'mobile' )
select * except(rank) from (
select *, rank() over (partition by DataSource, device order by Date desc) rank
from sampleData)
where rank = 1
order by DataSource, Date;