Home > other >  BigQuery: Return data for one specific data source after certain date
BigQuery: Return data for one specific data source after certain date

Time:07-20

I have a question and wonder if anyone can help. I'm trying to return engagements of different data sources (please see below to see how I generated values under data source on BQ), please see the screenshot as an example. For data source 1234 I need to return everything, however for data source 5678 I want the query to only return data after 5/22/22 (see "after" as the expected outputs). Is there a way to do so? Many thanks!

example

Here is how I got the data source from BQ:

DECLARE data_set_list ARRAY<STRING>

DEFAULT ['1234','5678'];


CREATE OR REPLACE TABLE `XXX.XXX.719`  (Data_source STRING, Date DATE, device STRING);

FOR data_set IN (SELECT * FROM UNNEST(data_set_list) as id) DO

EXECUTE IMMEDIATE

FORMAT(

"""

INSERT INTO `XXX.719` 

CodePudding user response:

Can you try this query (I ran this query in BigQuery and sampleData as sample table):

Emulated Table:

WITH sampleData AS
 (SELECT '1234' as dataSource, date('2022-05-21') as Date, 'mobile' as device UNION ALL
  SELECT '1234', date('2022-05-22'), 'desktop' UNION ALL
  SELECT '5678', date('2022-05-22'), 'mobile' UNION ALL
  SELECT '5678', date('2022-05-23'), 'mobile' )

Query:

select * from sampleData where dataSource = "1234"
UNION ALL
select * from sampleData where Date > date('2022-05-22') and dataSource = "5678" ;

With the output below:

enter image description here

For reference you can view query syntax for BigQuery here.

CodePudding user response:

I believe you are trying to find: per Datasource, unique device on latest date. If so, this query might help.

Sample data and codes given below. Sample data is your raw table and second block is actual code to get the desired result.


WITH sampleData AS
 (SELECT '1234' as DataSource, date('2022-05-21') as Date, 'mobile' as device 
   UNION ALL
  SELECT '1234', date('2022-05-22'), 'desktop' UNION ALL
  SELECT '5678', date('2022-05-22'), 'mobile' UNION ALL
  SELECT '5678', date('2022-05-23'), 'mobile' )

select * except(rank) from (
   select *, rank() over (partition by DataSource, device order by Date desc) rank 
   from sampleData)
where rank = 1
order by DataSource, Date;

  • Related