Home > Mobile >  Combine 2 sql queries in Bigquery
Combine 2 sql queries in Bigquery

Time:10-08

I currently have two queries that i've stitched together using views in Bigquery. I am wondering if there's a way to combine them into a single query and eliminate the need for multiple views.

The first query concatenates a few strings to create a field "id"

SELECT *,  CONCAT(CAST(ga_sourcemedium AS string),   CAST(ga_campaign AS string),   CAST(ga_adcontent AS string),   CAST(ga_country AS string),   CAST(ga_region AS string), CAST(ga_devicecategory AS string),   CAST(ga_date AS string)) AS id,FROM  `table_name`

The second query de-dupes the records based on the id field.

SELECT DISTINCT orders.*
       FROM `table_name` orders
 INNER JOIN (
              SELECT id,
                     MAX(_sdc_sequence) AS sequence
                FROM `table_name`
              GROUP BY id
            ) latest_orders
         ON orders.id = latest_orders.id
        AND orders._sdc_sequence = latest_orders.sequence

thank you in advance

CodePudding user response:

With BigQuery, you can use the “UNION ALL” statement, but you need to follow some rules in both queries to use it.

The rules are:

  • Both queries must have the same number of columns.
  • Both queries must have the same data type.

You can see this sample code using “UNION ALL:”

   select '1','11','3'
   union all
   select '3','4','5'

Both have the same number of columns and the same data type.

In this case, you can see a sample code that is wrong:

Select 
TY_Sales ,
LY_Sales ,
Comp
From
[project.dataset.table]
 
UNION ALL
 
TY_Visits ,
LY_Visits ,
Comp
From
[project.dataset.table]
 
UNION ALL
 
Select 
TY_Orders ,
LY_Orders ,
Comp
From
[project.dataset.table]

The query is invalid since all the queries contain the same number of columns, but columns are of different data types (first two are integer and comp as string). So we have to convert it into the same data type.

You can see more information about “UNION ALL” in BigQuery.

CodePudding user response:

You can just use Common Table Expressions so that the result is top-down and easy to understand:

WITH orders AS (
  SELECT 
    *,  
    CONCAT(CAST(ga_sourcemedium AS STRING), CAST(ga_campaign AS STRING),   
    CAST(ga_adcontent AS STRING),   CAST(ga_country AS STRING),   
    CAST(ga_region AS STRING), CAST(ga_devicecategory AS STRING), 
    CAST(ga_date AS STRING)) AS id
  FROM  
    `table_name`
),
latest_orders AS (
  SELECT
    id,
    MAX(_sdc_sequence) AS sequence
  FROM
    orders
  GROUP BY
    1
),
final AS (
  SELECT DISTINCT
    o.*
  FROM
    orders AS o
  INNER JOIN
    latest_orders AS lo ON o.id = lo.id AND o._sdc_sequence = lo.sequence
)
SELECT * FROM final
  • Related