I currently have two queries that i've stitched together using views in Bigquery. I am wondering if there's a way to combine them into a single query and eliminate the need for multiple views.
The first query concatenates a few strings to create a field "id"
SELECT *, CONCAT(CAST(ga_sourcemedium AS string), CAST(ga_campaign AS string), CAST(ga_adcontent AS string), CAST(ga_country AS string), CAST(ga_region AS string), CAST(ga_devicecategory AS string), CAST(ga_date AS string)) AS id,FROM `table_name`
The second query de-dupes the records based on the id field.
SELECT DISTINCT orders.*
FROM `table_name` orders
INNER JOIN (
SELECT id,
MAX(_sdc_sequence) AS sequence
FROM `table_name`
GROUP BY id
) latest_orders
ON orders.id = latest_orders.id
AND orders._sdc_sequence = latest_orders.sequence
thank you in advance
CodePudding user response:
With BigQuery, you can use the “UNION ALL” statement, but you need to follow some rules in both queries to use it.
The rules are:
- Both queries must have the same number of columns.
- Both queries must have the same data type.
You can see this sample code using “UNION ALL:”
select '1','11','3'
union all
select '3','4','5'
Both have the same number of columns and the same data type.
In this case, you can see a sample code that is wrong:
Select
TY_Sales ,
LY_Sales ,
Comp
From
[project.dataset.table]
UNION ALL
TY_Visits ,
LY_Visits ,
Comp
From
[project.dataset.table]
UNION ALL
Select
TY_Orders ,
LY_Orders ,
Comp
From
[project.dataset.table]
The query is invalid since all the queries contain the same number of columns, but columns are of different data types (first two are integer and comp as string). So we have to convert it into the same data type.
You can see more information about “UNION ALL” in BigQuery.
CodePudding user response:
You can just use Common Table Expressions so that the result is top-down and easy to understand:
WITH orders AS (
SELECT
*,
CONCAT(CAST(ga_sourcemedium AS STRING), CAST(ga_campaign AS STRING),
CAST(ga_adcontent AS STRING), CAST(ga_country AS STRING),
CAST(ga_region AS STRING), CAST(ga_devicecategory AS STRING),
CAST(ga_date AS STRING)) AS id
FROM
`table_name`
),
latest_orders AS (
SELECT
id,
MAX(_sdc_sequence) AS sequence
FROM
orders
GROUP BY
1
),
final AS (
SELECT DISTINCT
o.*
FROM
orders AS o
INNER JOIN
latest_orders AS lo ON o.id = lo.id AND o._sdc_sequence = lo.sequence
)
SELECT * FROM final