I'm in the process of analyzing costs that are produced by the usage of Data Studio (now Looker Studio). For that I'm looking at the BigQuery audit logs.
I can find the query that Data Studio is firing, which may look something like this:
SELECT * FROM (
SELECT clmn10_, SUM(clmn26_) AS clmn100000_ FROM (
SELECT * FROM (
SELECT t0.customer AS clmn10_, t0.my_metric1 AS clmn26_, t0.my_dim2 AS clmn39_, t0.product AS clmn6_ FROM (
SELECT *
FROM `mytable`
WHERE date(_PARTITIONTIME) >= PARSE_DATE('%Y%m%d', @DS_START_DATE)
AND date(_PARTITIONTIME) <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
) t0
) WHERE ((clmn6_ = "value1") AND (clmn39_ = false))
) GROUP BY clmn10_
) LIMIT 20000000
Now that's a great start, but I would like to better understand the usage of @DS_START_DATE and @DS_END_DATE.
Strangely enough these variables show up in the logs like above, when I would have expected them to show the resolved values (i.e., the actual start and end dates used).
Is there a way to get the values of these variables for the queries?
Any hints are greatly appreciated.
CodePudding user response:
You can extract the time range of the customized Looker Studio by
looking at the INFORMATION_SCHEMA.JOBS
and the job_stages
column. After several unnest operation, you can extract the where
condition. I always use i n Looker Studio the where date(column) between ... and ...
.
SELECT * except(substeps),
DATE_FROM_UNIX_DATE(safe_cast(split(substep,",")[safe_offset(1)] as int64)) as from_date,
DATE_FROM_UNIX_DATE(safe_cast(replace(split(substep,",")[safe_offset(2)],")","") as int64)) as to_date
from (
select
end_time,
stage.*,
array_to_string(substeps, " ") as query
from `region-eu`.INFORMATION_SCHEMA.JOBS, unnest(job_stages) as job_stage,unnest(job_stage.steps) as stage
where stage.kind="READ"
), unnest(substeps) as substep
where substep like "