It needs to pull data from BigQuery to Google Sheets every Sunday at 11:59 pm for the last 21 days.
Here's the code I wrote to find out Sunday.
SELECT
EXTRACT(YEAR FROM date) AS year,
week,
name,
parent_name,
sum(total_sales) as total_revenue,
title
FROM partitioned_ts
WHERE
id=14213
and
total_sales>0
and
date BETWEEN DATE_SUB(CURRENT_DATE(),
INTERVAL
IF( EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) = 1,
6,
EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 1)
DAY) - INTERVAL 21 DAY AND DATE_SUB(CURRENT_DATE(),
INTERVAL
IF( EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) = 1,
6,
EXTRACT(DAYOFWEEK FROM CURRENT_DATE()) - 1)
DAY)
GROUP BY name, parent_name, week, year, title
order by year DESC, week DESC, name
This query will process 85.31 GB when run. The old version of the query calculates the last 21 days from today
date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 21 DAY) AND CURRENT_DATE(
and this query processes only 1.6 GB when run. How should I write the query to process lower data?
CodePudding user response:
The easiest way to filter out rows older than 21 days would be:
DATE_DIFF(CURRENT_DATE(), date, DAY) <= 21
If you need to reduce the scan size, the best approach is to partition the source table by date
https://cloud.google.com/bigquery/docs/querying-partitioned-tables#use_a_constant_filter_expression
CodePudding user response:
The following code works:
date BETWEEN DATE_SUB((DATE_TRUNC(CURRENT_DATE(), WEEK(Saturday))), INTERVAL 21 DAY) AND (DATE_TRUNC(CURRENT_DATE(), WEEK(Saturday)))