Home > Mobile >  Google Big query different result based on same date filter
Google Big query different result based on same date filter

Time:06-05

Edit 1: so the issue is '<=' is acting as '<' in google query which is strange. But '>=' acts normally. Any idea why this is happening?

Goal: to get data for May 2019.

Info about database here: enter image description here

Query 2 uses timestamp >= '2019-05-01' AND timestamp <= '2019-05-31'

SELECT file.project AS package, COUNT(file.project) AS installs, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads` 
WHERE timestamp >= '2019-05-01' AND timestamp <= '2019-05-31'
GROUP BY month, package;

enter image description here

Both query one and two should scan same amount of data - May 2019 but both query gives different results and scans different amount of data as you can see in attached images.

Which one is correct and why both are not matching?

CodePudding user response:

The two filters are different, you can simply check the difference in the result by the below script.

Differences
SELECT timestamp, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads` 
WHERE
    timestamp > '2019-04-30' AND timestamp < '2019-06-01'
    AND NOT (timestamp >= '2019-05-01' AND timestamp <= '2019-05-31')
;
Results

enter image description here

Personal Preference
SELECT file.project AS package, COUNT(file.project) AS installs, FORMAT_DATETIME('%Y-%m', timestamp) AS month
FROM `bigquery-public-data.pypi.file_downloads`
WHERE timestamp BETWEEN '2019-05-01' AND '2019-05-31'

p.s. As you can check out in the enter image description here

SELECT EXTRACT(MONTH FROM timestamp) month, COUNT(1) cnt
  FROM `bigquery-public-data.pypi.file_downloads`
 WHERE timestamp BETWEEN  '2019-05-01' AND '2019-06-01' -- scan 22.57 GB
 GROUP BY 1
  • Related