Home > Software engineering >  Google Bigquery SQL with clause `where 1=0` is not returning correct schema
Google Bigquery SQL with clause `where 1=0` is not returning correct schema

Time:08-04

I recently added a new column in my BigQuery Table. The following code snippet is used in legacy code to determine the table schema

df = gbq.read_gbq('SELECT * FROM {}.{} where 1=0'.format(BIGQUERY_DATASET_NAME, table), project_id=project_id)

But the problem is that it is not returning the newly added column in the df. Although when I use some other condition like 1=3 in where clause or limit 0 then it returns the correct schema.

Trying to understand what is causing the issue.

CodePudding user response:

If you want to get the column names - which I assume is the point of this - and can change the legacy code, perhaps a better approach would be to get it directly from the INFORMATION_SCHEMA view.

An example would be as follows:

schema_query = f"""
SELECT column_name
  FROM {BIGQUERY_DATASET_NAME}.INFORMATION_SCHEMA.COLUMNS
 WHERE table_name = '{table}'
"""

df = gbq.read_gbq(schema_query, project_id=project_id)

(if using python <3.6, revert to the .format syntax of course)

I have a hunch that this will avoid the problem you're encountering using the legacy code you're working with. I agree with the other answer that it's quite possible you're seeing cached results.

CodePudding user response:

I'm guessing, it has to do with caching , you can always ask BQ not to use caching :

df =  gbq.read_gbq('SELECT * FROM {}.{} where 1=0'.format(...)
     , configuration = {'query': {'useQueryCache': False}} )
  • Related