Home > OS >  Prevent SQL Injection in BigQuery with Python for table name
Prevent SQL Injection in BigQuery with Python for table name

Time:03-25

I have an Airflow DAG which takes an argument from the user for table.

I then use this value in an SQL statement and execute it in BigQuery. I'm worried about exposing myself to SQL Injection.

Here is the code:

sql = f"""
        CREATE OR REPLACE TABLE {PROJECT}.{dataset}.{table} PARTITION BY DATE(start_time) as (
            //OTHER CODE
        )
        """

client = bigquery.Client()
query_job = client.query(sql)

Both dataset and table get passed through via airflow but I'm worried someone could pass through something like: random_table; truncate other_tbl; -- as the table argument.

My fear is that the above will create a table called random_table and then truncate an existing table.

Is there a safer way to process these passed through arguments?

I've looked into parameterized queries in BigQuery but these don't work for table names.

CodePudding user response:

You will have to create a table name validator. I think you can safely validate by using just backticks --> ` at the start and at the end of your table name string. It's not a 100% solution but it worked for some of my test scenarios I try. It should work like this:

# validate should look for ` at the beginning and end of your tablename
table_name = validate(f"`{project}.{dataset}.{table}`")

sql = f"""
        CREATE OR REPLACE TABLE {table_name} PARTITION BY DATE(start_time) as (
            //OTHER CODE
        )
        """
...

Note: I suggest you to check the following post on medium site to check about bigquery sql injection.

I checked the official documentation about Running parameterized queries, and sadly it only covers the parameterization of variables not tables or other string part of your query.

As a final note, I recommend to open a feature request for BigQuery for this particular scenario.

CodePudding user response:

You should probably look into sanitization/validation of user input in general. This is done before passing the input to the BQ query.

With Python, you could look for malicious strings in the user input - like truncate in your example - or use a regex to filter input that for instance contains --. Those are just some quick examples. I recommend you do more research on that topic; you will also find quite a few questions on that topic on SE.

  • Related