I am working on accessing data in BigQuery with Python to do some data analysis. I access the data with a standard SQL query of:
"SELECT * FROM `project.dataset.table`"
I am using the same base code on multiple datasets so I took the approach of using environment variables for the project, dataset and table, giving me an actual query that looks like this:
f"SELECT * FROM `{PROJECT}.{DATASET}.{TABLE}`"
I did this in an effort to abstract my tables a little. I run bandit testing in my CI/CD pipeline and this query using variables is failing, suggesting possible injection. Now my query cannot be changed by user input as there are no points where I take user input to get to this query. I'm trying to figure out if this is a safe query to include in my code. I've attempted running more variables, less variables, using secret manager and all fail the bandit testing.
My gut is telling me that the usage of the variables "hides" some of my info since the table is at least separate from the query and that since no users can input anything there is no issue. But the failing test has me a bit concerned. Any thoughts on if this is safe?
CodePudding user response:
YES
since no users can input anything there is no issue
If an attacker gains access to your environment variables, they can use them to perform a SQL injection. This is privilege elevation or escalation.
Parameters normally don't work on identifiers such as table names, only values. You can still protect yourself by filtering the identifiers. Some libraries have a function to do this. At minimum, make sure they don't contain a `
.
Consider using a SQL builder which will take care of this for you.
CodePudding user response:
SQL injection is important
, because it allows the attacker to destroy and read sensitive data.
for your query you can: parameterized queries, Parameterized statements ensure that the parameters passed into the SQL statements are treated safely.
BigQuery supports query parameters to help prevent SQL injection when queries are constructed using user input. This feature is only available with standard SQL syntax.
#Example
client = bigquery.Client()
query = """
SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = @corpus
AND word_count >= @min_word_count
ORDER BY word_count DESC;
"""
job_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.ScalarQueryParameter("corpus", "STRING", "romeoandjuliet"),
bigquery.ScalarQueryParameter("min_word_count", "INT64", 250),
]
)
query_job = client.query(query, job_config=job_config) # Make an API request.