Home > database >  Pass a Python Variable containing multiple ID Numbers into external BigQuery Script
Pass a Python Variable containing multiple ID Numbers into external BigQuery Script

Time:11-23

I have created a python class, and one of my methods is meant to take in either a single ID number or a list of ID numbers. The function will then use the ID numbers to query from a table in BigQuery using a .sql script. Currently, the function works fine for a single ID number using the following:

def state_data(self, state, id_number):

    if state == 'NY': 
        sql_script = self.sql_scripts['get_data_ny']
    else:
        sql_script = self.sql_scripts['get_data_rest']
         
    sql_script = sql_script.replace('@@id_number@@', id_number)

I'm having issues with passing in multiple ID numbers at once. There are 3 different ways that I've tried without success:

  • The above method, passing in the multiple ID numbers as a tuple to use with WHERE ID_NUM IN('@@id_number@@'). This doesn't work, as when the .sql script gets called, a syntax error is returned, as parentheses and quotes are automatically added. For example, the SQL statement attempts to run as WHERE ID_NUM IN('('123', '124')'). This would run fine without one of the two sets of parentheses and quotes, but no matter what I try to pass in, they always get added.
  • The second technique I have tried is to create a table, populate it with the passed in ID numbers, and then join with the larger table in BQ. It goes as follows:
CREATE OR REPLACE TABLE ID_Numbers
(
     ID_Number STRING
);

INSERT INTO ID_Numbers (ID_Number)
VALUES ('@@id_number@@');

-- rest of script is a simple left join of the above created table with the BQ table containing the data for each ID

This again works fine for single ID numbers, but passing in multiple VALUES (in this case ID Numbers) would require a ('@@id_number@@') per unique ID. One thing that I have not yet attempted - to assign a variable to each unique ID and pass each one in as a new VALUE. I am not sure if this technique will work.

  • The third technique I've tried is to include the full SQL query in the function, rather than calling a .sql script. The list of ID numbers get passed in as tuple, and the query goes as follows:
id_nums = tuple(id_number)
query = ("""SELECT * FROM `data_table`
WHERE ID_NUM IN{}""").format(id_nums)

This technique also does not work, as I get the following error: AttributeError: 'QueryJob' object has no attribute 'format'. I've attempted to look into this error but I cannot find anything that helps me out effectively.

Finally, I'll note that none of the posts asking the same or similar questions have solved my issues so far.

I am looking for any and all advice for a way that I can successfully pass a variable containing multiple ID numbers into my function that ultimately calls and runs a BQ query.

CodePudding user response:

You should be able to use *args to get the id_numbers as a sequence and f-strings and str.join() to build the SQL query:

class MyClass:

    def state_data(self, state, *id_numbers):
        print(f"{state=}")
        query = f"""
            SELECT * FROM `data_table`
            WHERE ID_NUM IN ({", ".join(str(id_number) for id_number in id_numbers)})
        """
        print(query)


my_class = MyClass()

my_class.state_data("some state", 123)
my_class.state_data("some more state", 123, 124)

On my machine, this prints:

sql python main.py
state='some state'

            SELECT * FROM `data_table`
            WHERE ID_NUM IN (123)
        
state='some more state'

            SELECT * FROM `data_table`
            WHERE ID_NUM IN (123, 124)
  • Related