How to read a specific file while passing multiple parameters in Python?

I am trying to read a specific BigQuery table based on the argument values I pass to my function, so that depending on which values I pass, the function reads a specific table. I was able to write this in Python, but it does not feel elegant. I tried both *args and **kwargs; **kwargs seems like the better way to handle this, but my current code has a syntax error. Can anyone look at my code and suggest a way to improve this function?

My current attempt:

This is my first function, which reads a table based on the parameter values:

# !pip install pandas-gbq
import pandas as pd

def _readbqTble1(project_name='fb', dataset_name='iris',which_dim='lifespan', which_src='asia'):
    def str_join(*args):
        return '_'.join(map(str, args))

    table_name =  str_join('inst_pic',{*which_src},{*which_dim})
    query = "select * from `{project_name}.{dataset_name}.{table_name}`".format(project_name=project_name,dataset_name=dataset_name,table_name=table_name)
    df = pd.read_gbq(query,project_id=project_name,dialect='standard')
    return df

But this line gives me an error when I call the function: table_name = str_join('inst_pic',{*which_src},{*which_dim}). To me the logic looks fine: *args should give the parameter values so that we can concatenate the strings, but I get a syntax error instead.

This is my other approach, using **kwargs:

def _readbqTble2(**kwargs):
    def str_join(*args):
        return '_'.join(map(str, args))
    for _, v in kwargs.items():
        table_name=str_join('inst_pic',v)

    for k, v in kwargs.items():
        query = "select * from `{k}.{k}.{table_name}`".format(kwargs,table_name=table_name)
        df = pd.read_gbq(query,project_id=project_name,dialect='standard')
        return df

if __name__=="__main__":
    _readbqTble2(project_name='fb', dataset_name='iris', which_dim='lifespan', which_src='asia')

    _readbqTble1

But this line, query = "select * from `{k}.{k}.{table_name}`".format(kwargs,table_name=table_name), also gives me an error. I feel like **kwargs could be the better way here. Can anyone point out what went wrong? What is the best way to read data based on parameter values in Python?

Use case:

I have tables in BigQuery with names such as:

fb.iris.inst_pic_asia_lifespan
fb.iris.inst_pic_asia_others
fb.iris.inst_pic_europe_lifespan
fb.iris.inst_pic_europe_others

Basically, I want to read a specific table depending on the arguments passed. I want my function to be more parametric so that it can handle any table I want to read, based on the argument values passed.

Can anyone suggest an elegant way of doing this?

CodePudding user response：

You seem to be confused about how *args and **kwargs work in general. Please see this article to help demystify them. You can also see this SO answer.
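As a quick illustration of the difference, here is a minimal sketch. The helper names `show_args` and `show_kwargs` are made up for this example and are not part of your code. In a function signature, `*args` collects positional arguments into a tuple and `**kwargs` collects keyword arguments into a dict. At a call site, `{*which_src}` does something different: it builds a set of the string's characters instead of passing the string.

```python
def show_args(*args):
    # Positional arguments arrive as a tuple.
    return args


def show_kwargs(**kwargs):
    # Keyword arguments arrive as a dict.
    return kwargs


which_src = "asia"

print(show_args("inst_pic", which_src))  # ('inst_pic', 'asia')
print(show_kwargs(project_name="fb"))    # {'project_name': 'fb'}

# {*which_src} unpacks the string into a set of its characters,
# so joining it into a table name would not give "asia".
print({*which_src} == {"a", "s", "i"})   # True

# ** at a call site unpacks a dict into keyword arguments,
# which is what str.format needs for named placeholders.
params = {"project_name": "fb", "dataset_name": "iris"}
print("{project_name}.{dataset_name}".format(**params))  # fb.iris
```

This is also why `.format(kwargs, ...)` fails in your second attempt: the dict is passed as a single positional argument, so the named placeholders are never filled.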

Working version of your code with inline comments:

import pandas as pd

def _readbqTble1(which_dim="lifespan", which_src="asia", **kwargs):
    def str_join(*args):
        return "_".join(map(str, args))

    # `*args` collects every positional argument of the call into a tuple.
    # So you shouldn't wrap `which_src` in a set like `{*which_src}`;
    # just pass the string itself in the call.
    table_name = str_join("inst_pic", which_src, which_dim)

    # `**kwargs` unpacks the remaining keyword arguments given to `_readbqTble1`
    # (here `project_name` and `dataset_name`, which the caller must supply)
    # as if they were passed as k=v arguments to str.format below.
    query = "select * from `{project_name}.{dataset_name}.{table_name}`".format(
        table_name=table_name,
        **kwargs,
    )
    # Look the project up without popping, so `kwargs` is left unchanged.
    df = pd.read_gbq(query, project_id=kwargs["project_name"], dialect="standard")
    return df
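If you always pass the same four values, explicit keyword parameters may be simpler than `**kwargs`. The following is only a sketch of that alternative, assuming the `inst_pic_<src>_<dim>` naming from your use case. The names `build_query` and `read_table` are hypothetical. Building the query in its own function lets you check the generated table names without touching BigQuery.

```python
def build_query(project_name, dataset_name, which_src, which_dim, prefix="inst_pic"):
    # Table names follow the pattern <prefix>_<src>_<dim>,
    # e.g. inst_pic_asia_lifespan.
    table_name = "_".join([prefix, which_src, which_dim])
    return f"select * from `{project_name}.{dataset_name}.{table_name}`"


def read_table(project_name="fb", dataset_name="iris",
               which_src="asia", which_dim="lifespan"):
    # Imported lazily so build_query can be used without pandas installed.
    import pandas as pd

    query = build_query(project_name, dataset_name, which_src, which_dim)
    return pd.read_gbq(query, project_id=project_name, dialect="standard")


# Print the query for each table listed in the question.
for src in ("asia", "europe"):
    for dim in ("lifespan", "others"):
        print(build_query("fb", "iris", src, dim))
```

Calling `read_table(which_src="europe", which_dim="others")` would then read `fb.iris.inst_pic_europe_others`, with the project and dataset falling back to their defaults.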