How to perform variable assignment using dataframe name as variable

Time:10-06

I have written a Python function that takes a data frame as one of its arguments. Below is a simplified version of the function:

import itertools

def cat_update(df_to_update, df_source, cat_lst, con_lst):
    # Copy each (category, country) cell from the source frame into the target frame.
    for cat, con in itertools.product(cat_lst, con_lst):
        df_to_update.at[cat, con] = df_source.at[cat, con]

Below is how I am calling this function:

cat_update(df_templete1, raw_source, cat_lst, con_lst)

Now I need to scale my code so that there can be multiple source data frames (raw_source).

How do I specify a variable here so that, instead of hard-coding the actual data frame, I can change it as required?

I tried assigning the variable as follows:

raw_source = 'df_source_1'

But in this case it is passed as a string and not as a data frame, so the function cannot evaluate it as expected. In short, I need to convert it from str to pandas.core.frame.DataFrame.

More information: I call the above function inside a for loop:

for n in range(len(df_config)):
    cat_lst = df_config.at[n,'category'].split(",")
    con_lst = df_config.at[n,'country'].split(",")
    raw_source = df_config.at[n,'Raw source']
    energy_source = df_config.at[n,'Energy source']

Hence the source data frame is picked automatically from user input, which is saved in df_config.

CodePudding user response:

Create a dictionary like this: {"data_frame_name" : data_frame}, so that you can access each data frame by its name. Assume we have some data, data_src_1, like below:

data_src_1 = [['Alex',10],['Bob',12],['Clarke',13]]
df_source_1 = pd.DataFrame(data_src_1)
raw_sources = {"df_source_1" : df_source_1}    # You can have other dataframes here
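As a quick check of the idea, the sketch below looks a name up in the dictionary and confirms that the result is the DataFrame object rather than a string. The variable `name` is only an illustration of a value that might come from df_config.

```python
import pandas as pd

data_src_1 = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df_source_1 = pd.DataFrame(data_src_1)
raw_sources = {"df_source_1": df_source_1}

# Hypothetical: a data frame name held as a plain string, e.g. read from df_config
name = "df_source_1"

# The dictionary lookup turns the name back into the DataFrame object
df_source = raw_sources[name]

print(type(name).__name__)       # str
print(type(df_source).__name__)  # DataFrame
```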

Pass the name of the data frame you want as df_source to the cat_update method, and edit the method like this:

raw_sources = {"df_source_1" : df_source_1}    # add the other data frames here

def cat_update(df_to_update, df_source, cat_lst, con_lst):
    # df_source is now the name (a str) of a data frame stored in raw_sources
    for cat, con in itertools.product(cat_lst, con_lst):
        df_to_update.at[cat, con] = raw_sources[df_source].at[cat, con]

However, you could also just pass the data frame itself, such as df_source_1, to the method. The advantage of the snippet above is that all the data frames are kept together in one dictionary (raw_sources).
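To tie this back to the loop in the question, here is a minimal sketch of the alternative where cat_update keeps its original signature and the name is resolved to a data frame in the caller. The sample values, the second source df_source_2, and the contents of df_config are made up for illustration; only the column names 'category', 'country' and 'Raw source' come from the question.

```python
import itertools
import pandas as pd

def cat_update(df_to_update, df_source, cat_lst, con_lst):
    # Copy each (category, country) cell from the source into the target
    for cat, con in itertools.product(cat_lst, con_lst):
        df_to_update.at[cat, con] = df_source.at[cat, con]

# Hypothetical source frames, indexed by category with one column per country
df_source_1 = pd.DataFrame({"US": [1.0, 2.0], "DE": [3.0, 4.0]}, index=["coal", "gas"])
df_source_2 = pd.DataFrame({"US": [10.0, 20.0], "DE": [30.0, 40.0]}, index=["coal", "gas"])
raw_sources = {"df_source_1": df_source_1, "df_source_2": df_source_2}

# Hypothetical target frame with the same labels, filled with zeros
df_templete1 = pd.DataFrame(0.0, index=["coal", "gas"], columns=["US", "DE"])

# Hypothetical config: each row names the source frame to read from
df_config = pd.DataFrame({
    "category": ["coal", "gas"],
    "country": ["US,DE", "DE"],
    "Raw source": ["df_source_1", "df_source_2"],
})

for n in range(len(df_config)):
    cat_lst = df_config.at[n, "category"].split(",")
    con_lst = df_config.at[n, "country"].split(",")
    # Resolve the name stored in the config to the actual DataFrame
    raw_source = raw_sources[df_config.at[n, "Raw source"]]
    cat_update(df_templete1, raw_source, cat_lst, con_lst)

print(df_templete1)
```

Resolving the name in the caller keeps cat_update independent of the global dictionary, and a name that is missing from raw_sources raises a KeyError at the lookup, which makes typos in the config easy to spot.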
