Unable to use a jupyter cell global variable inside a python function

Time:10-02

I am working in a Jupyter notebook using Python.

I have created two dataframes, as shown below.

Both dataframes are declared outside any function, meaning they are simply defined and initialized in a Jupyter notebook cell, and I want to use them inside functions as shown below.

subcols = ["subjid", "marks"]           # written in jupyter cell 1
subjdf = pd.DataFrame(columns=subcols)

testcolumns = ["testid", "testmarks"]   # written in jupyter cell 2
testdf = pd.DataFrame(columns=testcolumns)

def fun1():                  # written in jupyter cell 3
    ...
    ...
    return df1, df2

def fun2(df1, df2):
    ...
    ...
    return df1, df2, df3

def fun3(df1, df2, df3):
    ...
    subjdf['subid'] = df1['indid']
    ...
    return df1, df2, df3, subjdf

def fun4(df1, df2, df3, subjdf):
    ...
    testdf['testid'] = df2['examid']
    ...
    return df1, df2, df3, subjdf, testdf

The code above raises the following error in fun3:

UnboundLocalError: local variable 'subjdf' referenced before assignment

But I have already created subjdf outside the function blocks [see the 1st Jupyter cell].

Two things to note here:

a] I don't get an error if I use global subjdf in fun3.

b] If I use global subjdf, I don't get any error for testdf in fun4. I was expecting testdf to raise a similar error, because I use it the same way in fun4.

So my question is: why does the error occur only for subjdf and not for testdf?

Additionally, I have followed a similar approach before [without using a global statement, just declaring the df outside the function blocks] and it worked fine. I am not sure why it raises an error only now.

Can you please help me avoid this error?

CodePudding user response:

You have created subjdf, but your function fun3 needs to receive it as an argument:

def fun3(subjdf, df1, df2, df3):
  ...
  subjdf['subid'] = df1['indid']

You're not using Python functions properly. You don't need to use global in your case. Either pass the DataFrame in as an argument and return it, or create a class and use instance methods with self. There are several solutions, but instance methods are a good fit when you have to handle pandas.DataFrame objects across classes and functions.
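As for why the error appears at all, here is a minimal sketch of the scoping rule that usually produces this UnboundLocalError. It assumes that the elided part of fun3 rebinds the name somewhere (for example subjdf = ...), which the question does not show. A plain dict stands in for the DataFrame so the snippet runs without pandas, and the three function names are hypothetical. The rule is the same for any object: a plain assignment to a name anywhere in a function makes that name local for the whole function, while item assignment alone only reads the name.

```python
# Plain dicts stand in for the DataFrames; the scoping rule is identical.
subjdf = {"subjid": [], "marks": []}
testdf = {"testid": [], "testmarks": []}

def mutate_only():
    # Item assignment only reads the name testdf, so Python finds the global.
    testdf["testid"] = [1, 2, 3]

def mutate_then_rebind():
    # The rebinding on the last line makes subjdf local to the whole function,
    # so this first line fails before it ever reaches the global object.
    subjdf["subjid"] = [1, 2, 3]
    subjdf = dict(subjdf)  # hypothetical rebinding, assumed to be in the elided code

def rebind_with_global():
    # Declaring the name global makes both lines act on the module-level object.
    global subjdf
    subjdf["subjid"] = [1, 2, 3]
    subjdf = dict(subjdf)

mutate_only()

try:
    mutate_then_rebind()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

rebind_with_global()
print(error_name)
```

If that assumption holds, it would also explain the testdf observation: fun4 only does item assignment on testdf, so the global lookup succeeds, and subjdf is already a parameter of fun4.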

CodePudding user response:

It's impossible to run your snippet as posted; too many lines of code are missing.

If you don't want to use a class and want to keep this nested style, where each function calls the previous one, then restructure your code like this:

import pandas as pd

subcols = ["subjid", "marks"]
subjdf = pd.DataFrame(columns=subcols)

testcolumns = ["testid", "testmarks"]
testdf = pd.DataFrame(columns=testcolumns)

def fun1():
    # DO SOMETHING to generate df1 and df2
    return df1, df2

def fun2():
    df1, df2 = fun1()
    # DO SOMETHING to generate df3
    return df1, df2, df3

def fun3(subjdf):
    df1, df2, df3 = fun2()
    subjdf['subid'] = df1['indid']
    return df1, df2, df3, subjdf

def fun4(subjdf, testdf):
    # fun3 requires subjdf, so pass it through
    df1, df2, df3, subjdf = fun3(subjdf)
    testdf['testid'] = df2['examid']
    return df1, df2, df3, subjdf, testdf

fun4(subjdf, testdf)

But I repeat: I recommend building this as a class whose instance methods use self.
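A minimal sketch of what that could look like, assuming pandas is installed. The class name MarksPipeline, the method names, and the sample df1 and df2 data are all hypothetical and only illustrate the idea; the real code would generate df1 and df2 as in fun1.

```python
import pandas as pd


class MarksPipeline:  # hypothetical name, not from the question
    def __init__(self):
        # The dataframes live on the instance instead of in notebook globals.
        self.subjdf = pd.DataFrame(columns=["subjid", "marks"])
        self.testdf = pd.DataFrame(columns=["testid", "testmarks"])

    def fill_subjects(self, df1):
        # Same assignment as in fun3, but on the instance attribute.
        self.subjdf["subid"] = df1["indid"]
        return self.subjdf

    def fill_tests(self, df2):
        # Same assignment as in fun4.
        self.testdf["testid"] = df2["examid"]
        return self.testdf


# Made-up sample data standing in for the output of fun1.
df1 = pd.DataFrame({"indid": [101, 102]})
df2 = pd.DataFrame({"examid": [7, 8, 9]})

pipeline = MarksPipeline()
pipeline.fill_subjects(df1)
pipeline.fill_tests(df2)
print(pipeline.subjdf)
print(pipeline.testdf)
```

Because every method reaches the dataframes through self, no name is ever looked up as a notebook global, so neither global statements nor long argument lists are needed.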
