I am working in a Jupyter notebook using Python.
I have created the two dataframes shown below.
Both dataframes are declared outside any function, i.e. they are simply defined and initialized in a notebook cell, and I want to use them inside functions as shown below.
subcols = ["subjid","marks"]  # written in jupyter cell 1
subjdf = pd.DataFrame(columns=subcols)
testcolumns = ["testid","testmarks"]  # written in jupyter cell 2
testdf = pd.DataFrame(columns=testcolumns)
def fun1():  # written in jupyter cell 3
    ....
    ....
    return df1,df2
def fun2(df1,df2):
    ...
    ...
    return df1,df2,df3
def fun3(df1,df2,df3):
    ...
    subjdf['subid'] = df1['indid']
    ...
    return df1,df2,df3,subjdf
def fun4(df1,df2,df3,subjdf):
    ...
    testdf['testid'] = df2['examid']
    ...
    return df1,df2,df3,subjdf,testdf
The above way of writing throws the following error in fun3:
UnboundLocalError: local variable 'subjdf' referenced before assignment
But I have already created subjdf outside the function blocks [refer to the 1st Jupyter cell].
Two things to note here:
a] I don't get an error if I use global subjdf in fun3.
b] If I use global subjdf, I don't get any error for testdf in fun4. I was expecting testdf to raise a similar error, because I use it the same way in fun4.
So my question is: why does the error occur only for subjdf and not for testdf?
Additionally, I have followed a similar approach earlier [without using a global variable, just declaring the df outside the function blocks] and it worked fine. I am not sure why it throws an error only now.
Can you help me avoid this error, please?
CodePudding user response:
You have created subjdf, but your function fun3 needs it as an argument:
def fun3(subjdf, df1, df2, df3):
    ...
    subjdf['subid'] = df1['indid']
You're not using Python functions properly. You don't need global in your case. Either pass the correct argument and return it, or consider creating an instance method using self. There are many solutions, but instance methods are a good one when you have to handle pandas.DataFrame objects within classes and functions.
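As for why the error appears at all: the posted snippet elides most of fun3, so the following is a sketch under an assumption, namely that somewhere in the elided lines there is a plain assignment to the name subjdf (for example `subjdf = subjdf.something()`). Item assignment such as `subjdf['subid'] = ...` alone does not make a name local, but any `subjdf = ...` inside the function does, for the whole function body. That would also explain why testdf does not fail in fun4 if it is never rebound there. The function names and sample data below are hypothetical.

```python
import pandas as pd

subjdf = pd.DataFrame(columns=["subjid", "marks"])
df1 = pd.DataFrame({"indid": [1, 2, 3]})  # hypothetical stand-in data


def mutate_only(df1):
    # Only item assignment: `subjdf` is never rebound in this function,
    # so Python resolves it as the module-level (notebook) variable.
    subjdf["subid"] = df1["indid"]
    return subjdf


def rebinds_name(df1):
    # The `subjdf = ...` two lines down makes `subjdf` a local name for the
    # entire function, so this first use raises UnboundLocalError.
    subjdf["subid"] = df1["indid"]
    subjdf = subjdf.drop_duplicates()
    return subjdf


def takes_argument(subjdf, df1):
    # Passing the frame in avoids both `global` and the scoping trap.
    subjdf = subjdf.copy()  # work on a copy so the caller's frame is untouched
    subjdf["subid"] = df1["indid"]
    return subjdf
```

Calling `rebinds_name(df1)` reproduces the error from the question, while `takes_argument(subjdf, df1)` works without any `global` statement.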
CodePudding user response:
It's not possible to run your snippet as posted; too many lines of code are missing.
If you don't want to use a class and want to keep this chained style, where each function calls the previous one, then restructure your code this way:
import pandas as pd

subcols = ["subjid","marks"]
subjdf = pd.DataFrame(columns=subcols)
testcolumns = ["testid","testmarks"]
testdf = pd.DataFrame(columns=testcolumns)
def fun1():
    # DO SOMETHING to generate df1 and df2
    return df1, df2
def fun2():
    df1, df2 = fun1()
    # DO SOMETHING to generate df3
    return df1, df2, df3
def fun3(subjdf):
    df1, df2, df3 = fun2()
    subjdf['subid'] = df1['indid']
    return df1, df2, df3, subjdf
def fun4(subjdf, testdf):
    # fun3 requires subjdf as its argument
    df1, df2, df3, subjdf = fun3(subjdf)
    testdf['testid'] = df2['examid']
    return df1, df2, df3, subjdf, testdf
fun4(subjdf, testdf)
But I repeat: build this with instance methods using self.
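A minimal sketch of what the instance-method approach could look like. The class name, method names, and sample data are all hypothetical, since the real bodies of fun1 to fun4 are not shown in the question; only the two column assignments are taken from the original code.

```python
import pandas as pd


class MarksPipeline:
    """Hypothetical container for the frames that the functions share."""

    def __init__(self):
        # The two frames from notebook cells 1 and 2 become attributes.
        self.subjdf = pd.DataFrame(columns=["subjid", "marks"])
        self.testdf = pd.DataFrame(columns=["testid", "testmarks"])
        self.df1 = None
        self.df2 = None

    def load(self):
        # Placeholder data standing in for whatever fun1 really produces.
        self.df1 = pd.DataFrame({"indid": [101, 102]})
        self.df2 = pd.DataFrame({"examid": [7, 8]})
        return self

    def fill_subjects(self):
        # Same assignment as in fun3, but on an attribute instead of a global.
        self.subjdf["subid"] = self.df1["indid"]
        return self

    def fill_tests(self):
        # Same assignment as in fun4.
        self.testdf["testid"] = self.df2["examid"]
        return self


pipeline = MarksPipeline().load().fill_subjects().fill_tests()
```

Because every frame lives on self, no method depends on notebook-level names, so re-running cells in a different order cannot change which subjdf a method sees.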