If
(A) mutable objects modified in functions are also mutated in the calling context, and
(B) pandas DataFrames are mutable objects,
then in the following example, why is an empty DataFrame not printed in the last output ("Outside - After")?
import pandas as pd
def foo(df):
    df = df[0:0]  # clear the df
    print(df)
df=pd.DataFrame([[1,2,3],[4,5,6]])
print("\nOutside - Before:")
print(df)
print("\nInside function:")
foo(df)
print("\nOutside - After:")
print(df)
Output:
Outside - Before:
   0  1  2
0  1  2  3
1  4  5  6
Inside function:
Empty DataFrame
Columns: [0, 1, 2]
Index: []
Outside - After:
   0  1  2
0  1  2  3
1  4  5  6
CodePudding user response:
The problem is not the DataFrame itself but the name df inside foo. The df inside foo is a local name, separate from the df outside foo, even though both initially refer to the same object. The expression df[0:0] builds a new, empty DataFrame, and the assignment rebinds only the local name to it, so the df outside the function is unaffected. To illustrate, this code is functionally equivalent to yours:
import pandas as pd
def foo(some_df):
    some_df = some_df[0:0]  # clear the df
    print(some_df)
df=pd.DataFrame([[1,2,3],[4,5,6]])
print("\nOutside - Before:")
print(df)
print("\nInside function:")
foo(df)
print("\nOutside - After:")
print(df)
When df is passed into foo, the parameter some_df starts out referring to the same object as df. The assignment inside foo then points some_df at a new object, and df is unaffected from that point on. Hopefully this makes it clearer why df doesn't change.
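As a rough check of that explanation, the sketch below keeps a second reference to whatever was passed in before the slice assignment and compares identities afterwards. The function name trace_rebinding and the variable names are hypothetical, chosen only for this illustration.

```python
import pandas as pd

def trace_rebinding(some_df):
    # Hypothetical helper: keep a second reference to the object passed in.
    passed_in = some_df
    # Slicing builds a new DataFrame; the assignment rebinds only the local name.
    some_df = some_df[0:0]
    return passed_in, some_df

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
passed_in, local_result = trace_rebinding(df)

print(passed_in is df)     # True: the parameter started as the caller's object
print(local_result is df)  # False: the slice is a different object
print(len(df))             # 2: the caller's DataFrame still has both rows
```

The identity checks suggest that nothing was ever done to the original object; only the local name was pointed somewhere else.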
To get the result you desire, you can do this:
import pandas as pd
def foo(df):
    df = df[0:0]  # clear the df
    print(df)
    return df
df=pd.DataFrame([[1,2,3],[4,5,6]])
print("\nOutside - Before:")
print(df)
print("\nInside function:")
df = foo(df)
print("\nOutside - After:")
print(df)
As you can see, the df outside the function is set by assigning the return value of foo to it. Since foo returns its local df, the outer name now refers to the empty DataFrame that was created inside the function.
CodePudding user response:
Assigning to a parameter name inside a function only rebinds the local name, so the new value is visible only inside that function:
def foo(v):
    v = 1
i = 0
foo(i)
# i is still 0
Rebinding a parameter inside a function is never reflected in the caller's variable.
However, you can change the contents of the object it refers to:
import pandas as pd
def foo(df):
    df[:] = 0  # modify the existing object in place
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
foo(df)
# df is now all zero
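The same idea can be applied to the original goal of emptying the DataFrame from inside the function. This is a minimal sketch, assuming that dropping every row label with DataFrame.drop(..., inplace=True) is an acceptable way to clear it; the function name clear_in_place is hypothetical.

```python
import pandas as pd

def clear_in_place(df):
    # Hypothetical helper: remove every row from the object that was passed in.
    # No assignment to the name df happens here, so the caller sees the change.
    df.drop(df.index, inplace=True)

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
clear_in_place(df)

print(df.empty)  # True
print(df.shape)  # (0, 3): the rows are gone, the three columns remain
```

Returning the new DataFrame and reassigning it in the caller, as in the earlier answer, is often the clearer choice; the in-place version is shown only to contrast mutation with rebinding.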