Let's say we have a 100-row pandas DataFrame "frame", and we define a function:
def test(a_dataframe):
    a_dataframe["new_col"] = "new_value"
    a_dataframe = a_dataframe.iloc[0:10, :]
If we run test(frame), the frame object has the "new_col" column afterwards, but it still has 100 rows.
Could anybody explain why test can add a new column to the DataFrame but cannot subset it?
Thanks
I expected test to add the new column to the DataFrame and also subset it to the first 10 rows.
CodePudding user response:
When you call the function with test(frame), the local variable a_dataframe inside the function will initially contain a reference to the frame object that exists outside of the function. Now the two lines within the body of the function do very different things:
1. a_dataframe["new_col"] = "new_value" does not change the value of the local variable a_dataframe. Instead, it invokes the __setitem__ method on the dataframe that is referenced by that variable. So the frame outside the function is changed accordingly.
2. a_dataframe = a_dataframe.iloc[0:10,:] does change the value of the local variable a_dataframe. This has nothing to do with the iloc method. It is simply because with a_dataframe = <anything>, you assign a new value to the local variable a_dataframe, thus overwriting the reference to frame it initially contained.
If you do want to drop rows from frame from within the function, you could use something like a_dataframe.drop(range(10, 100), inplace=True). This would work similarly to case 1 above, calling a method on the dataframe that is referenced by the local variable. Note that the first argument of the drop method refers to index labels, which are not necessarily identical to the row positions that iloc refers to.
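As a rough sketch of both options, the first function below drops rows in place using labels taken from the dataframe's own index, so it does not depend on the labels being 0 to 99 (it does assume the index labels are unique). The second simply returns the subset and leaves the reassignment to the caller. The function names and the sample data are made up for illustration:

```python
import pandas as pd

def keep_first_ten_in_place(a_dataframe):
    # Look up the labels of every row after the tenth, then drop them in place.
    # Assumes unique index labels; duplicated labels would drop extra rows.
    a_dataframe.drop(a_dataframe.index[10:], inplace=True)

def keep_first_ten(a_dataframe):
    # Returns a new object; the caller decides what to do with it.
    return a_dataframe.iloc[0:10, :]

# Hypothetical frames whose index labels (1000..1099) differ from row positions.
frame = pd.DataFrame({"x": range(100)}, index=range(1000, 1100))
keep_first_ten_in_place(frame)
print(len(frame))

other = pd.DataFrame({"x": range(100)}, index=range(1000, 1100))
other = keep_first_ten(other)  # rebinding happens at the call site
print(len(other))
```

Returning the subset and reassigning it at the call site is often easier to follow than mutating the argument, since the change is visible where the function is called.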