Python, pandas Dataframe

Time:12-16

Let's say we have a 100-row pandas DataFrame named "frame", and we define a function:

def test(a_dataframe):
  a_dataframe["new_col"] = "new_value"
  a_dataframe = a_dataframe.iloc[0:10,:]

If we run test(frame), the frame object has the "new_col" column but still has 100 rows.

Could anybody explain why the function test can add a new column to a DataFrame but cannot subset it?

Thanks

I expected the "test" function to add a new column to the DataFrame and also reduce it to the first 10 rows.

CodePudding user response:

When you call the function with test(frame), the local variable a_dataframe inside the function will initially contain a reference to the frame object that exists outside of the function. Now the two lines within the body of the function do very different things:

  1. a_dataframe["new_col"] = "new_value" does not change the value of the local variable a_dataframe. Instead, it invokes the __setitem__ method on the dataframe that is referenced by that variable. So the frame outside the function is changed accordingly.
  2. a_dataframe = a_dataframe.iloc[0:10,:] does change the value of the local variable a_dataframe. This has nothing to do with iloc itself. It is simply because with a_dataframe = <anything>, you assign a new value to the local variable a_dataframe, thus overwriting the reference to frame it initially contained. The frame object outside the function is not affected by this reassignment.

If you do want to drop rows from frame from within the function, you could use something like a_dataframe.drop(range(10, 100), inplace=True). This would work similarly to case 1. above, calling a method on the dataframe that is referenced by the local variable. Note that the first argument of the drop method refers to index labels, which are not necessarily identical to the positional row numbers that iloc refers to. range(10, 100) therefore only matches rows 10 to 99 if frame has the default integer index 0 to 99.
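
As a rough sketch of the in-place approach, the labels to drop can be looked up by position so that it also works with a non-default index. The helper name keep_first_rows, its parameter n, and the sample index starting at 1000 are hypothetical, and the sketch assumes the index labels are unique (drop removes every row that carries a given label).

```python
import pandas as pd

def keep_first_rows(a_dataframe, n=10):
    # Hypothetical helper: look up the labels of all rows after the first n
    # by position, then drop those labels from the same object in place.
    a_dataframe.drop(a_dataframe.index[n:], inplace=True)

# Index labels 1000..1099 deliberately differ from the row positions 0..99.
frame = pd.DataFrame({"x": range(100)}, index=range(1000, 1100))
keep_first_rows(frame)
print(frame.shape)  # (10, 1)
```

A simpler alternative is often to return the subset from the function and reassign it at the call site, for example frame = frame.iloc[0:10, :], which avoids modifying the caller's object from inside the function.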
