Can apply function change the original input pandas df?

Time:08-28

I always assumed that apply doesn't change the original pandas DataFrame and that you need to assign the result to keep any changes. Could anyone help explain why this happens?

import pandas as pd

def f(row):
    row['a'] = 10
    row['b'] = 20

df_x = pd.DataFrame({'a':[10,11,12], 'b':[3,4,5], 'c':[1,1,1]}) #, 'd':[[1,2],[1,2],[1,2]]
df_x.apply(f, axis = 1)
df_x

returns

    a   b   c
0   10  20  1
1   10  20  1
2   10  20  1

So apply changed the original pd.DataFrame even though f returns nothing. But if there's a non-basic type column in the data frame, it doesn't change anything:

def f(row):
    row['a'] = 10
    row['b'] = 20
    row['d'] = [0]
    
df_x = pd.DataFrame({'a':[10,11,12], 'b':[3,4,5], 'c':[1,1,1], 'd':[[1,2],[1,2],[1,2]]})
df_x.apply(f, axis = 1)
df_x

This returns the data frame without any change:

    a   b   c   d
0   10  3   1   [1, 2]
1   11  4   1   [1, 2]
2   12  5   1   [1, 2]

Could anyone help explain this or provide a reference? Thanks.

CodePudding user response:

Series are mutable objects. If you modify one during an operation, the change is reflected in the original data whenever no copy was made.

This is what happens in the first case. My guess: no copy is made because your DataFrame has a homogeneous dtype (integer), so the whole DataFrame is stored internally as a single array and each row passed to f is a view into it.

In the second case, at least one column holds lists. That makes that column's dtype object, so the DataFrame no longer has a single dtype, and apply must build a new Series for each row because of the mixed types. Your function then modifies that new Series, not the original data.

You can actually reproduce this just by changing a single element to another type:

def f(row):
    row['a'] = 10
    row['b'] = 20
df_x = pd.DataFrame({'a':[10,11,12],
                     'b':[3,4,5],
                     'c':[1,1.,1]}) # float
df_x.apply(f, axis = 1)
df_x

# different types
# no mutation
    a  b    c
0  10  3  1.0
1  11  4  1.0
2  12  5  1.0
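
The dtype difference between the two frames can be checked directly. The snippet below is only a sketch: the names df_int and df_mixed and the helper has_single_dtype are made up for illustration. It doesn't prove whether apply copies; it only shows the condition the guess above depends on.

```python
import pandas as pd

def has_single_dtype(df):
    # Hypothetical helper: True when every column shares one dtype.
    return len(set(df.dtypes)) == 1

# All columns are integers, as in the first example.
df_int = pd.DataFrame({'a': [10, 11, 12], 'b': [3, 4, 5], 'c': [1, 1, 1]})

# One float in column 'c' makes that column float, as in the example above.
df_mixed = pd.DataFrame({'a': [10, 11, 12], 'b': [3, 4, 5], 'c': [1, 1., 1]})

int_is_single = has_single_dtype(df_int)
mixed_is_single = has_single_dtype(df_mixed)

print(df_int.dtypes)
print(df_mixed.dtypes)
print(int_is_single, mixed_is_single)
```

Whether the mutation actually shows up also depends on the pandas version and its copy settings, so treat this check as a hint, not a guarantee.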

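Take-home message: never modify a mutable input inside a function (unless you want it and know what you're doing). The pandas documentation for apply also notes that functions which mutate the passed object can produce unexpected behavior and are not supported.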
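
One way to follow that advice, as a minimal sketch: copy the row inside the function, return the modified copy, and assign the result of apply to a new name. The names df_new and df_assigned are made up for illustration, and the assign call is an alternative I'm suggesting, not something from the question.

```python
import pandas as pd

def f(row):
    # Copy first, so the caller's data is never touched,
    # whether or not pandas handed us a view.
    row = row.copy()
    row['a'] = 10
    row['b'] = 20
    return row

df_x = pd.DataFrame({'a': [10, 11, 12], 'b': [3, 4, 5], 'c': [1, 1, 1]})

# apply returns a new DataFrame built from the returned rows.
df_new = df_x.apply(f, axis=1)

# For constant values like these, assign does the same without a row-wise apply.
df_assigned = df_x.assign(a=10, b=20)

print(df_x)
print(df_new)
```

Written this way, the result no longer depends on whether the frame has a single dtype: df_x keeps its original values in both cases.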
