apply function to some rows in pandas

Time:12-12

I want to make several changes to some rows in a pandas DataFrame. The rows to change are selected based on the contents of other columns. The dataset is large, and every solution I have found so far is very slow.

The following toy code illustrates the problem:

import pandas as pd

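def change1(s):
    # Called once per row by apply(axis=1): swap 'b' and 'c' only when a == 1
    if s['a'] == 1:
        s[['b', 'c']] = s[['c', 'b']].values

    return s

def change2(s):
    # Unconditional swap of 'b' and 'c'; the caller is expected to pass only the rows to change
    s[['b', 'c']] = s[['c', 'b']].values

    return s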


df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]})
print('original:')
print(df)

df = df.apply(change1, axis = 1)
print('change1:')
print(df)

df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6], 'c':[7,8,9]})
df.loc[df['a']==1,:] = df.loc[df['a']==1,:].apply(change2, axis=1)
print('change2:')
print(df)

My questions are:

  1. Why does the second strategy (change2) not work, while the first one does?
  2. What would be a more correct, and faster, way to do this?

Answer:

Found a better solution:

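# where() passes the whole df to the callable, and change2 modifies its argument
# in place, so hand it a copy (otherwise b and c end up swapped in every row)
df = df.where(df['a'] != 1, lambda d: change2(d.copy()), axis=1)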

That one was fast enough. Case closed.
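A small sketch of the where() approach on the toy frame from the question. DataFrame.where keeps the original value wherever the condition is True and takes it from `other` wherever it is False; when `other` is a callable, pandas calls it with the DataFrame itself and uses whatever it returns. Because change2 modifies its argument in place, this sketch wraps it as `lambda d: change2(d.copy())`; that wrapper is an assumption of the sketch, not part of the original answer.

```python
import pandas as pd


def change2(s):
    # Swap the values of columns 'b' and 'c' (modifies its argument in place)
    s[['b', 'c']] = s[['c', 'b']].values
    return s


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Keep df's values where a != 1; elsewhere take them from the swapped copy.
# The callable receives df itself, so change2 is given a copy to work on.
swapped = df.where(df['a'] != 1, lambda d: change2(d.copy()))

print(swapped)
```

The swap runs once over whole columns instead of once per row, which is where the speed-up over apply(axis=1) comes from.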

Answer:

Why not:

df.loc[df['a']==1, ['b','c']] = df.loc[df['a']==1,['c','b']].values
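Why the `.values` in this one-liner matters: when the right-hand side of a .loc assignment is a DataFrame, pandas aligns it on the index and column labels before assigning, so selecting ['c','b'] on its own swaps nothing. Converting to a plain array (.values, or the equivalent .to_numpy()) drops the labels, and the values are then assigned by position. A minimal sketch on the toy frame from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
mask = df['a'] == 1  # rows to change

# Without .values / .to_numpy(), pandas aligns the right-hand side on its
# column labels, so 'b' is written into 'b' and 'c' into 'c': no swap.
aligned = df.copy()
aligned.loc[mask, ['b', 'c']] = aligned.loc[mask, ['c', 'b']]

# A plain NumPy array carries no labels, so values are assigned by position
# and 'b' and 'c' are swapped in the selected rows only.
swapped = df.copy()
swapped.loc[mask, ['b', 'c']] = swapped.loc[mask, ['c', 'b']].to_numpy()

print(aligned.loc[0].tolist())  # [1, 4, 7]
print(swapped.loc[0].tolist())  # [1, 7, 4]
```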

change2 doesn't work because of how pandas handles copies: selecting rows with a boolean mask, as in df.loc[df['a']==1,:], returns a copy of those rows rather than a view of df, and pulling ['b','c'] out of that selection gives yet another copy, so the assignments made inside change2 have no effect on the original df.
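To check that the vectorized .loc assignment gives the same result as the row-wise change1 from the question, and to get a rough feel for the speed difference, here is a sketch. The helper name swap_with_loc and the 2,000-row random frame are hypothetical, and the printed timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def change1(s):
    # Row-wise version from the question: swap 'b' and 'c' when a == 1
    if s['a'] == 1:
        s[['b', 'c']] = s[['c', 'b']].values
    return s


def swap_with_loc(df):
    # Hypothetical helper: the vectorized .loc assignment from the answer,
    # applied to a copy so the input frame is left unchanged
    out = df.copy()
    mask = out['a'] == 1
    out.loc[mask, ['b', 'c']] = out.loc[mask, ['c', 'b']].to_numpy()
    return out


# Hypothetical test data: 2,000 rows with 'a' drawn from {1, 2, 3}
rng = np.random.default_rng(0)
n = 2_000
df = pd.DataFrame({
    'a': rng.integers(1, 4, size=n),
    'b': np.arange(n, dtype=np.int64),
    'c': np.arange(n, 2 * n, dtype=np.int64),
})

# apply() may modify the rows it is given, so run it on a copy
by_apply = df.copy().apply(change1, axis=1)
by_loc = swap_with_loc(df)
assert (by_apply.to_numpy() == by_loc.to_numpy()).all()

# Rough single-run timings; absolute numbers depend on machine and version
t_apply = timeit.timeit(lambda: df.copy().apply(change1, axis=1), number=1)
t_loc = timeit.timeit(lambda: swap_with_loc(df), number=1)
print(f'apply: {t_apply:.3f}s  loc: {t_loc:.3f}s')
```

The .loc version performs one masked assignment over whole columns, while apply(axis=1) builds and modifies a Series for every row, so the gap is expected to grow with the number of rows.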
