How do you modify columns in Pandas via method chaining?

What is the best pandas method to apply specific functions to specific columns?

Let

import pandas as pd
df = pd.DataFrame({'A':[1,2,3], 'B':[1,2,3], 'C':[1,2,3]})

Suppose I want to double the values in column 'A' and halve the values in column 'B', and keep column 'C' unchanged.

I know I could do

df['A'] = df['A'] * 2
df['B'] = df['B'] / 2

but I'm looking for something that allows for method chaining.

This comes close:

df.apply({'A':lambda x: x*2, 'B':lambda x: x/2})

But by default it drops column 'C'.

The other alternative I know is an abuse of the .assign method, i.e.

df.assign(**{'A':lambda x: x['A']*2, 'B':lambda x: x['B']/2})

But I'm sure Pandas has a method for what I'm looking to do.

CodePudding user response：

The latter code you propose is absolutely not an abuse: assign accepts a callable and there is even an example in the documentation.

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

I would personally use:

df.assign(A=lambda d: d['A']*2,
          B=lambda d: d['B']/2)

NB. If you have a look at my pandas answers, you will see that I am using it all the time ;)
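
To see that this chains cleanly, here is a minimal self-contained sketch using the question's DataFrame. The trailing .rename step and the column name 'C_unchanged' are illustrative assumptions only, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

result = (
    df
    # each lambda receives the DataFrame as it stands at this point in the chain
    .assign(A=lambda d: d['A'] * 2,
            B=lambda d: d['B'] / 2)
    # hypothetical extra step, only to show the chain continuing
    .rename(columns={'C': 'C_unchanged'})
)
print(result)
```

assign returns a new DataFrame, so the original df is left untouched.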

Alternative: eval

You could also use eval (only one expression per line is allowed):

df.eval('''A = A*2
           B = B/2''')

# or
df.eval('A = A*2\nB = B/2')

Warning: while eval is nice, it is quite a bit slower than assign.
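
As a quick check that both spellings produce the same values, the sketch below (assuming the question's example DataFrame) computes the result both ways and compares them. Like assign, eval returns a new DataFrame by default and leaves df unchanged.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

via_assign = df.assign(A=lambda d: d['A'] * 2, B=lambda d: d['B'] / 2)
via_eval = df.eval('A = A*2\nB = B/2')

# Compare values only; dtypes are deliberately not checked here
pd.testing.assert_frame_equal(via_assign, via_eval, check_dtype=False)
```

Timing each of the two lines with timeit on your own data is the simplest way to judge the speed difference for your case.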

CodePudding user response：

Create a custom function that receives the DataFrame (df) and a dictionary of functions. It first applies each function to its respective column using DataFrame.transform, and then adds back the unchanged columns.

Since your use case seems to apply vectorized transformations that depend only on the given column, using DataFrame.transform should be faster than using DataFrame.apply. DataFrame.transform also allows you to apply a list of functions to the same column.
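
As an illustration of the last point, here is a small sketch that applies two NumPy functions to column 'A'. The choice of np.sqrt and np.exp is an assumption made for the example. Note that the result then has two-level column labels (column name, function name), and that the functions in a list need distinct names, so two anonymous lambdas would not work here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

# A list of functions for one column yields one output column per function
multi = df.transform({'A': [np.sqrt, np.exp]})
print(multi.columns.tolist())
```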

def map_columns(df, func_dict, *args, **kwargs):
    # Transform each column
    df_trans = df.transform(func_dict, *args, **kwargs)
    # get the unchanged columns
    other_cols = df.columns.difference(df_trans.columns)
    # add the unchanged columns to the final result
    df_trans[other_cols] = df[other_cols]
    return df_trans

>>> df = pd.DataFrame({'A':[1,2,3], 'B':[1,2,3], 'C':[1,2,3]})
>>> df

   A  B  C
0  1  1  1
1  2  2  2
2  3  3  3

>>> map_columns(df, {'A':lambda x: x*2, 'B':lambda x: x/2})

   A    B  C
0  2  0.5  1
1  4  1.0  2
2  6  1.5  3

To use it with method chaining, just pass the function to DataFrame.pipe:

>>> ( 
  df.pipe(map_columns, {'A':lambda x: x*2, 'B':lambda x: x/2})
    .mul(-1)
    # (...)
)

   A    B  C
0 -2 -0.5 -1
1 -4 -1.0 -2
2 -6 -1.5 -3
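
One caveat: because map_columns adds the unchanged columns at the end, the column order of the result can differ from the original when the dictionary skips an early column or lists columns in a different order. If that matters, a possible variant is sketched below. It is not the method from this answer: it swaps transform for assign, which overwrites existing columns in place, and the name map_columns_keep_order is made up for this example.

```python
import pandas as pd

def map_columns_keep_order(df, func_dict):
    # Hypothetical helper: apply each function to its own column and
    # overwrite that column via assign, so the column order is preserved.
    return df.assign(**{col: func(df[col]) for col, func in func_dict.items()})

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

# Functions are deliberately listed out of order, and 'B' is skipped
result = df.pipe(map_columns_keep_order,
                 {'C': lambda x: x * 10, 'A': lambda x: x * 2})
print(result)
```

Since the dictionary is unpacked into keyword arguments, this sketch only works when the column names are strings.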