What is the best pandas method to apply specific functions to specific columns?
Let
df = pd.DataFrame({'A':[1,2,3], 'B':[1,2,3], 'C':[1,2,3]})
Suppose I want to double the values in column 'A', halve the values in column 'B', and keep column 'C' unchanged.
I know I could do
df['A'] = df['A'] * 2
df['B'] = df['B'] / 2
but I'm looking for something that allows for method chaining.
This comes close:
df.apply({'A':lambda x: x*2, 'B':lambda x: x/2})
But by default it drops column 'C'.
The other alternative I know is an abuse of the .assign method, i.e.
df.assign(**{'A':lambda x: x['A']*2, 'B':lambda x: x['B']/2})
But I'm sure Pandas has a method for what I'm looking to do.
CodePudding user response:
The latter code you propose is absolutely not an abuse: assign accepts a callable, and there is even an example in the documentation.
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
I would personally use:
df.assign(A=lambda d: d['A']*2,
          B=lambda d: d['B']/2)
NB. If you have a look at my pandas answers, you will see that I am using it all the time ;)
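If the per-column functions are already collected in a dictionary, as in the question's apply attempt, they can be turned into assign keyword arguments. This is only a minimal sketch: the names funcs and assign_kwargs are hypothetical, and it assumes every key is an existing column whose name is a string.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

# Hypothetical mapping: column name -> function applied to that column
funcs = {'A': lambda s: s * 2, 'B': lambda s: s / 2}

# assign calls each value with the DataFrame, so wrap every function in a
# callable that selects its column first. col and f are bound as default
# arguments to avoid the late-binding closure pitfall inside the comprehension.
assign_kwargs = {
    col: (lambda d, col=col, f=f: f(d[col]))
    for col, f in funcs.items()
}

result = df.assign(**assign_kwargs)
print(result)
```

Column 'C' is kept because assign returns a copy of the whole frame with only the named columns replaced, and the original df is left untouched.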
Alternative: eval
You could also use eval (only one expression per line is allowed):
df.eval('''A = A*2
B = B/2''')
# or
df.eval('A = A*2\nB = B/2')
Warning: while eval is nice, it is quite a bit slower than assign.
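As a quick sanity check that both approaches give the same values on the question's frame, here is a small sketch. The names via_assign and via_eval are only illustrative, and it assumes a pandas version that accepts multi-line assignments in eval.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [1, 2, 3]})

# Same transformation written both ways; neither call modifies df
via_assign = df.assign(A=lambda d: d['A'] * 2, B=lambda d: d['B'] / 2)
via_eval = df.eval('A = A * 2\nB = B / 2')

# Compare values and labels only; dtypes are deliberately not compared here
pd.testing.assert_frame_equal(via_assign, via_eval, check_dtype=False)
```

To compare speed on your own data, timing both calls with timeit is more reliable than a general rule.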
CodePudding user response:
Create a custom function that receives the DataFrame (df) and the dictionary of functions. It first applies each function to its respective column using DataFrame.transform, and then adds the unchanged columns.
Since your use case seems to apply vectorized transformations that only depend on the given column, using DataFrame.transform should be faster than using DataFrame.apply. DataFrame.transform also allows you to apply a list of functions to the same column.
def map_columns(df, func_dict, *args, **kwargs):
    # transform each column listed in func_dict
    df_trans = df.transform(func_dict, *args, **kwargs)
    # get the unchanged columns
    other_cols = df.columns.difference(df_trans.columns)
    # add the unchanged columns to the final result
    df_trans[other_cols] = df[other_cols]
    return df_trans
>>> df = pd.DataFrame({'A':[1,2,3], 'B':[1,2,3], 'C':[1,2,3]})
>>> df
   A  B  C
0  1  1  1
1  2  2  2
2  3  3  3
>>> map_columns(df, {'A':lambda x: x*2, 'B':lambda x: x/2})
   A    B  C
0  2  0.5  1
1  4  1.0  2
2  6  1.5  3
To use it with method chaining, just pass the function to DataFrame.pipe:
>>> (
...     df.pipe(map_columns, {'A':lambda x: x*2, 'B':lambda x: x/2})
...       .mul(-1)
...     # (...)
... )
   A    B  C
0 -2 -0.5 -1
1 -4 -1.0 -2
2 -6 -1.5 -3
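One thing to be aware of: map_columns appends the unchanged columns after the transformed ones, so the column order can differ from the input when an untouched column does not come last. If the order matters, a variant along these lines might help. This is a sketch, not part of pandas: map_columns_keep_order is a hypothetical name, and it assumes exactly one function per column (a list of functions would produce MultiIndex columns, which this does not handle).

```python
import pandas as pd

def map_columns_keep_order(df, func_dict):
    """Hypothetical variant of map_columns that restores df's column order."""
    # transform only the columns named in func_dict
    transformed = df.transform(func_dict)
    # columns that were not transformed, in their original order
    untouched = df.drop(columns=transformed.columns)
    # put everything back together, then reorder to match the input
    return transformed.join(untouched)[list(df.columns)]

# 'C' comes first here, so the order would otherwise change
df = pd.DataFrame({'C': [1, 2, 3], 'A': [1, 2, 3], 'B': [1, 2, 3]})

result = df.pipe(map_columns_keep_order,
                 {'A': lambda x: x * 2, 'B': lambda x: x / 2})
print(result)
```

It is used with DataFrame.pipe in the same way as map_columns.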