Vectorized operations in Pandas with fixed columns/rows/values

I would like to perform operations on Pandas dataframes using fixed columns, rows, or values.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':(1,2,3), 'b':(4,5,6), 'c':(7,8,9), 'd':(10,11,12),
                  'e':(13,14,15)})

df
Out[57]: 
   a  b  c   d   e
0  1  4  7  10  13
1  2  5  8  11  14
2  3  6  9  12  15

I want to use the values in columns 'a' and 'b' as fixed values.


# It's easy enough to perform the operation I want on one column at a time:
df.loc[:,'f'] = df.loc[:,'c'] / df.loc[:,'a'] * df.loc[:,'b']

# It gets cumbersome if there are many columns to perform the operation on though:
df.loc[:,'g'] = df.loc[:,'d'] / df.loc[:,'a'] * df.loc[:,'b']
df.loc[:,'h'] = df.loc[:,'e'] / df.loc[:,'a'] * df.loc[:,'b']
# etc.

# This returns columns with all NaN values.
df.loc[:,('f','g','h')] = df.loc[:,'c':'e'] / df.loc[:'a']

Is there an optimal way to do what I want in Pandas? I could not find working solutions in the Pandas documentation or this SO thread. I don't think I can use .map() or .applymap(), because I'm under the impression they can only be used for simple equations (one input value). Thanks for reading.

CodePudding user response：

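Use the div and mul methods with axis=0 instead of the / and * operators, so that the Series is aligned with the row index rather than with the column labels: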

df[['g', 'h']] = df[['d', 'e']].div(df['a'], axis=0).mul(df['b'], axis=0)
print(df)

# Output
   a  b  c   d   e     g     h
0  1  4  7  10  13  40.0  52.0
1  2  5  8  11  14  27.5  35.0
2  3  6  9  12  15  24.0  30.0
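
The same pattern should extend to all three columns from the question at once. A minimal sketch, assuming the sample frame above and the target names 'f', 'g', 'h' used in the question (the variable names scaled and result are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Compute value / a * b for every column from 'c' through 'e' in one step.
# axis=0 matches df['a'] and df['b'] against the row index.
scaled = df.loc[:, 'c':'e'].div(df['a'], axis=0).mul(df['b'], axis=0)

# Rename the results and attach them next to the original columns.
scaled.columns = ['f', 'g', 'h']
result = pd.concat([df, scaled], axis=1)
print(result)
```

Building the new columns in a separate frame and concatenating keeps the original df untouched, which can make it easier to check the result before overwriting anything.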

With numpy:

arr = df.to_numpy()
arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]

# Output
array([[40. , 52. ],
       [27.5, 35. ],
       [24. , 30. ]])
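
If positional indices feel fragile, the NumPy route can also select by column label and write the result back to a frame. This is only a sketch under the same sample data; the names a, b, values, out, new_cols and result are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Selecting with a list keeps two dimensions, so these have shape (3, 1)
# and broadcast across the columns of the array they are combined with.
a = df[['a']].to_numpy()
b = df[['b']].to_numpy()

# Columns to transform, chosen by label rather than by position.
values = df[['d', 'e']].to_numpy()  # shape (3, 2)

out = values / a * b

# Wrap the array in a frame that shares df's index, then join it on.
new_cols = pd.DataFrame(out, columns=['g', 'h'], index=df.index)
result = df.join(new_cols)
print(result)
```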

CodePudding user response：

As @Corralien pointed out, it's better to use Pandas dataframe operations such as .div(), but I also figured out that how .loc[] is used is important.

# Doesn't work:
df.loc[:,['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Doesn't work:
df[['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Now works.
df[['f','g','h']] = df.loc[:,'c':'e'].div(df['a'], axis=0)

At the moment, I'm not exactly sure why this is. Any insight would be helpful, thanks.
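
One likely explanation, offered as a sketch rather than a definitive diagnosis: with a single argument, .loc indexes rows, so df.loc[:'a'] is read as a slice over row labels and not as the column 'a'. Selecting the column needs the two-argument form df.loc[:, 'a'], which returns the same Series as df['a']. The snippet below uses the sample frame from the question and also shows what happens when the Series is divided without axis=0:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Two-argument form: all rows, column 'a'. Same Series as df['a'].
col_a = df.loc[:, 'a']
print(col_a.equals(df['a']))

# So the .loc version works once the row slice ':' is written out.
works = df.loc[:, 'c':'e'].div(df.loc[:, 'a'], axis=0)
print(works)

# Without axis=0, the Series index (0, 1, 2) is matched against the
# column labels ('c', 'd', 'e'); nothing lines up, so every cell is NaN.
# Pandas may emit a warning about sorting the mixed labels here.
misaligned = df.loc[:, 'c':'e'] / df['a']
print(misaligned.isna().all().all())
```

So the fix in the working line comes from passing an actual column (a Series indexed like the rows) together with axis=0, not from avoiding .loc as such.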