Vectorized operations in Pandas with fixed columns/rows/values

Time:02-17

I would like to perform operations on Pandas dataframes using fixed columns, rows, or values.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':(1,2,3), 'b':(4,5,6), 'c':(7,8,9), 'd':(10,11,12),
                  'e':(13,14,15)})

df
Out[57]: 
   a  b  c   d   e
0  1  4  7  10  13
1  2  5  8  11  14
2  3  6  9  12  15

I want to use the values in columns 'a' and 'b' as fixed values.


# It's easy enough to perform the operation I want on one column at a time:
df.loc[:,'f'] = df.loc[:,'c'] / df.loc[:,'a'] * df.loc[:,'b']

# It gets cumbersome if there are many columns to perform the operation on though:
df.loc[:,'g'] = df.loc[:,'d'] / df.loc[:,'a'] * df.loc[:,'b']
df.loc[:,'h'] = df.loc[:,'e'] / df.loc[:,'a'] * df.loc[:,'b']
# etc.

# This returns columns with all NaN values.
df.loc[:,('f','g','h')] = df.loc[:,'c':'e'] / df.loc[:'a']

Is there an optimal way to do what I want in Pandas? I could not find working solutions in the Pandas documentation or this SO thread. I don't think I can use .map() or .applymap(), because I'm under the impression they can only be used for simple equations (one input value). Thanks for reading.

CodePudding user response:

Use div and mul instead of / and * with axis=0:

df[['g', 'h']] = df[['d', 'e']].div(df['a'], axis=0).mul(df['b'], axis=0)
print(df)

# Output
   a  b  c   d   e     g     h
0  1  4  7  10  13  40.0  52.0
1  2  5  8  11  14  27.5  35.0
2  3  6  9  12  15  24.0  30.0
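
To see why axis=0 matters, here is a small sketch using the same data as the question (the names bad and good are just for illustration). As far as I understand it, the / operator between a DataFrame and a Series matches the Series index against the DataFrame's column labels, so the row labels 0, 1, 2 never line up with 'd' and 'e' and everything comes out NaN. Passing axis=0 to .div() matches on the row index instead.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Plain `/` aligns the Series index (0, 1, 2) with the column labels ('d', 'e').
# No labels match, so every cell of the result is NaN.
bad = df[['d', 'e']] / df['a']

# axis=0 aligns the Series with the row index, dividing each row by its 'a' value.
good = df[['d', 'e']].div(df['a'], axis=0)

print(bad.isna().all().all())  # True
print(good)
```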

With numpy:

arr = df.to_numpy()
arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]

# Output
array([[40. , 52. ],
       [27.5, 35. ],
       [24. , 30. ]])
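
A note on the list-style indexing in the numpy version, as a rough sketch: arr[:, [0]] keeps the column two-dimensional with shape (3, 1), which is what lets it broadcast across the two selected columns. The array below is typed out by hand to mirror df.to_numpy() on the original five columns, and the names col_2d and col_1d are only for illustration.

```python
import numpy as np

# Same values as df.to_numpy() for columns a..e in the question.
arr = np.array([[1, 4, 7, 10, 13],
                [2, 5, 8, 11, 14],
                [3, 6, 9, 12, 15]])

col_2d = arr[:, [0]]  # list index keeps the axis: shape (3, 1)
col_1d = arr[:, 0]    # scalar index drops the axis: shape (3,)

# A (3, 2) block divided by a (3, 1) column broadcasts row by row.
result = arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]

print(col_2d.shape, col_1d.shape)
print(result)
```

With the one-dimensional col_1d, the shapes (3, 2) and (3,) are not compatible for broadcasting, so the division would fail rather than divide row by row.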

CodePudding user response:

As @Corralien pointed out, it's better to use Pandas dataframe operations such as .div(), but I also figured out that the usage of .loc[] is important.

# Doesn't work:
df.loc[:,['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Doesn't work:
df[['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Now works.
df[['f','g','h']] = df.loc[:,'c':'e'].div(df['a'], axis=0)

At the moment, I'm not exactly sure why this is. Any insight would be helpful, thanks.
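
One likely explanation, offered as a sketch rather than a definitive answer: df.loc[:'a'] has no comma, so 'a' is read as the end of a row slice and not as a column label. The column form is df.loc[:, 'a'], which returns the same Series as df['a']. What the row-slice form does on an integer index may depend on the pandas version, so it is not executed below. The name col_a is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# With the comma, .loc selects every row of column 'a' and returns a Series.
col_a = df.loc[:, 'a']
print(col_a.equals(df['a']))  # True

# Using the column Series, the full operation from the question
# (divide by 'a', multiply by 'b') covers c, d and e in one statement.
df[['f', 'g', 'h']] = (
    df.loc[:, 'c':'e']
      .div(df.loc[:, 'a'], axis=0)
      .mul(df.loc[:, 'b'], axis=0)
)
print(df)
```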
