I would like to perform operations on Pandas dataframes using fixed columns, rows, or values.
For example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':(1,2,3), 'b':(4,5,6), 'c':(7,8,9), 'd':(10,11,12),
'e':(13,14,15)})
df
Out[57]:
a b c d e
0 1 4 7 10 13
1 2 5 8 11 14
2 3 6 9 12 15
I want to use the values in columns 'a' and 'b' as fixed values.
# It's easy enough to perform the operation I want on one column at a time:
df.loc[:,'f'] = df.loc[:,'c'] / df.loc[:,'a'] * df.loc[:,'b']
# It gets cumbersome if there are many columns to perform the operation on though:
df.loc[:,'g'] = df.loc[:,'d'] / df.loc[:,'a'] * df.loc[:,'b']
df.loc[:,'h'] = df.loc[:,'e'] / df.loc[:,'a'] * df.loc[:,'b']
# etc.
# This returns columns with all NaN values.
df.loc[:,('f','g','h')] = df.loc[:,'c':'e'] / df.loc[:'a']
Is there an optimal way to do what I want in Pandas? I could not find working solutions in the Pandas documentation or this SO thread. I don't think I can use .map() or .applymap(), because I'm under the impression they can only be used for simple equations (one input value). Thanks for reading.
CodePudding user response:
Use div and mul instead of / and * with axis=0:
df[['g', 'h']] = df[['d', 'e']].div(df['a'], axis=0).mul(df['b'], axis=0)
print(df)
# Output
a b c d e g h
0 1 4 7 10 13 40.0 52.0
1 2 5 8 11 14 27.5 35.0
2 3 6 9 12 15 24.0 30.0
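The same pattern should extend to all three target columns from the question in one statement. This is a minimal sketch, assuming the frame from the question; the names `src` and `out` are only illustrative and do not appear in the original.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

src = ['c', 'd', 'e']   # columns to transform (illustrative name)
out = ['f', 'g', 'h']   # new column names, in matching order (illustrative name)

# axis=0 aligns the Series df['a'] and df['b'] with the row index,
# so every row is divided by its own 'a' and multiplied by its own 'b'.
df[out] = df[src].div(df['a'], axis=0).mul(df['b'], axis=0)

print(df[out])
```

Keeping the source and target names in two lists makes it easier to add more columns later without repeating the expression.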
With numpy:
arr = df.to_numpy()
arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]
# Output
array([[40. , 52. ],
[27.5, 35. ],
[24. , 30. ]])
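The list-style index in `arr[:, [0]]` matters here: it keeps the column two-dimensional so NumPy can broadcast it across the selected columns. A small sketch of the difference, assuming the same values as the question's frame; `col_2d`, `col_1d`, and `broadcast_ok` are illustrative names.

```python
import numpy as np

arr = np.array([[1, 4, 7, 10, 13],
                [2, 5, 8, 11, 14],
                [3, 6, 9, 12, 15]])

col_2d = arr[:, [0]]   # shape (3, 1): one value per row, stretches across columns
col_1d = arr[:, 0]     # shape (3,): would be matched against the last axis instead

result = arr[:, [3, 4]] / col_2d * arr[:, [1]]   # shape (3, 2)

try:
    arr[:, [3, 4]] / col_1d   # shapes (3, 2) and (3,) cannot be broadcast together
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(result.shape, broadcast_ok)
```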
CodePudding user response:
As @Corralien pointed out, it's better to use Pandas DataFrame operations such as .div(), but I also figured out that the usage of .loc[] is important.
# Doesn't work:
df.loc[:,['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)
# Doesn't work:
df[['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)
# Now works.
df[['f','g','h']] = df.loc[:,'c':'e'].div(df['a'], axis=0)
At the moment, I'm not exactly sure why this is. Any insight would be helpful, thanks.
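One likely explanation, offered as a sketch rather than a definitive diagnosis: `df.loc[:'a']` has no comma, so `'a'` is read as the end of a row-label slice, not as a column name. The column selection is `df.loc[:, 'a']`, which gives the same Series as `df['a']`. Separately, arithmetic between two DataFrames aligns on column labels as well as the index, so frames that share no column names produce only NaN. The one-column frame `df[['a']]` below is just a stand-in to show that alignment; it is not the exact expression from the failing lines.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Column selection with .loc needs the comma: rows first, then columns.
same_series = df.loc[:, 'a'].equals(df['a'])

# DataFrame / DataFrame aligns on column labels. 'c', 'd', 'e' and 'a'
# have nothing in common, so every cell of the result is NaN.
misaligned = df.loc[:, 'c':'e'] / df[['a']]

# DataFrame.div(Series, axis=0) aligns the Series with the row index instead,
# so each row is divided by its own value of 'a'.
aligned = df.loc[:, 'c':'e'].div(df.loc[:, 'a'], axis=0)

print(same_series)
print(misaligned)
print(aligned)
```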