The objective is to multiply some constant value to a column in Pandas. Each column has its own constant value.
For example, the columns 'a_b_c','dd_ee','ff_ff','abc','devb'
are multiply with constant 15,20,15,15,20, respectively.
The constants values and its associated column is store in a dict const_val
const_val=dict(a_b_c=15,
dd_ee=20,
ff_ff=15,
abc=15,
devb=20,)
Currently, I am using a for-loop
to multiply each column to its associate constant value which is shown in code below
for dpair in const_val:
df[('per_a',dpair)]=df[dpair]*const_val[dpair]/reval
However, I wonder whether there is more elagent ways of doing this.
The full code is provided below
import pandas as pd
import numpy as np
np.random.seed(0)
const_val=dict(a_b_c=15,
dd_ee=20,
ff_ff=15,
abc=15,
devb=20,)
df = pd.DataFrame(data=np.random.randint(5, size=(3, 6)),
columns=['id','a_b_c','dd_ee','ff_ff','abc','devb'])
reval=6
for dpair in const_val:
df[('per_a',dpair)]=df[dpair]*const_val[dpair]/reval
The expected output is as below
id a_b_c dd_ee ... (per_a, ff_ff) (per_a, abc) (per_a, devb)
0 4 0 3 ... 7.5 7.5 3.333333
1 3 2 4 ... 0.0 0.0 13.333333
2 2 1 0 ... 2.5 2.5 0.000000
Please note that the
(per_a, ff_ff) (per_a, abc) (per_a, devb)
are multiindex column. The representative might be different in your compiler
p.s., I am using IntelliJ IDEA
CodePudding user response:
If you only have numbers in your DataFrame:
out = df.mul(pd.Series(const_val).reindex(df.columns, fill_value=1), axis=1)
If you have a mix non numeric and non-numeric:
out = df.select_dtypes('number').mul(pd.Series(const_val), axis=1).combine_first(df)
update:
out = df.join(df[list(const_val)].mul(pd.Series(const_val), axis=1)
.div(reval).add_prefix('per_a_'))
Output
id a_b_c dd_ee ff_ff abc devb per_a_a_b_c per_a_dd_ee per_a_ff_ff per_a_abc per_a_devb
0 1 4 3 0 3 0 10.0 10.000000 0.0 7.5 0.0
1 2 3 0 1 3 3 7.5 0.000000 2.5 7.5 10.0
2 3 0 1 1 1 0 0.0 3.333333 2.5 2.5 0.0
CodePudding user response:
Update for multiindex/tuple column headers:
cols = pd.Index(const_val.keys())
mi = pd.MultiIndex.from_product([['per_a'], cols])
df[mi] = df[cols] * pd.Series(const_val) / reval
print(df)
Output:
id a_b_c dd_ee ff_ff abc devb (per_a, a_b_c) (per_a, dd_ee) (per_a, ff_ff) (per_a, abc) (per_a, devb)
0 4 0 3 3 3 1 0.0 10.000000 7.5 7.5 3.333333
1 3 2 4 0 0 4 5.0 13.333333 0.0 0.0 13.333333
2 2 1 0 1 1 0 2.5 0.000000 2.5 2.5 0.000000
Try this using pandas intrinsic data alignment tenants to align data using indexing:
cols = pd.Index(const_val.keys())
df[cols '_per_a'] = df[cols] * pd.Series(const_val) / reval
Output:
id a_b_c dd_ee ff_ff abc devb a_b_c_per_a dd_ee_per_a ff_ff_per_a abc_per_a devb_per_a
0 4 0 3 3 3 1 0.0 10.000000 7.5 7.5 3.333333
1 3 2 4 0 0 4 5.0 13.333333 0.0 0.0 13.333333
2 2 1 0 1 1 0 2.5 0.000000 2.5 2.5 0.000000
CodePudding user response:
df
id a_b_c dd_ee ff_ff abc devb
0 4 0 3 3 3 1
1 3 2 4 0 0 4
2 2 1 0 1 1 0
make const_val to series
s = pd.Series(const_val)
s
a_b_c 15
dd_ee 20
ff_ff 15
abc 15
devb 20
dtype: int64
use broadcasting
out = df[['id']].join(df[df.columns[1:]].mul(s))
out
id a_b_c dd_ee ff_ff abc devb
0 4 0 60 45 45 20
1 3 30 80 0 0 80
2 2 15 0 15 15 0