Elagent way multiplying diffrent constant value to diffrent columns in Pandas-CodePudding

The objective is to multiply some constant value to a column in Pandas. Each column has its own constant value.

For example, the columns 'a_b_c','dd_ee','ff_ff','abc','devb' are multiply with constant 15,20,15,15,20, respectively.

The constants values and its associated column is store in a dict const_val

const_val=dict(a_b_c=15,
               dd_ee=20,
               ff_ff=15,
               abc=15,
               devb=20,)

Currently, I am using a for-loop to multiply each column to its associate constant value which is shown in code below

for dpair in const_val:
    df[('per_a',dpair)]=df[dpair]*const_val[dpair]/reval

However, I wonder whether there is more elagent ways of doing this.

The full code is provided below

import pandas as pd
import numpy as np
np.random.seed(0)

const_val=dict(a_b_c=15,
               dd_ee=20,
               ff_ff=15,
               abc=15,
               devb=20,)

df = pd.DataFrame(data=np.random.randint(5, size=(3, 6)),
                  columns=['id','a_b_c','dd_ee','ff_ff','abc','devb'])

reval=6

for dpair in const_val:
    df[('per_a',dpair)]=df[dpair]*const_val[dpair]/reval

The expected output is as below

   id  a_b_c  dd_ee  ...  (per_a, ff_ff)  (per_a, abc)  (per_a, devb)
0   4      0      3  ...             7.5           7.5       3.333333
1   3      2      4  ...             0.0           0.0      13.333333
2   2      1      0  ...             2.5           2.5       0.000000

Please note that the

(per_a, ff_ff)  (per_a, abc)  (per_a, devb)

are multiindex column. The representative might be different in your compiler

p.s., I am using IntelliJ IDEA

CodePudding user response：

If you only have numbers in your DataFrame:

out = df.mul(pd.Series(const_val).reindex(df.columns, fill_value=1), axis=1)

If you have a mix non numeric and non-numeric:

out = df.select_dtypes('number').mul(pd.Series(const_val), axis=1).combine_first(df)

update:

out = df.join(df[list(const_val)].mul(pd.Series(const_val), axis=1)
              .div(reval).add_prefix('per_a_'))

Output

   id  a_b_c  dd_ee  ff_ff  abc  devb  per_a_a_b_c  per_a_dd_ee  per_a_ff_ff  per_a_abc  per_a_devb
0   1      4      3      0    3     0         10.0    10.000000          0.0        7.5         0.0
1   2      3      0      1    3     3          7.5     0.000000          2.5        7.5        10.0
2   3      0      1      1    1     0          0.0     3.333333          2.5        2.5         0.0

CodePudding user response：

Update for multiindex/tuple column headers:

cols = pd.Index(const_val.keys())
mi = pd.MultiIndex.from_product([['per_a'], cols])
df[mi] = df[cols] * pd.Series(const_val) / reval
print(df)

Output:

   id  a_b_c  dd_ee  ff_ff  abc  devb  (per_a, a_b_c)  (per_a, dd_ee)  (per_a, ff_ff)  (per_a, abc)  (per_a, devb)
0   4      0      3      3    3     1             0.0       10.000000             7.5           7.5       3.333333
1   3      2      4      0    0     4             5.0       13.333333             0.0           0.0      13.333333
2   2      1      0      1    1     0             2.5        0.000000             2.5           2.5       0.000000

Try this using pandas intrinsic data alignment tenants to align data using indexing:

cols = pd.Index(const_val.keys())
df[cols   '_per_a'] = df[cols] * pd.Series(const_val) / reval

Output:

   id  a_b_c  dd_ee  ff_ff  abc  devb  a_b_c_per_a  dd_ee_per_a  ff_ff_per_a  abc_per_a  devb_per_a
0   4      0      3      3    3     1          0.0    10.000000          7.5        7.5    3.333333
1   3      2      4      0    0     4          5.0    13.333333          0.0        0.0   13.333333
2   2      1      0      1    1     0          2.5     0.000000          2.5        2.5    0.000000

CodePudding user response：

df

    id  a_b_c   dd_ee   ff_ff   abc devb
0   4   0       3       3       3   1
1   3   2       4       0       0   4
2   2   1       0       1       1   0

make const_val to series

s = pd.Series(const_val)

s

a_b_c    15
dd_ee    20
ff_ff    15
abc      15
devb     20
dtype: int64

use broadcasting

out = df[['id']].join(df[df.columns[1:]].mul(s))

out

    id  a_b_c   dd_ee   ff_ff   abc devb
0   4   0       60      45      45  20
1   3   30      80      0       0   80
2   2   15      0       15      15  0