How do I multiply a Pandas dataframe by a multiplier from a dict?

Starting from a dataframe like the below (simplified example of my real case):

import pandas as pd
df = pd.DataFrame({
    'a': [1.0, 1.1, 1.0, 4.2, 5.1],
    'b': [5.0, 4.2, 3.1, 3.2, 4.1],
    'c': [3.9, 2.0, 4.2, 3.8, 6.7],
    'd': [3.1, 2.1, 1.2, 1.0, 1.0]
})

And then taking a dictionary containing some multipliers I want to multiply certain columns in the dataframe by:

dict = {
  "b": 0.01,
  "d": 0.001
}

That is, for each column in the dataframe, I want to check whether it appears as a key in my dictionary and, if it does, multiply that column by the corresponding value. In this example, I want to multiply column 'b' by 0.01 and column 'd' by 0.001, which should give:

    'a': [1.0, 1.1, 1.0, 4.2, 5.1],
    'b': [0.05, 0.042, 0.031, 0.032, 0.041],
    'c': [3.9, 2.0, 4.2, 3.8, 6.7],
    'd': [0.0031, 0.0021, 0.0012, 0.001, 0.001]

In my real example, the dataframe is a cleaned-up set of data read in from Excel, and the dictionary of multipliers is read in from a config file, to allow users to specify which columns need converting from whatever is in Excel to the desired/expected units of measure (e.g. converting 'g/h' in the raw data to 'kg/h' in the dataframe).

What are some good, clear ways of achieving this intent, even if I have to restructure the implementation a bit?

CodePudding user response：

Try:

df[list(dct)] *= dct.values()

print(df)

Prints:

     a      b    c       d
0  1.0  0.050  3.9  0.0031
1  1.1  0.042  2.0  0.0021
2  1.0  0.031  4.2  0.0012
3  4.2  0.032  3.8  0.0010
4  5.1  0.041  6.7  0.0010

If dct may contain keys that are not columns of the dataframe, filter them out first:

tmp = {k: dct[k] for k in dct.keys() & df.columns}

df[list(tmp)] *= tmp.values()
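
If the original dataframe should stay untouched (for example, to keep the raw Excel values around), the same filtering can be wrapped in a small helper that returns a new frame. This is only a sketch: the name scale_columns is made up for illustration, and it assumes all column names are strings, because DataFrame.assign receives them as keyword arguments.

```python
import pandas as pd


def scale_columns(df, multipliers):
    """Return a new frame with matching columns scaled; other columns are unchanged."""
    # Keep only the multipliers whose key is an actual column of df.
    present = {col: m for col, m in multipliers.items() if col in df.columns}
    # assign() builds a new DataFrame, so df itself is not modified.
    return df.assign(**{col: df[col] * m for col, m in present.items()})


df = pd.DataFrame({
    'a': [1.0, 1.1],
    'b': [5.0, 4.2],
    'd': [3.1, 2.1],
})
dct = {'b': 0.01, 'd': 0.001, 'not_a_column': 100}  # last key is ignored

scaled = scale_columns(df, dct)
print(scaled)
```

Because the helper never writes to df, it can be called repeatedly with different multiplier dictionaries without the factors compounding.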

CodePudding user response：

Use this:

import pandas as pd
df = pd.DataFrame({
    'a': [1.0, 1.1, 1.0, 4.2, 5.1],
    'b': [5.0, 4.2, 3.1, 3.2, 4.1],
    'c': [3.9, 2.0, 4.2, 3.8, 6.7],
    'd': [3.1, 2.1, 1.2, 1.0, 1.0]
})

dic = {
    'b': 0.01,
    'd': 0.001
}

# Multiply only the columns that actually exist in the dataframe
for key, value in dic.items():
    if key in df.columns:
        df[key] = df[key] * value

print(df)

CodePudding user response：

cols = list(dict.keys())
vals = list(dict.values())
df[cols] = df[cols] * vals

CodePudding user response：

for x in df:
    if x in dict:
        df[x] *= dict[x]

CodePudding user response：

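Let's create a new dictionary that gives every column a default multiplier of 1, then update it with your existing multiplier dictionary. Here d is your multiplier dictionary; note that the code calls the built-in dict, so it will not work if that name has been reassigned: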

d = dict({col: 1 for col in df.columns},
         **{k:v for k, v in d.items() if k in df.columns})
# If you are sure keys in d exist in df.columns, use following instead
d = dict({col: 1 for col in df.columns}, **d)

out = df.mul(pd.DataFrame([d]).values, axis=1)

print(out)

     a      b    c       d
0  1.0  0.050  3.9  0.0031
1  1.1  0.042  2.0  0.0021
2  1.0  0.031  4.2  0.0012
3  4.2  0.032  3.8  0.0010
4  5.1  0.041  6.7  0.0010
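
The same default-of-1 idea can also be sketched with a pandas Series instead of a second dictionary. This is an alternative formulation, not the code from the answer above: Series.reindex fills in 1.0 for columns without a multiplier and drops keys that are not columns. The variable names multipliers and factors, and the extra key 'zzz', are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 1.1, 1.0, 4.2, 5.1],
    'b': [5.0, 4.2, 3.1, 3.2, 4.1],
    'c': [3.9, 2.0, 4.2, 3.8, 6.7],
    'd': [3.1, 2.1, 1.2, 1.0, 1.0]
})
multipliers = {'b': 0.01, 'd': 0.001, 'zzz': 5.0}  # 'zzz' is not a column

# One factor per column: listed columns get their multiplier, the rest get 1.0,
# and keys that are not columns are dropped by reindex.
factors = pd.Series(multipliers).reindex(df.columns, fill_value=1.0)

# Multiplying a DataFrame by a Series aligns the Series index with the columns.
out = df * factors
print(out)
```

Building factors once also makes it easy to inspect exactly which conversion was applied to each column.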

CodePudding user response：

What about a classic for loop?

for col_name, value in dct.items():
    if col_name in df.columns:
        df[col_name] *= value

Also, don't name your variables after Python built-ins such as dict! Doing so hides the built-in for the rest of that scope.
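
To make the naming warning concrete, here is a small sketch of what goes wrong once dict has been reassigned: any later call to dict(...), such as the one in the default-multiplier answer above, fails. The names defaults and multipliers are illustrative only.

```python
import builtins

# Rebinding the name "dict" hides the built-in type in this scope.
dict = {'b': 0.01, 'd': 0.001}

try:
    defaults = dict(a=1, c=1)  # intended: call the built-in constructor
except TypeError as exc:
    # The name now refers to a plain dictionary instance, which is not callable.
    error_message = str(exc)
    print(error_message)

# The real type is still reachable through the builtins module...
defaults = builtins.dict(a=1, c=1)
print(defaults)

# ...but a descriptive, non-clashing name avoids the problem entirely.
multipliers = {'b': 0.01, 'd': 0.001}
```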