Starting from a dataframe like the one below (a simplified example of my real case):
import pandas as pd
df = pd.DataFrame({
'a': [1.0, 1.1, 1.0, 4.2, 5.1],
'b': [5.0, 4.2, 3.1, 3.2, 4.1],
'c': [3.9, 2.0, 4.2, 3.8, 6.7],
'd': [3.1, 2.1, 1.2, 1.0, 1.0]
})
And a dictionary of multipliers by which I want to multiply certain columns of the dataframe:
dict = {
"b": 0.01,
"d": 0.001
}
That is, for each column in the dataframe I want to check whether it exists as a key in my dictionary and, if it does, multiply that column by the corresponding value. In this example, I would want to multiply column 'b' by 0.01 and column 'd' by 0.001, ending up with:
'a': [1.0, 1.1, 1.0, 4.2, 5.1],
'b': [0.05, 0.042, 0.031, 0.032, 0.041],
'c': [3.9, 2.0, 4.2, 3.8, 6.7],
'd': [0.0031, 0.0021, 0.0012, 0.001, 0.001]
In my real example, the dataframe is a cleaned-up set of data read in from Excel, and the dictionary of multipliers is read in from a config file, to allow users to specify which columns need converting from whatever is in Excel to the desired/expected units of measure (e.g. converting 'g/h' in the raw data to 'kg/h' in the dataframe).
What are some good, clear ways of achieving this intent, even if I have to restructure the implementation a bit?
CodePudding user response:
Try (here dct is the multiplier dictionary):
df[list(dct)] *= list(dct.values())
print(df)
Prints:
a b c d
0 1.0 0.050 3.9 0.0031
1 1.1 0.042 2.0 0.0021
2 1.0 0.031 4.2 0.0012
3 4.2 0.032 3.8 0.0010
4 5.1 0.041 6.7 0.0010
If dct contains keys that are not columns of the dataframe:
tmp = {k: dct[k] for k in dct.keys() & df.columns}
df[list(tmp)] *= tmp.values()
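A self-contained sketch of that filtered variant, assuming a multiplier dict with an extra key 'e' that has no matching column (the 'e' entry is hypothetical, added only to show that it is skipped rather than raising an error):

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 1.1, 1.0, 4.2, 5.1],
    'b': [5.0, 4.2, 3.1, 3.2, 4.1],
    'c': [3.9, 2.0, 4.2, 3.8, 6.7],
    'd': [3.1, 2.1, 1.2, 1.0, 1.0],
})

# 'e' is a hypothetical key with no matching column
dct = {'b': 0.01, 'd': 0.001, 'e': 10.0}

# Keep only the keys that are actual column names (dict_keys & Index gives a set)
tmp = {k: dct[k] for k in dct.keys() & df.columns}

# list(tmp) and tmp.values() iterate in the same order, so each factor meets its column
df[list(tmp)] *= list(tmp.values())
print(df)
```

The set intersection has no guaranteed order, but that does not matter here because the keys and values are both read from the same tmp dict.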
CodePudding user response:
Use this:
import pandas as pd
df = pd.DataFrame({
'a': [1.0, 1.1, 1.0, 4.2, 5.1],
'b': [5.0, 4.2, 3.1, 3.2, 4.1],
'c': [3.9, 2.0, 4.2, 3.8, 6.7],
'd': [3.1, 2.1, 1.2, 1.0, 1.0]
})
dic = {
'b': 0.01,
'd': 0.001
}
dfKeyList = [key for key in df.keys()]
for key, value in dic.items():
    if key in dfKeyList:
        df[key] = df[key] * value
print(df)
CodePudding user response:
cols = list(dict.keys())
vals = list(dict.values())
df[cols] = df[cols] * vals
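This variant assumes every key in the dictionary is a column of the dataframe. A small sketch of what happens otherwise, assuming a hypothetical extra key 'e' (the dictionary is named multipliers here to avoid shadowing the built-in dict):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [5.0, 4.2]})
multipliers = {'b': 0.01, 'e': 10.0}  # 'e' is hypothetical and not a column

cols = list(multipliers.keys())
vals = list(multipliers.values())

try:
    df[cols] = df[cols] * vals
    raised = False
except KeyError:
    # Selecting df[['b', 'e']] fails because 'e' is not a column, so nothing is modified
    raised = True

print(raised)
```

Filtering the keys against df.columns first, as in the dct.keys() & df.columns answer, avoids this.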
CodePudding user response:
for x in df:
    if x in dict:
        df[x] *= dict[x]
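This works because iterating over a DataFrame yields its column labels (not rows), so x takes the values 'a', 'b', 'c', 'd' in turn. A short sketch confirming that, using multipliers as the dictionary name (an assumed name, to avoid shadowing the built-in dict):

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 1.1],
    'b': [5.0, 4.2],
    'c': [3.9, 2.0],
    'd': [3.1, 2.1],
})
multipliers = {'b': 0.01, 'd': 0.001}

# Iterating a DataFrame yields its column labels
labels = [x for x in df]
print(labels)  # ['a', 'b', 'c', 'd']

# Only existing columns are reassigned, so the column Index being iterated does not change
for x in df:
    if x in multipliers:
        df[x] *= multipliers[x]
```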
CodePudding user response:
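Let's create a new dictionary with a default multiplier of 1 for every column, then update it with your existing multiplier dictionary (called d here). Note that this calls the built-in dict, so the multiplier dictionary must not itself be named dict: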
d = dict({col: 1 for col in df.columns},
**{k:v for k, v in d.items() if k in df.columns})
# If you are sure keys in d exist in df.columns, use following instead
d = dict({col: 1 for col in df.columns}, **d)
out = df.mul(pd.DataFrame([d]).values, axis=1)
print(out)
a b c d
0 1.0 0.050 3.9 0.0031
1 1.1 0.042 2.0 0.0021
2 1.0 0.031 4.2 0.0012
3 4.2 0.032 3.8 0.0010
4 5.1 0.041 6.7 0.0010
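The same idea (a multiplier of 1 for every column not in the dictionary) can also be expressed with a Series aligned to the columns. This is a swapped-in alternative, not the answer's exact code; it returns a new frame and leaves df unchanged. The name multipliers for the dictionary is an assumption:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1.0, 1.1, 1.0, 4.2, 5.1],
    'b': [5.0, 4.2, 3.1, 3.2, 4.1],
    'c': [3.9, 2.0, 4.2, 3.8, 6.7],
    'd': [3.1, 2.1, 1.2, 1.0, 1.0],
})
multipliers = {'b': 0.01, 'd': 0.001}

# One factor per column: the configured value where present, 1.0 elsewhere.
# reindex also drops any dictionary keys that are not columns.
factors = pd.Series(multipliers, dtype=float).reindex(df.columns, fill_value=1.0)

# With a Series argument, DataFrame.mul aligns the Series index against the columns
out = df.mul(factors)
print(out)
```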
CodePudding user response:
What about a classic for loop?
for col_name, value in dct.items():
    if col_name in df.columns:
        df[col_name] *= value
Don't name your variables after Python built-ins like dict!
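Following that advice, one way to restructure this is a small function that takes the multipliers as a parameter instead of relying on a global named dict. The function name apply_multipliers, the column names and the JSON config text below are hypothetical, chosen only to mirror the g/h to kg/h use case from the question:

```python
import json

import pandas as pd


def apply_multipliers(df: pd.DataFrame, multipliers: dict) -> pd.DataFrame:
    """Return a copy of df with each column named in multipliers scaled by its factor.

    Keys that are not columns of df are ignored.
    """
    out = df.copy()
    for col_name, factor in multipliers.items():
        if col_name in out.columns:
            out[col_name] = out[col_name] * factor
    return out


# Hypothetical config contents (in practice perhaps read from a file with json.load)
config_text = '{"flow": 0.001, "unused_key": 2.0}'
multipliers = json.loads(config_text)

# 'flow' is in g/h in the raw data; the 0.001 factor converts it to kg/h
raw = pd.DataFrame({'flow': [1500.0, 2500.0], 'temp': [20.0, 21.0]})
converted = apply_multipliers(raw, multipliers)
print(converted)
```

Returning a copy keeps the raw Excel data intact, which can help when checking that the configured conversions were applied as intended.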