My problem: In a column set, replace the values in each column based on a chosen dictionary. I have two dictionaries, each with a set of keys (column names), and each key has two distinct values. In the end, sum the values of this set of columns and save them in a new column.
df = pd.DataFrame({'var1': [1,0,1,0,0,0,1], 'var2': [1,0,1,0,1,0,0], 'var3': [1,0,1,0,1,0,0]})
dict = {
dict1 {"var1" : {0:0, 1:19}, "var2" : {0:0, 1:6}, "var3" : {0:0, 1:9}},
dict2 {"var1" : {0:0, 1:0}, "var2" : {0:0, 1:-1}, "var3" : {0:0, 1:-2}}
}
def get_values(df, dict):
values = []
for i in df.columns:
np.where(df[i] == 1,
df[i].map(dict.key[dict]),
0)
values = np.sum(df,axis=1).tolist()
return values
dict1 = get_values(df, dict1)
dict2 = get_values(df, dict2)
I looked for similar posts involving mapping functions and nested dictionaries but couldn't find anything suitable for my query.
CodePudding user response:
Here is a way to do what I think you are asking:
import numpy as np
def replaceAndSumValues(df, d):
out = df.transform(lambda x, dct: np.where(x, dct[x.name][1], dct[x.name][0]), dct=d)
return out.assign(sum=out.sum(axis=1))
Explanation:
Given an input dataframe and a nested dict that, for each column label, maps column values to replacement values, the function replaceAndSumValues()
will:
- use
transform()
to iterate over the columns in the dataframe and callnp.where()
to replace the values in each column based on the nested dictionary for that column name - create a new column
sum
which contains the sum of the values in all columns for each row.
Full test code:
import pandas as pd
df = pd.DataFrame({'var1': [1,0,1,0,0,0,1], 'var2': [1,0,1,0,1,0,0], 'var3': [1,0,1,0,1,0,0]})
print(df)
dicts = {
'dict1': {"var1" : {0:0, 1:19}, "var2" : {0:0, 1:6}, "var3" : {0:0, 1:9}},
'dict2': {"var1" : {0:0, 1:0}, "var2" : {0:0, 1:-1}, "var3" : {0:0, 1:-2}}
}
import numpy as np
def replaceAndSumValues(df, d):
out = df.transform(lambda x, dct: np.where(x, dct[x.name][1], dct[x.name][0]), dct=d)
return out.assign(sum=out.sum(axis=1))
df1 = replaceAndSumValues(df, dicts['dict1'])
print('df1:', df1, sep='\n')
df2 = replaceAndSumValues(df, dicts['dict2'])
print('df2:', df2, sep='\n')
Input:
var1 var2 var3
0 1 1 1
1 0 0 0
2 1 1 1
3 0 0 0
4 0 1 1
5 0 0 0
6 1 0 0
Output:
df1:
var1 var2 var3 sum
0 19 6 9 34
1 0 0 0 0
2 19 6 9 34
3 0 0 0 0
4 0 6 9 15
5 0 0 0 0
6 19 0 0 19
df2:
var1 var2 var3 sum
0 0 -1 -2 -3
1 0 0 0 0
2 0 -1 -2 -3
3 0 0 0 0
4 0 -1 -2 -3
5 0 0 0 0
6 0 0 0 0