I have a whole output csv of data I want to plug into a regression model, containing of course, a dependent variable column and associated values, and multiple independent variable columns (parameters), and those associated values. The problem is that the values I have for my independent variables are not the "real' values, they are just placeholders that correspond to the actual values. I am using python and pandas for the dataframe.
I have this dataframe ("Param" for parameter):
Color Zone Dependent.var Param_1 Param_2 Param_3
0 Blue A XX 0 1 3
1 Blue A XX 2 1 1
2 Blue A XX 0 0 1
3 Blue B XX 1 0 0
4 Blue B XX 2 2 2
5 Blue B XX 2 1 1
6 Blue C XX 2 0 0
7 Blue C XX 2 2 3
8 Blue C XX 3 1 1
9 Red A XX 3 2 1
10 Red A XX 0 1 3
11 Red A XX 3 3 2
12 Red B XX 2 1 1
13 Red B XX 0 1 3
14 Red B XX 1 0 0
15 Red C XX 0 2 1
16 Red C XX 1 1 0
17 Red C XX 3 2 3
18 Green A XX 2 2 3
19 Green A XX 0 1 3
20 Green A XX 3 2 2
21 Green B XX 2 1 1
22 Green B XX 0 2 3
23 Green B XX 2 3 2
24 Green C XX 0 1 1
25 Green C XX 3 1 0
26 Green C XX 3 2 3
Sorry this is so long, but I felt necessary to communicate the data structure. Now I have 9 separate dictionaries (or keys, I get confused between the terms), 1 for each Color and Zone combination, that contain the "translation" of the initial placeholder values with their "real" values", and they look like this, where the "real" values are different between each Color/Zone:
For Blue (1 for Blue/A, 1 for Blue/B, and 1 for Blue/C):
0 1 2 3
------------------------
Param_1 XX XX XX XX
Param_2 XX XX XX XX
Param_3 XX XX XX XX
For Red (1 for Red/A, 1 for Red/B, and 1 for Red/C):
0 1 2 3
------------------------
Param_1 YY YY YY YY
Param_2 YY YY YY YY
Param_3 YY YY YY YY
For Green (1 for Green/A, 1 for Green/B, and 1 for Green/C):
0 1 2 3
------------------------
Param_1 ZZ ZZ ZZ ZZ
Param_2 ZZ ZZ ZZ ZZ
Param_3 ZZ ZZ ZZ ZZ
I want to take these "parameter value maps" and then use them to replace the numbers in the initial dataframe. Is there a simple way to do this in python? I just get really confused with how to address the matter where there are three separate Colors, and 3 separate zones, with 9 separate sets of replacement values. I am thinking this would involve a for loop to iterate though the "Colors" and "Zones", but I am not sure.
CodePudding user response:
Since you don't give a specific mapping, I'll use this example mapping:
PARAMS = ["Param_1", "Param_2", "Param_3"]
PARAM_MAP = {'Red': {'Param_1': {0: 'P1_R0', 1: 'P1_R1', 2: 'P1_R2', 3: 'P1_R3'},
'Param_2': {0: 'P2_R0', 1: 'P2_R1', 2: 'P2_R2', 3: 'P2_R3'},
'Param_3': {0: 'P3_R0', 1: 'P3_R1', 2: 'P3_R2', 3: 'P3_R3'}},
'Green': {'Param_1': {0: 'P1_G0', 1: 'P1_G1', 2: 'P1_G2', 3: 'P1_G3'},
'Param_2': {0: 'P2_G0', 1: 'P2_G1', 2: 'P2_G2', 3: 'P2_G3'},
'Param_3': {0: 'P3_G0', 1: 'P3_G1', 2: 'P3_G2', 3: 'P3_G3'}},
'Blue': {'Param_1': {0: 'P1_B0', 1: 'P1_B1', 2: 'P1_B2', 3: 'P1_B3'},
'Param_2': {0: 'P2_B0', 1: 'P2_B1', 2: 'P2_B2', 3: 'P2_B3'},
'Param_3': {0: 'P3_B0', 1: 'P3_B1', 2: 'P3_B2', 3: 'P3_B3'}}}
And use this function which we'll apply to the dataframe:
def mapping(row):
color_map = PARAM_MAP[row["Color"]]
return pd.Series({p: color_map[p][v] for p, v in zip(PARAMS, row[PARAMS])})
Now apply the mapping and reassign:
df[PARAMS] = df.apply(mapping, axis=1)
Output:
Color Zone Dependent.var Param_1 Param_2 Param_3
0 Blue A XX P1_B0 P2_B1 P3_B3
1 Blue A XX P1_B2 P2_B1 P3_B1
2 Blue A XX P1_B0 P2_B0 P3_B1
3 Blue B XX P1_B1 P2_B0 P3_B0
4 Blue B XX P1_B2 P2_B2 P3_B2
5 Blue B XX P1_B2 P2_B1 P3_B1
6 Blue C XX P1_B2 P2_B0 P3_B0
7 Blue C XX P1_B2 P2_B2 P3_B3
8 Blue C XX P1_B3 P2_B1 P3_B1
9 Red A XX P1_R3 P2_R2 P3_R1
10 Red A XX P1_R0 P2_R1 P3_R3
11 Red A XX P1_R3 P2_R3 P3_R2
12 Red B XX P1_R2 P2_R1 P3_R1
13 Red B XX P1_R0 P2_R1 P3_R3
14 Red B XX P1_R1 P2_R0 P3_R0
15 Red C XX P1_R0 P2_R2 P3_R1
16 Red C XX P1_R1 P2_R1 P3_R0
17 Red C XX P1_R3 P2_R2 P3_R3
18 Green A XX P1_G2 P2_G2 P3_G3
19 Green A XX P1_G0 P2_G1 P3_G3
20 Green A XX P1_G3 P2_G2 P3_G2
21 Green B XX P1_G2 P2_G1 P3_G1
22 Green B XX P1_G0 P2_G2 P3_G3
23 Green B XX P1_G2 P2_G3 P3_G2
24 Green C XX P1_G0 P2_G1 P3_G1
25 Green C XX P1_G3 P2_G1 P3_G0
26 Green C XX P1_G3 P2_G2 P3_G3
I imagine there may be a better way, perhaps with groupby
, but I couldn't figure it out.