I have the DataFrame 'indata2'. The following code adds an 'Indicator' column that groups my data based on indata2['Label'].
import pandas as pd
import numpy as np
indata2 = [[2, 'SIS X ', 9.65, 'Q'],
[2, 'SIS X-', 5.32, 'Q'],
[2, 'SIS Y ', 8.24, 'Q'],
[2, 'SIS Y-', 3.27, 'Q'],
[2, 'SIS', 3.40, 'Q'],
[2, 'C. VIV', 0.23, 'L'],
[2, 'SOBRE P', 0.38, 'SD'],
[2, 'SOBRE P', 0.19, 'SD'],
# [2, 'VIEN X ', 7.36, 'W'],
# [2, 'VIEN X-', 23.09, 'W'],
# [2, 'VIEN Y ', 6.66, 'W'],
# [2, 'VIEN Y-', 2.68, 'W'],
[4, 'SIS X ', 14.41, 'Q'],
[4, 'SIS X-', 12.23, 'Q'],
[4, 'SIS Y ', 10.00, 'Q'],
[4, 'SIS Y-', 11.00, 'Q'],
[4, 'C. VIV', 0.38, 'L'],
[4, 'C. VIV', 0.34, 'L'],
[4, 'C. VIV', 0.13, 'L'],
[4, 'SOBRE P', 0.62, 'SD']]
# [4, 'VIEN X ', 29.21, 'W'],
# [4, 'VIEN X-', 8.70, 'W'],
# [4, 'VIEN Y-', 7.46, 'W'],
# [4, 'VIEN Y ', 11.62, 'W'],
# [4, 'VIEN', 9.6, 'W']]
indata2 = pd.DataFrame(data = indata2, columns = ['KeyData', 'Text', 'AvgAbs', 'Label'])
l = indata2.Label.unique()
m = pd.DataFrame(l, columns = ['Label'])
m['Indicator'] = m.index + 1
outputdata = indata2.merge(m[['Indicator','Label']],'left')
mapper = {label: i + 1 for i, label in enumerate(indata2["Label"].unique())}
indata2["Indicator"] = np.select([(indata2["Label"]=="Q")&(indata2["Text"].str.contains("X")),
(indata2["Label"]=="Q")&(indata2["Text"].str.contains("Y")),
(indata2["Label"]=="W")&(indata2["Text"].str.contains("X")),
(indata2["Label"]=="W")&(indata2["Text"].str.contains("Y")),
(indata2["Label"].isin(list("QW"))&~(indata2["Text"].str.contains("[X-Y]", regex=True)))
],
[mapper["Q"] + 0.1, mapper["Q"] + 0.2, mapper["W"] + 0.1, mapper["W"] + 0.2, 0],
indata2["Label"].map(mapper))
The indata2['Label'] column has two special labels, W and Q, which are grouped using both indata2['Label'] and indata2['Text'].
Everything works, but an error occurs when Q, W, or both are missing from indata2['Label'] (which can happen, since the labels come from the data). In that case I get a KeyError.
Note: the commented-out lines in the code reproduce the case where a label (here W) is missing, which raises the error. The code works fine when I uncomment them.
How can I handle this case? Greetings.
CodePudding user response:
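The KeyError is raised because mapper["Q"] or mapper["W"] is looked up even when that label never appears in the data. Make sure the mapper always has entries for Q and W, even if you don't need them: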
mapper = {"Q": 1, "W": 2}
mapper.update({label: i + 2 for i, label in enumerate(indata2[~indata2["Label"].isin(["Q","W"])]["Label"].unique())})
indata2["Indicator"] = np.select([(indata2["Label"]=="Q")&(indata2["Text"].str.contains("X")),
(indata2["Label"]=="Q")&(indata2["Text"].str.contains("Y")),
(indata2["Label"]=="W")&(indata2["Text"].str.contains("X")),
(indata2["Label"]=="W")&(indata2["Text"].str.contains("Y")),
(indata2["Label"].isin(list("QW"))&~(indata2["Text"].str.contains("[X-Y]", regex=True)))
],
[mapper["Q"] + 0.1, mapper["Q"] + 0.2, mapper["W"] + 0.1, mapper["W"] + 0.2, 0],
indata2["Label"].map(mapper))
>>> indata2
    KeyData     Text  AvgAbs  Label  Indicator
0         2   SIS X     9.65      Q        1.1
1         2   SIS X-    5.32      Q        1.1
2         2   SIS Y     8.24      Q        1.2
3         2   SIS Y-    3.27      Q        1.2
4         2      SIS    3.40      Q        0.0
5         2   C. VIV    0.23      L        2.0
6         2  SOBRE P    0.38     SD        3.0
7         2  SOBRE P    0.19     SD        3.0
8         4   SIS X    14.41      Q        1.1
9         4   SIS X-   12.23      Q        1.1
10        4   SIS Y    10.00      Q        1.2
11        4   SIS Y-   11.00      Q        1.2
12        4   C. VIV    0.38      L        2.0
13        4   C. VIV    0.34      L        2.0
14        4   C. VIV    0.13      L        2.0
15        4  SOBRE P    0.62     SD        3.0
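To see why seeding the mapper helps, here is a minimal sketch of the failure on its own. The `labels` Series is made-up data with no "W" rows, and the names `error`, `safe_mapper` and `safe_choices` are only for illustration. The point is that the list of choices is built before np.select is ever called, so a missing key fails immediately:

```python
import pandas as pd

# Hypothetical labels with no "W" rows, mimicking the failing case
labels = pd.Series(["Q", "L", "SD", "Q"])

# Mapper built only from the labels present in the data, as in the question
mapper = {label: i + 1 for i, label in enumerate(labels.unique())}

try:
    # The choice list is evaluated eagerly, so mapper["W"] is looked up
    # even though no row would ever use it
    choices = [mapper["Q"] + 0.1, mapper["W"] + 0.1]
    error = None
except KeyError as exc:
    error = exc

# Seeding the special labels first guarantees both keys exist
safe_mapper = {"Q": 1, "W": 2}
safe_choices = [safe_mapper["Q"] + 0.1, safe_mapper["W"] + 0.1]
```

Here `error` holds the KeyError for "W", while the seeded mapper builds its choices without complaint.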
As an aside, you don't need any of the following lines (and should remove them from your code):
l = indata2.Label.unique()
m = pd.DataFrame(l, columns = ['Label'])
m['Indicator'] = m.index + 1
outputdata = indata2.merge(m[['Indicator','Label']],'left')
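One possible variation, in case it is useful: with Q fixed at 1 and W fixed at 2, numbering the remaining labels from 2 means the first of them shares a number with W whenever W rows are present. The sketch below assumes you would rather keep every label distinct, so it numbers the other labels from 3. The helper name `add_indicator` and the small `sample` frame are hypothetical, and the resulting numbers differ from the output shown above (L becomes 3.0, SD becomes 4.0).

```python
import numpy as np
import pandas as pd

def add_indicator(df):
    """Return a copy of df with an 'Indicator' column (hypothetical helper)."""
    special = {"Q": 1, "W": 2}  # reserved numbers, present even if unused
    others = df.loc[~df["Label"].isin(list(special)), "Label"].unique()
    # Assumption: the other labels are numbered after the reserved ones (3, 4, ...)
    mapper = dict(special)
    mapper.update({label: i + len(special) + 1 for i, label in enumerate(others)})

    is_q = df["Label"] == "Q"
    is_w = df["Label"] == "W"
    has_x = df["Text"].str.contains("X")
    has_y = df["Text"].str.contains("Y")

    conditions = [is_q & has_x, is_q & has_y, is_w & has_x, is_w & has_y,
                  (is_q | is_w) & ~(has_x | has_y)]
    choices = [mapper["Q"] + 0.1, mapper["Q"] + 0.2,
               mapper["W"] + 0.1, mapper["W"] + 0.2, 0]

    out = df.copy()
    # Rows matching no condition fall back to the plain label number
    out["Indicator"] = np.select(conditions, choices,
                                 default=df["Label"].map(mapper).to_numpy())
    return out

# Small made-up sample without any W rows
sample = pd.DataFrame(
    [[2, "SIS X ", 9.65, "Q"],
     [2, "SIS", 3.40, "Q"],
     [2, "C. VIV", 0.23, "L"],
     [4, "SOBRE P", 0.62, "SD"]],
    columns=["KeyData", "Text", "AvgAbs", "Label"])
result = add_indicator(sample)
```

Calling `add_indicator(indata2)` should then work whether or not the W rows are present.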