Error operating data with pandas (in case of missing data)

Time:10-28

I have the DataFrame 'indata2'. The following code adds an 'Indicator' column that groups my data based on indata2['Label'].

import pandas as pd
import numpy as np

indata2 = [[2,  'SIS X ',      9.65,    'Q'],
          [2,   'SIS X-',      5.32,    'Q'],
          [2,   'SIS Y ',      8.24,    'Q'],
          [2,   'SIS Y-',      3.27,    'Q'],
          [2,   'SIS',        3.40, 'Q'],
          [2,   'C. VIV',      0.23,    'L'],
          [2,   'SOBRE P',  0.38,   'SD'],
          [2,   'SOBRE P',  0.19,   'SD'],
#          [2,  'VIEN X ',  7.36,   'W'],
#          [2,  'VIEN X-',  23.09,  'W'],
#          [2,  'VIEN Y ',  6.66,   'W'],
#          [2,  'VIEN Y-',  2.68,   'W'],
          [4,   'SIS X ',      14.41,   'Q'],
          [4,   'SIS X-',      12.23,   'Q'],
          [4,   'SIS Y ',      10.00,   'Q'],
          [4,   'SIS Y-',      11.00,   'Q'],
          [4,   'C. VIV',      0.38,    'L'],
          [4,   'C. VIV',      0.34,    'L'],
          [4,   'C. VIV',      0.13,    'L'],
          [4,   'SOBRE P',  0.62,   'SD']]
#          [4,  'VIEN X ',  29.21,  'W'],
#          [4,  'VIEN X-',  8.70,   'W'],
#          [4,  'VIEN Y-',  7.46,   'W'],
#          [4,  'VIEN Y ',  11.62,  'W'],
#          [4,  'VIEN',      9.6,   'W']]

indata2 = pd.DataFrame(data = indata2, columns = ['KeyData', 'Text', 'AvgAbs', 'Label'])

l = indata2.Label.unique()
m = pd.DataFrame(l, columns = ['Label'])
m['Indicator'] = m.index + 1

outputdata = indata2.merge(m[['Indicator','Label']],'left')

mapper = {label: i + 1 for i, label in enumerate(indata2["Label"].unique())}
indata2["Indicator"] = np.select([(indata2["Label"]=="Q")&(indata2["Text"].str.contains("X")),
                                  (indata2["Label"]=="Q")&(indata2["Text"].str.contains("Y")),
                                  (indata2["Label"]=="W")&(indata2["Text"].str.contains("X")),
                                  (indata2["Label"]=="W")&(indata2["Text"].str.contains("Y")),
                                  (indata2["Label"].isin(list("QW"))&~(indata2["Text"].str.contains("[X-Y]", regex=True)))
                                 ],
                                 [mapper["Q"] + 0.1, mapper["Q"] + 0.2, mapper["W"] + 0.1, mapper["W"] + 0.2, 0],
                                 indata2["Label"].map(mapper))

The indata2['Label'] column has two special labels, W and Q, whose rows are grouped using both indata2['Label'] and indata2['Text'].

Everything works, but an error occurs when the Q label, the W label, or both are absent from indata2['Label'] (which can happen, since it depends on the data). In that case I get a KeyError.

Note: The commented-out lines in the code simulate the case where the Q label, the W label, or both are missing, which triggers the error. With those lines uncommented, the code works well.

How can this case be handled? Regards.

CodePudding user response:

Make sure the mapper always has values for Q and W even if you don't need them:

mapper = {"Q": 1, "W": 2}
mapper.update({label: i + 2 for i, label in enumerate(indata2[~indata2["Label"].isin(["Q","W"])]["Label"].unique())})

indata2["Indicator"] = np.select([(indata2["Label"]=="Q")&(indata2["Text"].str.contains("X")),
                                  (indata2["Label"]=="Q")&(indata2["Text"].str.contains("Y")),
                                  (indata2["Label"]=="W")&(indata2["Text"].str.contains("X")),
                                  (indata2["Label"]=="W")&(indata2["Text"].str.contains("Y")),
                                  (indata2["Label"].isin(list("QW"))&~(indata2["Text"].str.contains("[X-Y]", regex=True)))
                                 ],
                                 [mapper["Q"] + 0.1, mapper["Q"] + 0.2, mapper["W"] + 0.1, mapper["W"] + 0.2, 0],
                                 indata2["Label"].map(mapper))

>>> indata2

    KeyData     Text  AvgAbs Label  Indicator
0         2   SIS X     9.65     Q        1.1
1         2   SIS X-    5.32     Q        1.1
2         2   SIS Y     8.24     Q        1.2
3         2   SIS Y-    3.27     Q        1.2
4         2      SIS    3.40     Q        0.0
5         2   C. VIV    0.23     L        2.0
6         2  SOBRE P    0.38    SD        3.0
7         2  SOBRE P    0.19    SD        3.0
8         4   SIS X    14.41     Q        1.1
9         4   SIS X-   12.23     Q        1.1
10        4   SIS Y    10.00     Q        1.2
11        4   SIS Y-   11.00     Q        1.2
12        4   C. VIV    0.38     L        2.0
13        4   C. VIV    0.34     L        2.0
14        4   C. VIV    0.13     L        2.0
15        4  SOBRE P    0.62    SD        3.0
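
For context, the KeyError in the question does not come from np.select itself. The list of choices is an ordinary Python list, so every mapper[...] lookup is evaluated while that list is being built, whether or not any row matches the corresponding condition. The following is a minimal sketch of that behaviour on a small made-up frame (the names df, safe_mapper, and result are illustrative, not from the original code):

```python
import numpy as np
import pandas as pd

# Toy frame with no "W" rows (hypothetical data using the question's column names).
df = pd.DataFrame({"Text": ["SIS X", "C. VIV"], "Label": ["Q", "L"]})

# Same construction as in the question: only labels present in the data get a key.
mapper = {label: i + 1 for i, label in enumerate(df["Label"].unique())}

missing = None
try:
    # mapper["W"] is looked up here, while the list is built,
    # before np.select ever inspects the conditions.
    choices = [mapper["Q"] + 0.1, mapper["W"] + 0.1]
except KeyError as exc:
    missing = exc.args[0]
    print(f"KeyError for label {missing!r}")

# Reserving the special labels up front makes both lookups safe.
safe_mapper = {"Q": 1, "W": 2}
choices = [safe_mapper["Q"] + 0.1, safe_mapper["W"] + 0.1]
conditions = [
    (df["Label"] == "Q") & df["Text"].str.contains("X"),
    (df["Label"] == "W") & df["Text"].str.contains("X"),
]
result = np.select(conditions, choices, default=0)
```

The W condition simply matches no rows here, so its choice value is never used, but it still has to exist when the list is created.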

As an aside, you don't need any of the following lines (and should remove them from your code):

l = indata2.Label.unique()
m = pd.DataFrame(l, columns = ['Label'])
m['Indicator'] = m.index + 1

outputdata = indata2.merge(m[['Indicator','Label']],'left')
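
One thing that may be worth checking: with the mapper above, the first label other than Q or W receives the code 2, which is also the code reserved for W. If W rows can appear later, a possible variation is to number the remaining labels after the reserved block. The sketch below assumes that distinct integer codes are wanted; the helper name add_indicator and its special parameter are hypothetical, and the resulting codes for the other labels differ from the output shown above (L becomes 3 and SD becomes 4).

```python
import numpy as np
import pandas as pd


def add_indicator(df, special=("Q", "W")):
    """Return a copy of df with an 'Indicator' column (hypothetical helper)."""
    # Reserve codes 1..len(special) for the special labels, present or not.
    mapper = {label: i + 1 for i, label in enumerate(special)}
    # Number the remaining labels after the reserved block so that none of
    # them shares a code with a special label.
    others = [lab for lab in df["Label"].unique() if lab not in mapper]
    mapper.update({lab: i + len(special) + 1 for i, lab in enumerate(others)})

    is_special = df["Label"].isin(special)
    has_x = df["Text"].str.contains("X")
    has_y = df["Text"].str.contains("Y")

    conditions, choices = [], []
    for label in special:
        is_label = df["Label"] == label
        conditions += [is_label & has_x, is_label & has_y]
        choices += [mapper[label] + 0.1, mapper[label] + 0.2]
    # Special labels whose text has neither X nor Y get 0, as in the answer.
    conditions.append(is_special & ~has_x & ~has_y)
    choices.append(0)

    out = df.copy()
    out["Indicator"] = np.select(conditions, choices, default=df["Label"].map(mapper))
    return out


# Small made-up sample with no W rows.
sample = pd.DataFrame(
    {
        "Text": ["SIS X ", "SIS Y-", "SIS", "C. VIV", "SOBRE P"],
        "Label": ["Q", "Q", "Q", "L", "SD"],
    }
)
result = add_indicator(sample)
print(result)
```

Because the special labels are passed in as a parameter, the same function runs unchanged whether or not Q or W rows are present in the data.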