Group data by conditional using pandas-CodePudding

I have an entry dataframe 'Indata2', whose data I must group according to 'Indata [' Label ']', generating a new column called 'Indicator', this I already have it, what I should do is correct this last column a little .

import pandas as pd   
indata2 = [[2,  'SIS X ',      9.65,    'Q'],
          [2,   'SIS X-',      5.32,    'Q'],
          [2,   'SIS Y ',      8.24,    'Q'],
          [2,   'SIS Y-',      3.27,    'Q'],
          [2,   'SIS',        3.40, 'Q'],
          [2,   'C. VIV',      0.23,    'L'],
          [2,   'SOBRE P',  0.38,   'SD'],
          [2,   'SOBRE P',  0.19,   'SD'],
          [2,   'VIEN X ',  7.36,   'W'],
          [2,   'VIEN X-',  23.09,  'W'],
          [2,   'VIEN Y ',  6.66,   'W'],
          [2,   'VIEN Y-',  2.68,   'W'],
          [4,   'SIS X ',      14.41,   'Q'],
          [4,   'SIS X-',      12.23,   'Q'],
          [4,   'SIS Y ',      10.00,   'Q'],
          [4,   'SIS Y-',      11.00,   'Q'],
          [4,   'C. VIV',      0.38,    'L'],
          [4,   'C. VIV',      0.34,    'L'],
          [4,   'C. VIV',      0.13,    'L'],
          [4,   'SOBRE P',  0.62,   'SD'],
          [4,   'VIEN X ',  29.21,  'W'],
          [4,   'VIEN X-',  8.70,   'W'],
          [4,   'VIEN Y-',  7.46,   'W'],
          [4,   'VIEN Y ',  11.62,  'W'],
          [4,   'VIEN',      9.6,   'W']] 
indata2 = pd.DataFrame(data = indata2, columns = ['KeyData', 'Text', 'AvgAbs', 'Label'])

l = indata2.Label.unique()
m = pd.DataFrame(l, columns = ['Label'])
m['Indicator'] = m.index   1
outputdata = indata2.merge(m[['Indicator','Label']],'left')
# Correct outputdata['Indicator']

To correct I have in 'Label' 2 special tags: 'W' and 'Q'.

You can realize that for each of them there is a text: For Q: [SIS X , SIS X-, SIS Y , SIS Y-] and for W: [VIEN X , VIEN X-, VIEN Y , VIEN Y-] . All these data are already converted to uppercase.

Then, for the new column 'Indicator': all the 'W' values containing in the 'Text' column at least one letter 'X' must have an indicator, and when containing the 'Text' column at least one letter 'Y' Other indicator, this is to differentiate them.

Note: If there is no 'X' or 'Y' in the 'TEXT' column, then the 'indicator' = 0.

The result (outputdata) should be:

KeyData    Text      AvgAbs  Label    Indicator
2         SIS X        9.65      Q         1.1
2         SIS X-       5.32      Q         1.1
2         SIS Y        8.24      Q         1.2
2         SIS Y-       3.27      Q         1.2
2         SIS          3.40      Q         0
2         C. VIV       0.23      L         2
2         SOBRE P      0.38     SD         3
2         SOBRE P      0.19     SD         3
2         VIEN X       7.36      W         4.1
2         VIEN X-      23.09     W         4.1
2         VIEN Y       6.66      W         4.2
2         VIEN Y-      2.68      W         4.2
4         SIS X        14.41     Q         1.1
4         SIS X-       12.23     Q         1.1
4         SIS Y        10.00     Q         1.2
4         SIS Y-       11.00     Q         1.2
4         C. VIV       0.38      L         2
4         C. VIV       0.34      L         2
4         C. VIV       0.13      L         2
4         SOBRE P      0.62     SD         3
4         VIEN X       29.21     W         4.1
4         VIEN X-      8.70      W         4.1
4         VIEN Y-      7.46      W         4.2
4         VIEN Y       11.62     W         4.2
4         VIEN         9.60      W         0

CodePudding user response：

Try with map and numpy.select:

mapper = {label: i 1 for i, label in enumerate(indata2["Label"].unique())}
indata2["Indicator"] = np.select([(indata2["Label"]=="Q")&(indata2["Text"].str.contains("X")), 
                                  (indata2["Label"]=="Q")&(indata2["Text"].str.contains("Y")), 
                                  (indata2["Label"]=="W")&(indata2["Text"].str.contains("X")), 
                                  (indata2["Label"]=="W")&(indata2["Text"].str.contains("Y")),
                                  (indata2["Label"].isin(list("QW"))&~(indata2["Text"].str.contains("[X-Y]", regex=True)))
                                 ],
                                 [mapper["Q"] 0.1, mapper["Q"] 0.2, mapper["W"] 0.1, mapper["W"] 0.2, 0],
                                 indata2["Label"].map(mapper))

>>> indata2

    KeyData     Text  AvgAbs Label  Indicator
0         2   SIS X     9.65     Q        1.1
1         2   SIS X-    5.32     Q        1.1
2         2   SIS Y     8.24     Q        1.2
3         2   SIS Y-    3.27     Q        1.2
4         2      SIS    3.40     Q        0.0
5         2   C. VIV    0.23     L        2.0
6         2  SOBRE P    0.38    SD        3.0
7         2  SOBRE P    0.19    SD        3.0
8         2  VIEN X     7.36     W        4.1
9         2  VIEN X-   23.09     W        4.1
10        2  VIEN Y     6.66     W        4.2
11        2  VIEN Y-    2.68     W        4.2
12        4   SIS X    14.41     Q        1.1
13        4   SIS X-   12.23     Q        1.1
14        4   SIS Y    10.00     Q        1.2
15        4   SIS Y-   11.00     Q        1.2
16        4   C. VIV    0.38     L        2.0
17        4   C. VIV    0.34     L        2.0
18        4   C. VIV    0.13     L        2.0
19        4  SOBRE P    0.62    SD        3.0
20        4  VIEN X    29.21     W        4.1
21        4  VIEN X-    8.70     W        4.1
22        4  VIEN Y-    7.46     W        4.2
23        4  VIEN Y    11.62     W        4.2
24        4     VIEN    9.60     W        0.0