I have an entry dataframe 'Indata2', whose data I must group according to 'Indata [' Label ']', generating a new column called 'Indicator', this I already have it, what I should do is correct this last column a little .
import pandas as pd
indata2 = [[2, 'SIS X ', 9.65, 'Q'],
[2, 'SIS X-', 5.32, 'Q'],
[2, 'SIS Y ', 8.24, 'Q'],
[2, 'SIS Y-', 3.27, 'Q'],
[2, 'SIS', 3.40, 'Q'],
[2, 'C. VIV', 0.23, 'L'],
[2, 'SOBRE P', 0.38, 'SD'],
[2, 'SOBRE P', 0.19, 'SD'],
[2, 'VIEN X ', 7.36, 'W'],
[2, 'VIEN X-', 23.09, 'W'],
[2, 'VIEN Y ', 6.66, 'W'],
[2, 'VIEN Y-', 2.68, 'W'],
[4, 'SIS X ', 14.41, 'Q'],
[4, 'SIS X-', 12.23, 'Q'],
[4, 'SIS Y ', 10.00, 'Q'],
[4, 'SIS Y-', 11.00, 'Q'],
[4, 'C. VIV', 0.38, 'L'],
[4, 'C. VIV', 0.34, 'L'],
[4, 'C. VIV', 0.13, 'L'],
[4, 'SOBRE P', 0.62, 'SD'],
[4, 'VIEN X ', 29.21, 'W'],
[4, 'VIEN X-', 8.70, 'W'],
[4, 'VIEN Y-', 7.46, 'W'],
[4, 'VIEN Y ', 11.62, 'W'],
[4, 'VIEN', 9.6, 'W']]
indata2 = pd.DataFrame(data = indata2, columns = ['KeyData', 'Text', 'AvgAbs', 'Label'])
l = indata2.Label.unique()
m = pd.DataFrame(l, columns = ['Label'])
m['Indicator'] = m.index 1
outputdata = indata2.merge(m[['Indicator','Label']],'left')
# Correct outputdata['Indicator']
To correct I have in 'Label' 2 special tags: 'W' and 'Q'.
You can realize that for each of them there is a text: For Q: [SIS X , SIS X-, SIS Y , SIS Y-] and for W: [VIEN X , VIEN X-, VIEN Y , VIEN Y-] . All these data are already converted to uppercase.
Then, for the new column 'Indicator': all the 'W' values containing in the 'Text' column at least one letter 'X' must have an indicator, and when containing the 'Text' column at least one letter 'Y' Other indicator, this is to differentiate them.
Note: If there is no 'X' or 'Y' in the 'TEXT' column, then the 'indicator' = 0.
The result (outputdata) should be:
KeyData Text AvgAbs Label Indicator
2 SIS X 9.65 Q 1.1
2 SIS X- 5.32 Q 1.1
2 SIS Y 8.24 Q 1.2
2 SIS Y- 3.27 Q 1.2
2 SIS 3.40 Q 0
2 C. VIV 0.23 L 2
2 SOBRE P 0.38 SD 3
2 SOBRE P 0.19 SD 3
2 VIEN X 7.36 W 4.1
2 VIEN X- 23.09 W 4.1
2 VIEN Y 6.66 W 4.2
2 VIEN Y- 2.68 W 4.2
4 SIS X 14.41 Q 1.1
4 SIS X- 12.23 Q 1.1
4 SIS Y 10.00 Q 1.2
4 SIS Y- 11.00 Q 1.2
4 C. VIV 0.38 L 2
4 C. VIV 0.34 L 2
4 C. VIV 0.13 L 2
4 SOBRE P 0.62 SD 3
4 VIEN X 29.21 W 4.1
4 VIEN X- 8.70 W 4.1
4 VIEN Y- 7.46 W 4.2
4 VIEN Y 11.62 W 4.2
4 VIEN 9.60 W 0
CodePudding user response:
Try with map
and numpy.select
:
mapper = {label: i 1 for i, label in enumerate(indata2["Label"].unique())}
indata2["Indicator"] = np.select([(indata2["Label"]=="Q")&(indata2["Text"].str.contains("X")),
(indata2["Label"]=="Q")&(indata2["Text"].str.contains("Y")),
(indata2["Label"]=="W")&(indata2["Text"].str.contains("X")),
(indata2["Label"]=="W")&(indata2["Text"].str.contains("Y")),
(indata2["Label"].isin(list("QW"))&~(indata2["Text"].str.contains("[X-Y]", regex=True)))
],
[mapper["Q"] 0.1, mapper["Q"] 0.2, mapper["W"] 0.1, mapper["W"] 0.2, 0],
indata2["Label"].map(mapper))
>>> indata2
KeyData Text AvgAbs Label Indicator
0 2 SIS X 9.65 Q 1.1
1 2 SIS X- 5.32 Q 1.1
2 2 SIS Y 8.24 Q 1.2
3 2 SIS Y- 3.27 Q 1.2
4 2 SIS 3.40 Q 0.0
5 2 C. VIV 0.23 L 2.0
6 2 SOBRE P 0.38 SD 3.0
7 2 SOBRE P 0.19 SD 3.0
8 2 VIEN X 7.36 W 4.1
9 2 VIEN X- 23.09 W 4.1
10 2 VIEN Y 6.66 W 4.2
11 2 VIEN Y- 2.68 W 4.2
12 4 SIS X 14.41 Q 1.1
13 4 SIS X- 12.23 Q 1.1
14 4 SIS Y 10.00 Q 1.2
15 4 SIS Y- 11.00 Q 1.2
16 4 C. VIV 0.38 L 2.0
17 4 C. VIV 0.34 L 2.0
18 4 C. VIV 0.13 L 2.0
19 4 SOBRE P 0.62 SD 3.0
20 4 VIEN X 29.21 W 4.1
21 4 VIEN X- 8.70 W 4.1
22 4 VIEN Y- 7.46 W 4.2
23 4 VIEN Y 11.62 W 4.2
24 4 VIEN 9.60 W 0.0