I'm trying to create a dictionary with various details about a fighter in the UFC.
I have a list that contains the information I need, but I cannot format the strings inside the list correctly.
My current code:
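import re

# s is assumed to be the parsed BeautifulSoup document for the fighter's page
DESCRIPTION = s.find_all('li', {'class': 'b-list__box-list-item b-list__box-list-item_type_block'})
text_only = []
for info in DESCRIPTION:
    text_only.append(info.text.strip())
# collapse whatever whitespace follows each colon into a single space
pattern = re.compile(r":\s*")
temp = [pattern.sub(": ", datum) for datum in text_only]
print(temp)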
Raw text from DESCRIPTION (one list per fighter):
[<li >
<i >
Height:
</i>
--
</li>, <li >
<i >
Weight:
</i>
145 lbs.
</li>, <li >
<i >
Reach:
</i>
--
</li>, <li >
<i >
STANCE:
</i>
</li>, <li >
<i >
DOB:
</i>
--
</li>, <li >
<i >
SLpM:
</i>
0.00
</li>, <li >
<i >
Str. Acc.:
</i>
0%
</li>, <li >
<i >
SApM:
</i>
0.00
</li>, <li >
<i >
Str. Def:
</i>
0%
</li>, <li >
<i >
</i>
</li>, <li >
<i >
TD Avg.:
</i>
0.00
</li>, <li >
<i >
TD Acc.:
</i>
0%
</li>, <li >
<i >
TD Def.:
</i>
0%
</li>, <li >
<i >
Sub. Avg.:
</i>
0.0
</li>]
[<li >
<i >
Height:
</i>
5' 9"
</li>, <li >
<i >
Weight:
</i>
185 lbs.
</li>, <li >
<i >
Reach:
</i>
--
</li>, <li >
<i >
STANCE:
</i>
</li>, <li >
<i >
DOB:
</i>
--
</li>, <li >
<i >
SLpM:
</i>
7.64
</li>, <li >
<i >
Str. Acc.:
</i>
38%
</li>, <li >
<i >
SApM:
</i>
5.45
</li>, <li >
<i >
Str. Def:
</i>
37%
</li>, <li >
<i >
</i>
</li>, <li >
<i >
TD Avg.:
</i>
0.00
</li>, <li >
<i >
TD Acc.:
</i>
0%
</li>, <li >
<i >
TD Def.:
</i>
100%
</li>, <li >
<i >
Sub. Avg.:
</i>
0.0
</li>]
[<li >
<i >
Height:
</i>
5' 7"
</li>, <li >
<i >
Weight:
</i>
155 lbs.
</li>, <li >
<i >
Reach:
</i>
70"
</li>, <li >
<i >
STANCE:
</i>
Orthodox
</li>, <li >
<i >
DOB:
</i>
Apr 04, 1992
</li>, <li >
<i >
SLpM:
</i>
3.93
</li>, <li >
<i >
Str. Acc.:
</i>
52%
</li>, <li >
<i >
SApM:
</i>
1.80
</li>, <li >
<i >
Str. Def:
</i>
61%
</li>, <li >
<i >
</i>
</li>, <li >
<i >
TD Avg.:
</i>
0.00
</li>, <li >
<i >
TD Acc.:
</i>
0%
</li>, <li >
<i >
TD Def.:
</i>
57%
</li>, <li >
<i >
Sub. Avg.:
</i>
1.0
</li>]
[<li >
<i >
Height:
</i>
6' 2"
</li>, <li >
<i >
Weight:
</i>
205 lbs.
</li>, <li >
<i >
Reach:
</i>
74"
</li>, <li >
<i >
STANCE:
</i>
</li>, <li >
<i >
DOB:
</i>
Jun 26, 1982
</li>, <li >
<i >
SLpM:
</i>
3.34
</li>, <li >
<i >
Str. Acc.:
</i>
48%
</li>, <li >
<i >
SApM:
</i>
4.87
</li>, <li >
<i >
Str. Def:
</i>
39%
</li>, <li >
<i >
</i>
</li>, <li >
<i >
TD Avg.:
</i>
1.31
</li>, <li >
<i >
TD Acc.:
</i>
30%
</li>, <li >
<i >
TD Def.:
</i>
50%
</li>, <li >
<i >
Sub. Avg.:
</i>
0.0
</li>]
My output (for the first fighter above):
['Height: --', 'Weight: 145 lbs.', 'Reach: --', 'STANCE: ', 'DOB: --', 'SLpM: 0.00', 'Str. Acc.: 0%', 'SApM: 0.00', 'Str. Def: 0%', '', 'TD Avg.: 0.00', 'TD Acc.: 0%', 'TD Def.: 0%', 'Sub. Avg.: 0.0']
What I need:
['--', '145 lbs.', '--', ' ', '--', '0.00', '0%', '0.00', '0%', '', '0.00', '0%', '0%', '0.0']
I've tried using str.partition(), but it creates another tuple for every string, which I expect will increase my runtime immensely.
CodePudding user response:
I would prefer to use str.split() instead of a regular expression:
temp = [key_val.split(":")[-1].strip() for key_val in text_only]
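Since the stated goal is a dictionary, here is a minimal sketch of how the same split could produce both the bare values and a label-to-value mapping. The sample strings are copied from the question's data (one is left with its raw line break to show that strip() handles it); the names values, fighter and with_colon are mine, and the "Time: 4:32" string is a made-up example, not something from the page. Passing maxsplit=1 is an extra precaution I am assuming you would want: it keeps a value intact if it ever contains a colon of its own.

```python
# Sample items in the shape produced by info.text.strip() in the question.
text_only = [
    "Height: 5' 7\"",
    "Weight: 155 lbs.",
    "Reach:\n      70\"",   # raw text with a line break after the label
    "STANCE: ",
    "DOB: Apr 04, 1992",
    "Str. Acc.: 52%",
    "",                      # the empty separator <li> from the page
]

# Values only: split once on the first colon and keep the right-hand side.
values = [item.split(":", 1)[-1].strip() for item in text_only]

# Label -> value mapping; entries without a colon (the empty item) are skipped.
fighter = {}
for item in text_only:
    if ":" not in item:
        continue
    label, value = item.split(":", 1)
    fighter[label.strip()] = value.strip()

# Hypothetical value containing a colon, to show why maxsplit=1 matters.
with_colon = "Time: 4:32"
last_piece = with_colon.split(":")[-1].strip()      # only the text after the last colon
whole_value = with_colon.split(":", 1)[-1].strip()  # everything after the first colon

print(values)
print(fighter)
```

Note that an empty field such as STANCE comes back as an empty string here rather than a single space, because strip() removes the trailing whitespace.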