I have a number of nested lists from a web scraped table that I want to 'clean' by removing unhelpful HTML characters. They look like this:
example_list = ['12.7x55 mm PS12B',
'<td style="border-bottom:solid 2px">102\n</td>',
'<td style="border-bottom:solid 2px">46\n</td>',
'<td style="border-bottom:solid 2px">57\n</td>',
'<td style="border-bottom:solid 2px; background-color:#00990080;">6\n</td>',
'<td style="border-bottom:solid 2px; background-color:#00640080;">5\n</td>',
'<td style="border-bottom:solid 2px; background-color:#FB9C0E80;">4\n</td>']
I would like it to look like this:
my_list = ['12.7x55 mm PS12B', '102', '46', '57', '6', '5', '4']
I tried simple comprehensions:
my_list[1:] = [i.replace('\n</td>', '') for i in list] # works perfectly
my_list[1:] = [i.replace('<td>', '') for i in list] # works perfectly
# for example the second item in the list is now `102`
# not `<td style="border-bottom:solid 2px">102\n</td>`
but when I try to edit the last six elements using a more specific comprehension:
my_list[1:] = [i.replace(i, i[-1]) for i in list if "back" in i]
It deletes all other list elements that I have just extracted, and I end up with:
my_list = ['12.7x55 mm PS12B', '6', '5', '4']
I am sure that, since this is HTML, there is a less obscure way to do this (which I would appreciate knowing), but my main concern is that I don't understand what is going on with a simple Python comprehension.
Answer:
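The rest of the elements are filtered out by the if condition at the end of the comprehension. A trailing if acts as a filter and cannot take an else clause. If you wish to keep the other elements, move the test to the front of the comprehension as a conditional expression (value_if_true if condition else value_if_false):
my_list[1:] = [
    i.replace(i, i[-1]) if "back" in i
    else i  # or however you wish to process the rest of the elements
    for i in list
]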
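As a rough illustration of the difference, here is a minimal sketch that runs both forms on the list from the question, followed by one possible way to strip the tags in a single pass. The names `filtered`, `stripped`, `mapped`, `TAG` and `strip_tags` are made up for this example, and the regular expression is an assumption that only holds for simple cells like these (no `>` inside attribute values); a real HTML parser would be the safer choice for anything more complex.

```python
import re

example_list = ['12.7x55 mm PS12B',
                '<td style="border-bottom:solid 2px">102\n</td>',
                '<td style="border-bottom:solid 2px">46\n</td>',
                '<td style="border-bottom:solid 2px">57\n</td>',
                '<td style="border-bottom:solid 2px; background-color:#00990080;">6\n</td>',
                '<td style="border-bottom:solid 2px; background-color:#00640080;">5\n</td>',
                '<td style="border-bottom:solid 2px; background-color:#FB9C0E80;">4\n</td>']

# Filtering form: a trailing `if` drops every item that fails the test,
# so only the three "background" cells survive.
filtered = [i for i in example_list[1:] if "back" in i]

# Mapping form: a conditional expression in front keeps every item and
# only decides how each one is transformed.
stripped = [i.replace('\n</td>', '') for i in example_list[1:]]
mapped = [i[-1] if "back" in i else i for i in stripped]

# Hypothetical helper: remove anything that looks like a tag, then trim
# the surrounding whitespace (including the trailing newline).
TAG = re.compile(r"<[^>]+>")

def strip_tags(cell):
    return TAG.sub("", cell).strip()

cleaned = [strip_tags(cell) for cell in example_list]

print(len(filtered), len(mapped))
print(cleaned)
```

The filtering form returns fewer items than it was given, which is why assigning its result back to `my_list[1:]` shortened the list. The mapping form always returns one output per input.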