Editing list elements using comprehension deletes part of list

Time:11-04

I have a number of nested lists from a web-scraped table that I want to 'clean' by removing unhelpful HTML markup. They look like this:

example_list = ['12.7x55 mm PS12B',
  '<td style="border-bottom:solid 2px">102\n</td>',
  '<td style="border-bottom:solid 2px">46\n</td>',
  '<td style="border-bottom:solid 2px">57\n</td>',
  '<td style="border-bottom:solid 2px; background-color:#00990080;">6\n</td>',
  '<td style="border-bottom:solid 2px; background-color:#00640080;">5\n</td>',
  '<td style="border-bottom:solid 2px; background-color:#FB9C0E80;">4\n</td>']

I would like it to look like this:

my_list =  ['12.7x55 mm PS12B', '102', '46', '57', '6', '5', '4']

I tried simple comprehensions:

my_list[1:] = [i.replace('\n</td>', '') for i in list] # works perfectly
my_list[1:] = [i.replace('<td>', '') for i in list] # works perfectly
# for example the second item in the list is now `102`
# not `<td style="border-bottom:solid 2px">102\n</td>`

but when I try to edit the last six elements using a more specific comprehension:

my_list[1:] = [i.replace(i, i[-1]) for i in list if "back" in i]

It deletes all other list elements that I have just extracted, and I end up with:

my_list =  ['12.7x55 mm PS12B', '6', '5', '4']

Since this is HTML, I am sure there is a less obscure way to do this (which I would appreciate knowing), but my main concern is that I don't understand what is going on with a simple Python comprehension.

CodePudding user response:

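The rest of the elements are filtered out by the if condition in the comprehension. An if placed after the for is a filter: every element that fails the test is dropped, which is why only the three "back" elements survive. If you wish to keep the others, use a conditional expression (value if condition else other_value) instead, and put it before the for. A filtering if cannot take an else:

my_list[1:] = [
    i.replace(i, i[-1]) if "back" in i else i  # or however you wish to process the rest of the elements
    for i in list
]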
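
To make the difference concrete, here is a minimal sketch that runs both forms on a shortened copy of the question's list. It also shows one possible "less obscure" way to clean the cells, which the question asked about. The names `filtered`, `marked`, and `strip_tags` are hypothetical, and the regular expression assumes the input is a simple table-cell fragment like those shown, not arbitrary HTML.

```python
import re

example_list = [
    '12.7x55 mm PS12B',
    '<td style="border-bottom:solid 2px">102\n</td>',
    '<td style="border-bottom:solid 2px">46\n</td>',
    '<td style="border-bottom:solid 2px; background-color:#00990080;">6\n</td>',
]

# Filter form: an `if` after the `for` drops every element that fails the test.
filtered = [s for s in example_list if "back" in s]

# Conditional expression: placed before the `for`, it chooses a value for
# each element, so the result has the same length as the input.
marked = ["colored" if "back" in s else "plain" for s in example_list]


def strip_tags(cell):
    """Remove anything that looks like an HTML tag, then trim whitespace.

    Hypothetical helper; assumes simple fragments such as single <td> cells.
    """
    return re.sub(r"<[^>]+>", "", cell).strip()


# One pass over the whole list; strings without tags come back unchanged.
cleaned = [strip_tags(s) for s in example_list]

print(len(filtered), len(marked))
print(cleaned)
```

Because `strip_tags` leaves tag-free strings untouched, the first element needs no special handling and no slicing is required. For full pages, a real HTML parser is usually a safer choice than a regular expression.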