I came across this piece of code while solving a problem. I just cannot understand how the last line of code before the print call works. Please explain.
import re
import urllib.request
from bs4 import BeautifulSoup
# url = 'http://py4e-data.dr-chuck.net/comments_42.html'
url = 'http://py4e-data.dr-chuck.net/comments_228869.html'
soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'html.parser')
s = sum(int(td.text) for td in soup.select('td:last-child')[1:])
print(s)
Answer:
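This is the order of operations:
- soup.select('td:last-child') is a method call that returns a list of the selected elements. 'td:last-child' is a CSS selector that matches every <td> that is the last child of its parent, i.e. the last cell of each table row.
- [1:] is a slicing operation: it creates a new list that skips the first (zeroth) item in the list (presumably a header cell whose text is not a number).
- for td in ... is the loop part of the generator expression: the items of the list are assigned to td in turn.
- int(td.text) takes the "text" attribute of the object in td and converts it to its integer equivalent.
- sum() adds up those integers as they are generated.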
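To see these steps without fetching the page, here is a minimal sketch that assumes a hypothetical list of strings standing in for the td.text values, with the first one playing the role of a non-numeric header. It only covers the slicing, the generator expression and sum(); the soup.select call is not reproduced.

```python
# Hypothetical stand-ins for td.text of each selected cell;
# "Comments" plays the role of a non-numeric header cell.
cell_texts = ["Comments", "97", "90", "88"]

rest = cell_texts[1:]             # slicing: a new list without the zeroth item
numbers = (int(t) for t in rest)  # generator expression: nothing is converted yet
total = sum(numbers)              # sum() pulls each int from the generator and adds it

print(rest)   # ['97', '90', '88']
print(total)  # 275
```

Without the [1:], the first value would be int("Comments"), which raises a ValueError.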
Answer:
You can break down the following assignment...
s = sum(int(td.text) for td in soup.select('td:last-child')[1:])
...into several statements:
all_td = soup.select('td:last-child')  # get the last TD element in each TR
rest_td = all_td[1:]                   # skip the first TD among those
s = 0                                  # for accumulating a sum
for td in rest_td:
    val = int(td.text)                 # parse the text in the TD as an integer
    s += val                           # add that number to the running sum
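A quick way to convince yourself that the expanded loop and the one-liner agree is to wrap both in functions and compare their results. The function names and the sample data below are hypothetical; a plain list of strings stands in for the td elements, so the .text access is dropped.

```python
def total_with_loop(cell_texts):
    """Expanded version: explicit loop with a running sum."""
    s = 0
    for text in cell_texts[1:]:   # skip the first (header) entry
        val = int(text)
        s += val                  # `s = val` here would keep only the last number
    return s


def total_with_generator(cell_texts):
    """One-liner version: sum() over a generator expression."""
    return sum(int(text) for text in cell_texts[1:])


sample = ["Comments", "97", "90", "88"]  # hypothetical sample data
print(total_with_loop(sample))       # 275
print(total_with_generator(sample))  # 275
```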
Now you can step through these statements with a debugger, or add some print calls here and there, to see what's going on.
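For example, a print call inside the loop shows each parsed value and the running sum. This sketch again assumes a hypothetical list of strings in place of the real td elements; with the real page you would loop over rest_td and use td.text instead.

```python
cell_texts = ["Comments", "97", "90", "88"]  # hypothetical stand-in for the td texts

s = 0
for i, text in enumerate(cell_texts[1:], start=1):
    val = int(text)
    s += val
    # show what the loop is doing at each step
    print(f"row {i}: text={text!r} val={val} running sum={s}")

print("total:", s)
```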