A question for everyone about crawling a website with a web crawler
Time:09-17
I wrote a crawler in Python for the Shuimu community (http://www.newsmth.net/nForum/#!board/Python?p=1). The crawler collects the link to each post on the page. When I view the page in a browser, the post links sit in tags such as <td class='title_9'></td> and <td class='title_9 bg-odd'></td>. Calling td = tbody.find_all('td', class_='title_9') returns every post, both the class='title_9' ones and the class='title_9 bg-odd' ones. Why are the class='title_9 bg-odd' posts found as well? When I instead call td = tbody.find_all('td', class_='title_9 bg-odd'), it returns an empty list and not a single post is found. I then ran print(tbody) to print the contents of tbody and found that every post is inside <td class='title_9'>: every class='title_9 bg-odd' has become class='title_9'. Why is that? Why does the tbody content printed by the program differ from what I see in the browser? Any help would be appreciated, thank you.
What the browser shows:
What the program outputs:
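One way to confirm what the program actually received, independent of what the browser displays, is to tally the class attribute of every td in the fetched markup. The sketch below is only an illustration: RECEIVED_HTML is a hypothetical stand-in for the downloaded page, td_class_counts is a made-up helper name, and the html.parser backend is an assumption (the thread itself uses lxml). Browsers can run scripts that change class attributes after the page loads; whether that is what happens on this site is not confirmed here.

```python
from collections import Counter

from bs4 import BeautifulSoup


def td_class_counts(html: str) -> Counter:
    """Count how often each class attribute value appears on <td> tags."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for td in soup.find_all("td"):
        # BeautifulSoup stores class as a list of tokens; join it back
        # into the string form that appears in the markup.
        counts[" ".join(td.get("class", []))] += 1
    return counts


# Hypothetical stand-in for the HTML the crawler downloaded.
RECEIVED_HTML = """<table><tbody>
<tr><td class="title_9">post a</td></tr>
<tr><td class="title_9">post b</td></tr>
</tbody></table>"""

counts = td_class_counts(RECEIVED_HTML)
print(counts)
```

If 'title_9 bg-odd' never shows up in the tally, the empty result from find_all is explained by the downloaded markup itself rather than by how the search was written.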
CodePudding user response:
In the front end, <td class='title_9 bg-odd'> carries two classes, "title_9" and "bg-odd". In the back end, td = tbody.find_all('td', class_='title_9 bg-odd') treats the argument as a single class, "title_9 bg-odd", with a space in the middle, so nothing matches.
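The point about two separate classes can be checked on a small sample. The sketch below uses hypothetical markup modeled on the tags described in the question (the hrefs are invented) and assumes the html.parser backend. It shows that BeautifulSoup stores class as a list of tokens, which is why searching for the single token title_9 also returns the cell that carries bg-odd.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page structure and hrefs may differ.
SAMPLE_HTML = """<table><tbody>
<tr><td class="title_9"><a href="/post/1">first</a></td></tr>
<tr><td class="title_9 bg-odd"><a href="/post/2">second</a></td></tr>
</tbody></table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A single class name matches every td that carries that token,
# whether or not other classes are present on the same tag.
cells = soup.find_all("td", class_="title_9")

# The class attribute is parsed into a list of separate tokens.
class_lists = [td["class"] for td in cells]

# Collect the post links from the matched cells.
links = [td.a["href"] for td in cells if td.a is not None]

print(class_lists)
print(links)
```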
CodePudding user response:
# html is the page source as a string; its definition is cut off above
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, features='lxml')
td1 = soup.find_all('td', {'class': {'title_9 bg-odd'}})
print('td1>>', td1)
td4 = soup.find_all('td', class_='title_9 bg-odd')  # only a single space here, so it can be found
print('td4>>', td4)
td5 = soup.find_all('td', class_={'title_9 bg-odd'})  # try adding spaces on this line to see whether it still matches
print('td5>>', td5)
td6 = soup.find_all('td', class_=['title_9 bg-odd'])  # either way, the quoted values are strings
print('td6>>', td6)
td7 = soup.find_all('td', class_=['title_9', 'bg-odd'])  # either way, the quoted values are strings
print('td7>>', td7)
td2 = soup.find_all('td', {'class': 'title_9'})
print('td2>>', td2)
td3 = soup.find_all('td', {'class': 'bg-odd'})
print('td3>>', td3)
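If the goal is to match only cells that carry both classes, regardless of the order or spacing in the attribute, two approaches that avoid string comparison are a filter function and a CSS selector. This is a minimal sketch on hypothetical markup with the html.parser backend assumed; has_both_classes is an illustrative helper, not something from the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the last cell lists the classes in reverse order.
SAMPLE_HTML = """<table><tbody>
<tr><td class="title_9">plain</td></tr>
<tr><td class="title_9 bg-odd">odd</td></tr>
<tr><td class="bg-odd title_9">odd, reversed</td></tr>
</tbody></table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

REQUIRED = {"title_9", "bg-odd"}


def has_both_classes(tag) -> bool:
    """True for <td> tags whose class list contains every required token."""
    return tag.name == "td" and REQUIRED <= set(tag.get("class") or [])


# find_all accepts a function that receives each tag and returns a bool.
via_function = soup.find_all(has_both_classes)

# A CSS selector with chained class names also requires both classes.
via_selector = soup.select("td.title_9.bg-odd")

print([td.get_text() for td in via_function])
print([td.get_text() for td in via_selector])
```

Both approaches compare class tokens as a set, so the order in which the classes are written does not matter.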