Asking for help with a problem in my web crawler

Time:09-17

I wrote a crawler in Python for the Shuimu community (http://www.newsmth.net/nForum/#!board/Python?p=1) to collect the link to each post on the page. Viewing the page in the browser, I saw that each post link sits in a tag like <td class='title_9'></td> or <td class='title_9 bg-odd'></td>. When I use td = tbody.find_all('td', class_='title_9'), it returns all posts, both the class='title_9' ones and the class='title_9 bg-odd' ones. Why are the class='title_9 bg-odd' posts found as well? When I use td = tbody.find_all('td', class_='title_9 bg-odd'), it returns an empty result; not a single post is found.

I then printed the contents of tbody with print(tbody) and found that every post is inside <td class='title_9'>: every class='title_9 bg-odd' has turned into class='title_9'. Why is this? Why does the tbody printed by my program differ from what I see in the browser? I'd be grateful for any help, thank you.
What the browser shows: [screenshot]

What the program prints: [screenshot]
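Part of the answer to the last question may be that the browser's developer tools display the live DOM, which already includes any changes scripts made after the page loaded, while a crawler that downloads the page only sees the HTML the server sent. Whether this site adds bg-odd with a script is an assumption, not something the thread confirms. A minimal sketch for checking what the crawler actually received is to tally the class combinations on the td tags. RAW_HTML is a made-up stand-in for the downloaded page, and class_combinations is a hypothetical helper name.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML a crawler receives from the server.
# It is NOT the real newsmth page; it only models markup in which the
# striping class (bg-odd) never appears in the raw HTML.
RAW_HTML = """
<table><tbody>
  <tr><td class="title_9"><a href="/post/1">first</a></td></tr>
  <tr><td class="title_9"><a href="/post/2">second</a></td></tr>
</tbody></table>
"""


def class_combinations(html: str, tag_name: str = "td") -> Counter:
    """Count each distinct class combination on tags named tag_name."""
    soup = BeautifulSoup(html, "html.parser")
    # tag.get("class") is a list of class names, or None if there is no class.
    return Counter(
        " ".join(tag.get("class") or []) for tag in soup.find_all(tag_name)
    )


print(class_combinations(RAW_HTML))  # Counter({'title_9': 2})
```

If the tally shows no bg-odd at all, the difference comes from the page itself, not from how find_all is called.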

Reply:

On the front end, <td class='title_9 bg-odd'> has two classes: "title_9" and "bg-odd".
On the back end, td = tbody.find_all('td', class_='title_9 bg-odd') is a single class, "title_9 bg-odd", with a space in the middle, so nothing matches.
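The point about two classes comes from how Beautiful Soup stores the class attribute: it is multi-valued, so class='title_9 bg-odd' is kept as the list ['title_9', 'bg-odd'], and searching for one class name matches any tag that carries it among its classes. That is why class_='title_9' also returns the bg-odd rows. When a tag must carry both classes, a CSS selector passed to select() is one option. A minimal sketch on made-up rows:

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows; only the class attributes matter here.
HTML = """
<table>
  <tr><td class="title_9">plain row</td></tr>
  <tr><td class="title_9 bg-odd">striped row</td></tr>
  <tr><td class="bg-odd title_9">same classes, other order</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Beautiful Soup stores class as a list of separate class names.
print(soup.find_all("td")[1]["class"])

# A single class name matches any tag that has it among its classes.
any_title = soup.find_all("td", class_="title_9")

# A CSS selector requires both classes, in any order.
both = soup.select("td.title_9.bg-odd")

print(len(any_title), len(both))  # 3 2
```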

Reply:

Aren't "title_9" and "bg-odd" inside the same pair of quotes? Why are they two classes?
On the same page, print(soup.find('table', class_='tiz board-list')) does find the matching content, and that class value also contains a space.
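Per the Beautiful Soup documentation on searching by CSS class, a class_ string that contains a space is compared with the whole class value rather than with the individual names, so it matches only when the names appear in exactly that order. That would explain why the table lookup works. A minimal sketch with made-up markup that reuses the class names from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one table whose class attribute contains a space.
HTML = '<table class="tiz board-list"><tr><td>row</td></tr></table>'
soup = BeautifulSoup(HTML, "html.parser")

# A string with a space is compared against the whole class value,
# so it only matches when the names appear in the same order.
same_order = soup.find_all("table", class_="tiz board-list")
other_order = soup.find_all("table", class_="board-list tiz")

# A CSS selector checks each class separately, so order does not matter.
by_selector = soup.select("table.board-list.tiz")

print(len(same_order), len(other_order), len(by_selector))  # 1 0 1
```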

Reply:

On the front end, <td class='title_9 bg-odd'> carries two class values. Check the parameters of the find_all method for the details; my impression is that however you write the two, it behaves like a fuzzy query.
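Besides strings, lists, and regular expressions, find_all() also accepts a function that is called with each tag and returns True or False, which makes the class comparison explicit instead of relying on string matching. has_all_classes is a hypothetical helper, not part of Beautiful Soup; a minimal sketch on made-up rows:

```python
from bs4 import BeautifulSoup

# Hypothetical sample rows.
HTML = """
<table>
  <tr><td class="title_9">a</td></tr>
  <tr><td class="title_9 bg-odd">b</td></tr>
  <tr><td class="bg-odd">c</td></tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")


def has_all_classes(*names):
    """Build a find_all filter that accepts tags carrying every given class."""
    wanted = set(names)

    def check(tag):
        # get("class") returns a list of class names, or None if absent.
        return wanted.issubset(tag.get("class") or [])

    return check


rows = soup.find_all(has_all_classes("title_9", "bg-odd"))
print([td.get_text() for td in rows])  # ['b']
```

Because the check is a set comparison, the order of the class names and the spacing between them in the markup do not matter.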

Reply:


Then why isn't class_='tiz board-list' two class attributes as well?

Reply:

In reply to FoxFiled (4th floor):

In Python, whatever is inside quotation marks is a string.
Your original post says: "I use td = tbody.find_all('td', class_='title_9 bg-odd'), but it returns an empty result." When I read it on the CSDN mobile site, there were many spaces between title_9 and bg-odd, and with those extra spaces nothing matches.
The following demo lets you observe and analyze the behavior:
from bs4 import BeautifulSoup

# Hypothetical sample markup: the HTML of the original demo was lost.
# One row carries only title_9, the other carries title_9 and bg-odd.
HTML = """<body>
<table>
  <tr><td class="title_9"><a href="#">post 1</a></td></tr>
  <tr><td class="title_9 bg-odd"><a href="#">post 2</a></td></tr>
</table>
</body>"""

soup = BeautifulSoup(HTML, features='lxml')  # requires the lxml package

td1 = soup.find_all('td', {'class': {'title_9 bg-odd'}})
print('td1 >>', td1)
td4 = soup.find_all('td', class_='title_9 bg-odd')  # only a single space here, so it is found
print('td4 >>', td4)
td5 = soup.find_all('td', class_='title_9   bg-odd')  # try adding extra spaces here: nothing matches
print('td5 >>', td5)
td6 = soup.find_all('td', class_=['title_9 bg-odd'])  # either way, what is inside the quotes is a string
print('td6 >>', td6)
td7 = soup.find_all('td', class_=['title_9', 'bg-odd'])  # a list matches tags that have any of the listed classes
print('td7 >>', td7)
td2 = soup.find_all('td', {'class': 'title_9'})
print('td2 >>', td2)
td3 = soup.find_all('td', {'class': 'bg-odd'})
print('td3 >>', td3)
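As the reply says, extra whitespace inside the class_ string breaks the match, because the string is compared as a whole. One way around this, sketched below under the assumption that the class names are valid CSS identifiers, is to split the string on whitespace and build a CSS selector from the parts; find_by_class_string is a hypothetical helper name, not a Beautiful Soup function.

```python
from bs4 import BeautifulSoup

# Hypothetical sample row.
HTML = '<table><tr><td class="title_9 bg-odd">post</td></tr></table>'
soup = BeautifulSoup(HTML, "html.parser")

# Several spaces between the names: compared as one string, this does not match.
print(soup.find_all("td", class_="title_9    bg-odd"))  # []


def find_by_class_string(soup, tag_name, class_string):
    """Hypothetical helper: split on any whitespace and require every class."""
    names = class_string.split()  # drops the extra spaces
    selector = tag_name + "".join("." + name for name in names)
    return soup.select(selector)


print(find_by_class_string(soup, "td", "title_9    bg-odd"))
```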
