A failed Python crawler

Time:10-02

I want to crawl the first 3 pages of an article directory on Baidu Jingyan (Baidu Experience):
import re
import urllib.request


def getHtml(url):
    # Download the page and return its raw bytes.
    page = urllib.request.urlopen(url)
    html = page.read()
    return html


def getTitle(html):
    # Capture the text between title=" and " target=
    reg = r'title="([.*\S]*)" target='
    imgre = re.compile(reg)
    imglist = re.findall(imgre, html)
    return imglist


url = "https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn="
# pn is the offset of the first article on a page; each page lists 7 articles.
for i in range(0, 3 * 7, 7):
    i = str(i)
    a = url + i
    print('The link for this page is:\n', a)
    html = getHtml(a)
    html = html.decode('utf-8')
    print("The titles on this page are:")
    for i in getTitle(html):
        print(i)

In a few days this may become the first four pages, because the author may have written new articles.
The output is:
The link for this page is:
https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn=0
The titles on this page are:
How to play a video or animated image in reverse order?
How to use the network panel to draw the graph of an implicit function?
How to know the relationship between the displacement and displacement matrix?
How to deal with large amounts of data fitting (linear programming)?
The relationship between lattice points in the plane and matrix multiplication
How to use the AudioGenerator function?
How to install urllib3
The link for this page is:
https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn=7
The titles on this page are:
How to use the math manual calculator to draw the graphs of two functions?
How to make the fill color of a shape change dynamically in the network panel?
The TXT file is too long and can't be opened?
How to use a computer to test that matrix multiplication satisfies the associative law?
What string operations does Python have?
How to play a wooden puzzle game on a mobile phone?
Audio basics: how to view the basic parameters of audio?
The link for this page is:
https://jingyan.baidu.com/user/npublic/?uid=d1b612bceb0dc22ba8ffe137&pn=14
The titles on this page are:
How to memorize the color chart
How to beat the first level of Undertale
Guide to navigate how to upgrade
Mobile phone how to set the cylinder rotate?
How to use the ali ruban intelligent design platform
How to earn money publishing novels on your phone?

This crawler is a failure, because the title of one article on the third page was not scraped. The title of that article is: "How to install VS Code software on the computer?"
The match fails because this title contains a space.
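
One likely cause can be checked offline. In the pattern title="([.*\S]*)" target=, the class [.*\S] matches only non-whitespace characters (the . and * inside the brackets are literals that \S already covers), so a title that contains a space, such as the one with "VS Code", can never match in full. The sketch below uses two made-up anchor tags (the markup and the ASCII titles are assumptions, not copied from the real page) and compares the original pattern with a hypothetical alternative, [^"]*, which accepts everything up to the closing quote:

```python
import re

# Hypothetical markup shaped like what the original pattern targets:
# a title attribute followed directly by a target attribute.
sample = (
    '<a title="install-urllib3" target="_blank" href="#">a</a>\n'
    '<a title="install VS Code" target="_blank" href="#">b</a>'
)

# Pattern from the post: \S cannot match the space inside a title.
old_pattern = re.compile(r'title="([.*\S]*)" target=')

# Assumed alternative: any run of characters other than a double quote.
new_pattern = re.compile(r'title="([^"]*)" target=')

old_titles = old_pattern.findall(sample)
new_titles = new_pattern.findall(sample)

print(old_titles)  # the title containing a space is missing
print(new_titles)  # both titles are captured
```

If the real page puts other attributes between title and target, or uses single quotes, this alternative pattern would need adjusting.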

I want to know: how can I scrape the titles of all the articles completely?
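
One possible direction, offered as a sketch rather than a definitive answer, is to stop depending on the exact text between attributes and let the standard library's html.parser read the tags. The example assumes that each article title is stored in the title attribute of an <a> tag that also carries a target attribute; this is inferred from the regular expression in the post, not verified against the real page. The names TitleCollector and get_titles are hypothetical.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <a> tag that also has a target attribute."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        title = attr_map.get("title")
        if title and "target" in attr_map:
            self.titles.append(title)


def get_titles(html):
    """Return all collected titles from an HTML string (already decoded to str)."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Made-up markup for illustration; attribute order and spaces do not matter here.
sample = (
    '<a title="install urllib3" target="_blank" href="#">a</a>'
    '<a target="_blank" title="install VS Code" href="#">b</a>'
    '<a href="#">no title</a>'
)
titles = get_titles(sample)
print(titles)
```

In the crawler above, get_titles(html) could stand in for getTitle(html) after the decode step, provided the assumption about the page's markup holds.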