VS Code debugging: the debug console output of a crawler does not show an element's child tags, how to solve this?



Time：09-23

import re
import gzip
from io import BytesIO
from urllib import request

# Goal of the crawler: rank streamers (hosts) by popularity.
# In Chrome, press F12 and open the first tab, Elements, to inspect the page HTML.
# Click the small arrow (element picker) and hover over the popularity number to find its HTML.
# Information to grab: 1. the popularity number, 2. the name of the host.
# Simulate an HTTP request: send it to the server and receive the HTML it returns.
# Use regular expressions to extract the data we need (name, popularity).
# Debug the code in VS Code.


class Spider():
    url = 'https://www.douyu.com/g_LOL'
    # NOTE: the end of this pattern was lost when the post was rendered;
    # the closing ")</div>" is an assumed reconstruction.
    root_pattern = r'<div>([\s\S]*?)</div>'
    # "?" makes the match non-greedy, "[\s\S]" matches any character,
    # "*" repeats zero or more times.

    def __fetch_content(self):
        # Private method: download the page and return it as text.
        r = request.urlopen(Spider.url)
        htmls = r.read()  # bytes
        # Decompress only when the body is gzip data (magic bytes 1f 8b).
        if htmls[:2] == b'\x1f\x8b':
            buff = BytesIO(htmls)
            f = gzip.GzipFile(fileobj=buff)
            htmls = f.read()
        return htmls.decode('utf-8')

    def __analysis(self, htmls):
        root_html = re.findall(Spider.root_pattern, htmls)
        print(root_html[0])
        a = 1  # no-op line, handy as a breakpoint in the debugger

    def go(self):
        # Entry method.
        htmls = self.__fetch_content()
        self.__analysis(htmls)


spider = Spider()
spider.go()
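
The comment on the pattern says that "?" makes the match non-greedy and that "[\s\S]" matches any character. A minimal sketch of what that means for nested tags follows. The sample markup, the constant names, and the helper first_match are made up for illustration and are not taken from the page being crawled. It shows that a non-greedy capture stops at the first closing </div>, so child tags after that point are cut off. Whether this is what causes the missing child tags in the question is an assumption; the post does not confirm it.

```python
import re

# Hypothetical sample markup; the real page structure is not shown in the post.
SAMPLE_HTML = (
    '<div>'
    '<div class="name">host-a</div>'
    '<div class="hot">12345</div>'
    '</div>'
)

NON_GREEDY = r'<div>([\s\S]*?)</div>'  # stops at the first "</div>"
GREEDY = r'<div>([\s\S]*)</div>'       # runs on to the last "</div>"


def first_match(pattern, html):
    """Return the first captured group, or None when nothing matches."""
    found = re.findall(pattern, html)
    return found[0] if found else None


non_greedy_body = first_match(NON_GREEDY, SAMPLE_HTML)
greedy_body = first_match(GREEDY, SAMPLE_HTML)

print(non_greedy_body)  # <div class="name">host-a
print(greedy_body)      # <div class="name">host-a</div><div class="hot">12345</div>
```

The greedy variant is shown only for contrast. On a full page it would run to the last </div> in the document, so a more specific pattern or an HTML parser is probably a better fit for nested elements.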

Page link: https://www.codepudding.com/other/24881.html
