[Urgent, help] How do I batch-crawl a site whose URLs are a mix of numbers and letters?

Time: 12-02

I've just started learning web crawling. For a class exercise I need to crawl a website's content and analyze the data. My target site is: https://www.nz86.com/fashion/
I need to crawl a large number of pages within this site, but the page URLs are a mix of numbers and letters, as in the picture below:
(multiple links aren't allowed in a post, so this is all I can show)
How can I use a loop, or some other method, to crawl the source code of pages like this?

CodePudding user response:

Extract every <a> tag on the page into a list, then crawl the pages from that list.
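
A minimal sketch of that approach, assuming the requests and BeautifulSoup (bs4) libraries are available (the variable names are illustrative, not from the thread):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://www.nz86.com/fashion/'
headers = {'User-Agent': 'Mozilla/5.0'}
resp = requests.get(url, headers=headers)

# Put the href of every <a> tag on the page into one list,
# resolving relative links against the page URL.
soup = BeautifulSoup(resp.text, 'html.parser')
links = [urljoin(url, a['href']) for a in soup.find_all('a', href=True)]

# Then crawl each collected URL in turn.
for link in links:
    page = requests.get(link, headers=headers)
    print(link, page.status_code)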

CodePudding user response:

Match them out with a regular expression:

import requests
import re

headers = {
    'User-Agent': 'Mozilla/5.0'
}
url = 'https://www.nz86.com/fashion/'
resp = requests.get(url=url, headers=headers)
re_text = r'"(https://www\.nz86\.com/article/.*?)"'
url_list = re.findall(re_text, resp.text)
for article_url in url_list:
    print(article_url)

CodePudding user response:

Quoting the second-floor reply from xujibicool:
Match them out with a regular expression (the same code snippet as above).

That's not what I mean. Besides the links contained on this one page, I also need to crawl further pages.

CodePudding user response:

In that case you should use the Scrapy framework and crawl with recursive calls — it can crawl the whole site.
It's hard to explain clearly here; you'd best search online for a tutorial.
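
A minimal sketch of such a recursive Scrapy spider, assuming the article pages live under /article/ as in the regex above (the spider name, selectors, and extracted fields are illustrative, not from the thread):

import scrapy

class Nz86Spider(scrapy.Spider):
    name = 'nz86'  # hypothetical spider name
    allowed_domains = ['nz86.com']
    start_urls = ['https://www.nz86.com/fashion/']

    def parse(self, response):
        for href in response.css('a::attr(href)').getall():
            if '/article/' in href:
                # Article page: hand it to the item-extraction callback.
                yield response.follow(href, callback=self.parse_article)
            else:
                # Anything else: keep recursing; Scrapy's duplicate
                # filter stops the spider from revisiting a URL.
                yield response.follow(href, callback=self.parse)

    def parse_article(self, response):
        # Illustrative extraction: just the URL and the page title.
        yield {
            'url': response.url,
            'title': response.css('title::text').get(),
        }

Save it as, say, nz86_spider.py and run it with: scrapy runspider nz86_spider.py -o articles.json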

CodePudding user response:

At first I didn't understand what you were asking.
You have the URL — how do the numbers and letters in the link affect you crawling the data? It's just the network address of a file; you read it and the content comes out.
You probably just haven't got the hang of crawling a web page yet. Learn to crawl a single page first, and none of this will be a problem — it's all just part of the same process.
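
As a sketch tying the earlier replies together: reuse the second-floor regex to collect the article URLs, then fetch each one — the mix of digits and letters in a URL makes no difference to requests:

import requests
import re

headers = {'User-Agent': 'Mozilla/5.0'}
resp = requests.get('https://www.nz86.com/fashion/', headers=headers)

# Reuse the second-floor regex to pull the article URLs off the page.
article_urls = re.findall(r'"(https://www\.nz86\.com/article/.*?)"', resp.text)

# However a URL mixes digits and letters, fetching it works the same:
# it is just an address to read from.
for article_url in article_urls:
    page = requests.get(article_url, headers=headers)
    print(article_url, len(page.text))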