[Urgent, help] How do I batch-crawl a site whose URLs mix numbers and letters?
Time:12-02
I have just started learning web crawling. For a student practice project I need to crawl a website and analyze its content. My target site is https://www.nz86.com/fashion/. I need to crawl a large number of pages within the site, but the page URLs are a mix of numbers and letters, as in the example below (posting multiple links is not allowed, so I can only show one case). How can I use a loop, or some other method, to fetch the source code of pages like these?
CodePudding user response:
Collect every <a> tag on the page into a list, then crawl the links in that list one by one.
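The suggestion above (gather the page's links first, then visit each one) can be sketched roughly as follows. This is a minimal sketch using only the Python standard library; the names LinkCollector and extract_links are made up for illustration, and the sample HTML is invented rather than copied from the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute link URLs, without duplicates, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    seen = set()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative hrefs
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


# Invented sample markup, only to show the behaviour.
sample_html = """
<a href="/article/a1b2c3.html">One</a>
<a href="https://www.nz86.com/article/x9y8z7.html">Two</a>
<a href="/article/a1b2c3.html">One again</a>
"""

for link in extract_links(sample_html, "https://www.nz86.com/fashion/"):
    print(link)
```

Because the links are read from the page itself, it does not matter that the URLs mix numbers and letters: there is no need to guess or generate them. In practice the list would probably still need filtering, since a real page also links to navigation, ads, and other sections.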
CodePudding user response:
Use a regular expression to match them:
import re
import requests

headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.nz86.com/fashion/'
resp = requests.get(url=url, headers=headers)
# Capture every quoted link that starts with the article path
re_text = r'"(https://www\.nz86\.com/article/.*?)"'
urllist = re.findall(re_text, resp.text)
for article_url in urllist:
    print(article_url)
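Once the article URLs have been matched, fetching each page's source is a plain loop over the list. The following is only a sketch under some assumptions: find_article_urls, fetch, and crawl are hypothetical helper names, the pages are assumed to be UTF-8, the one-second delay is an arbitrary courtesy pause, and urllib from the standard library is swapped in for requests so the block has no third-party dependency.

```python
import re
import time
import urllib.request

# Same idea as the pattern above: quoted links under /article/
ARTICLE_RE = re.compile(r'"(https://www\.nz86\.com/article/[^"]*)"')


def find_article_urls(html):
    """Return matched article URLs without duplicates, in page order."""
    return list(dict.fromkeys(ARTICLE_RE.findall(html)))


def fetch(url):
    """Download one page and return its source as text (UTF-8 assumed)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, fetch_page=fetch, delay=1.0):
    """Fetch every URL in turn and return a dict of {url: page source}."""
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch_page(url)
        except OSError as exc:  # urllib's URLError and timeouts are OSErrors
            print(f"skipped {url}: {exc}")
        time.sleep(delay)  # pause between requests to go easy on the server
    return pages
```

A typical call would be `pages = crawl(find_article_urls(fetch('https://www.nz86.com/fashion/')))`, after which `pages` maps each article URL to its HTML source for later analysis.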