Python: fetching specific content from the sub-links of a page

Time:11-27

import re
import requests

r = requests.get('url')  # 'url' here stands for the address of the page
data = r.text

# use a regular expression to find all links
link_list = re.findall(r"(?<=href=\").+?(?=\")|(?<=href=\').+?(?=\')", data)
for url in link_list:
    print(url)
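The lookbehind/lookahead pattern above can be tried offline on a small HTML string. The sketch below uses made-up sample HTML and, for comparison, swaps in the standard library's html.parser.HTMLParser (not part of the original code). The regex only sees a lowercase href written directly as href=" or href=', while the parser normalizes attribute names and quoting.

```python
import re
from html.parser import HTMLParser

# Same pattern as above: text between href=" and the next ",
# or between href=' and the next '.
HREF_RE = re.compile(r"(?<=href=\").+?(?=\")|(?<=href=\').+?(?=\')")


def links_by_regex(html):
    """Return every href value matched by the lookbehind/lookahead pattern."""
    return HREF_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collect the href attributes of <a> tags with the stdlib HTML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_by_parser(html):
    """Return the href values of <a> tags in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, for illustration only.
SAMPLE = """<a href="https://example.com/a">A</a>
<a href='/b'>B</a>
<a HREF="/c">C</a>"""

print(links_by_regex(SAMPLE))   # ['https://example.com/a', '/b']
print(links_by_parser(SAMPLE))  # ['https://example.com/a', '/b', '/c']
```

The regex misses the third link only because it is case-sensitive; a parser avoids having to anticipate every way an attribute can be written.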

I use a regular expression to find all the links on a page and save them as a list in a lists.txt or CSV file. Could you tell me how to find specific content in the pages those links point to? The program below doesn't seem to run; could someone take a look?
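Saving the link list and reading it back might look like the following sketch. The file names lists.txt and lists.csv follow the question; one link per line in the text file and a single "url" column in the CSV are assumptions, and the helper names are hypothetical.

```python
import csv
import os
import tempfile
from pathlib import Path


def save_links_txt(links, path):
    """Write one link per line to a plain text file."""
    Path(path).write_text("\n".join(links) + "\n", encoding="utf-8")


def load_links_txt(path):
    """Read links back, dropping newlines and blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def save_links_csv(links, path):
    """Write links as a one-column CSV with a 'url' header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url"])
        for link in links:
            writer.writerow([link])


def load_links_csv(path):
    """Read the 'url' column back from a CSV written by save_links_csv."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row["url"] for row in csv.DictReader(f)]


# Round trip in a temporary directory (the example links are made up).
example = ["https://example.com/a", "https://example.com/b"]
with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "lists.txt")
    csv_path = os.path.join(tmp, "lists.csv")
    save_links_txt(example, txt_path)
    save_links_csv(example, csv_path)
    txt_back = load_links_txt(txt_path)
    csv_back = load_links_csv(csv_path)

print(txt_back == example, csv_back == example)  # True True
```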

import requests
import re

f = open('lists.txt', 'r')
urlList = f.readlines()
for url in urlList:
    r = requests.get('url')
    data = r.text
    email = re.findall(r'[0-9a-zA-Z.]+@[0-9a-zA-Z.]+?com', data)
    print(email)

The upper part of the program (reading lists.txt) runs on its own and prints the URL list, and the second part, given a single URL, finds the email addresses on that page. But the two parts don't work together. Why?
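One cause is visible in the code itself: inside the loop, requests.get('url') requests the literal string 'url' rather than the loop variable url, so requests rejects it as a URL with no scheme. In addition, every line returned by readlines() still ends with '\n', which is best stripped before making a request. Below is a corrected sketch; fetching is passed in as a function so the logic can be tried without network access (find_emails and fetch are hypothetical names), and the email pattern is the one from the question.

```python
import re

# Email pattern from the question. It is deliberately simple and only
# finds addresses whose domain ends in "com".
EMAIL_RE = re.compile(r"[0-9a-zA-Z.]+@[0-9a-zA-Z.]+?com")


def find_emails(url_lines, fetch):
    """Map each URL to the email addresses found on that page.

    url_lines: lines as returned by f.readlines() (may end with '\\n').
    fetch: a function taking a URL string and returning the page text,
           e.g. lambda u: requests.get(u, timeout=10).text
    """
    results = {}
    for line in url_lines:
        url = line.strip()      # drop the trailing newline
        if not url:             # skip blank lines
            continue
        data = fetch(url)       # pass the variable, not the string 'url'
        results[url] = EMAIL_RE.findall(data)
    return results


# Offline demo: a dict of made-up pages stands in for requests.
fake_pages = {
    "https://example.com/a": "Contact: alice@example.com",
    "https://example.com/b": "no address here",
}
found = find_emails(
    ["https://example.com/a\n", "https://example.com/b\n", "\n"],
    fake_pages.__getitem__,
)
print(found)
# {'https://example.com/a': ['alice@example.com'], 'https://example.com/b': []}
```

With real pages, the fetch argument would be something like lambda u: requests.get(u, timeout=10).text, with url_lines taken from open('lists.txt').readlines().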

Reply:

Could you post a sample of the data?

Reply:

I suggest not doing this directly with regular expressions. Here is an example using a library; it uses regular expressions internally as well, and you can read its source on GitHub.
 
from simplified_scrapy import Spider, SimplifiedDoc, SimplifiedMain, utils


class MySpider(Spider):
    name = 'test_spider'
    start_urls = ['your entry link address']
    refresh_urls = True

    def extract(self, url, html, models, modelNames):
        doc = SimplifiedDoc(html)
        lstA = None
        if url.url in self.start_urls:
            # on the entry page, collect the links to the pages to visit
            lstA = doc.selects('a')
        else:
            # extract the data you want here
            email = doc.getElement('a', attr='class', value='email')
            print(email)
        return {"Urls": lstA, "Data": None}


SimplifiedMain.startThread(MySpider())  # start downloading
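For comparison, the same two-branch logic (collect links on the entry page, look for an <a class="email"> element on every other page) can be sketched with only the standard library. This is not simplified_scrapy's API: AnchorScanner and this extract function are hypothetical, the class name email is taken from the example above, and returning the anchor text as "Data" (instead of printing it) is a design choice of this sketch.

```python
from html.parser import HTMLParser


class AnchorScanner(HTMLParser):
    """Record the attributes and text of every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []      # list of (attrs_dict, text) pairs
        self._current = None   # attrs of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.anchors.append((self._current, "".join(self._text).strip()))
            self._current = None


def extract(url, html, start_urls):
    """Return links from an entry page, or the email anchor text from other pages."""
    scanner = AnchorScanner()
    scanner.feed(html)
    scanner.close()
    if url in start_urls:
        # Entry page: collect the hrefs to follow.
        links = [a.get("href") for a, _ in scanner.anchors if a.get("href")]
        return {"Urls": links, "Data": None}
    # Other pages: first <a> whose class list contains "email".
    for attrs, text in scanner.anchors:
        if "email" in (attrs.get("class") or "").split():
            return {"Urls": None, "Data": text}
    return {"Urls": None, "Data": None}


# Made-up pages for illustration.
entry = '<a href="/p1">one</a> <a href="/p2">two</a>'
page = '<p>Mail: <a class="email" href="mailto:bob@example.com">bob@example.com</a></p>'
print(extract("https://example.com/", entry, ["https://example.com/"]))
# {'Urls': ['/p1', '/p2'], 'Data': None}
print(extract("https://example.com/p1", page, ["https://example.com/"]))
# {'Urls': None, 'Data': 'bob@example.com'}
```

A real crawler would still need to resolve relative links (for example with urllib.parse.urljoin) and fetch each page, which the library handles for you.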