I have some text (strings) that I scraped. Now I want to get only the matched links and send GET requests to those links. Here is sample code.
[nordway, nord.download, and ul are sample domains]
texts = """
https://ul.com/123
https://nord.download/view/11
https://nord.download/view/22
http://nordway.com/view/1
http://nordway.com/view/2
http://nordway.com/view/3
http://nordway.com/view/4
http://nordway.com/view/5
"""
import requests
from bs4 import BeautifulSoup
import re
match1 = re.findall(r'(https://nord.download/?\S+)', texts)
match2 = re.findall(r'http://nordway.com/?\S+', texts)
print(match1)
print(match2)
Output:
print(match1)
['https://nord.download/view/11', 'https://nord.download/view/22']
print(match2)
['http://nordway.com/view/1', 'http://nordway.com/view/2', 'http://nordway.com/view/3', 'http://nordway.com/view/4', 'http://nordway.com/view/5']
But I want to grab those links conditionally: when nord.download links are found, grab or print those; if nord.download links are not available, fall back to the nordway.com links; otherwise the result should be 'Not Found'.
I could not figure out how to chain those two regexes as a fallback (or exception handling, or whatever the right approach is), or how to get the links for both domains when both are available.
And at the end, I want to use those links, or a single link, to request the page data (using requests).
page = requests.get(match1)
soup = BeautifulSoup(page.content, features='lxml')
All links are used only as examples; none of them are valid.
If I have made a mistake, pardon me, and please help me fix it. Thanks.
CodePudding user response:
You can't use match1 directly with requests.get() because it is a list of URLs, not a single URL. Instead, you need a for loop to iterate over each URL that you matched:
for url in match1:
response = requests.get(url)
print(response)
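If you also want the fallback behaviour described in the question (use nord.download links when present, otherwise nordway.com, otherwise report 'Not Found'), a minimal sketch along those lines, assuming the same texts variable, imports, and match1/match2 regex results from the question, could be:

# prefer nord.download matches; fall back to nordway.com; otherwise report 'Not Found'
chosen = match1 if match1 else match2
if chosen:
    for url in chosen:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, features='lxml')  # parse the page as in the question
        print(url, response.status_code)
else:
    print('Not Found')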
CodePudding user response:
# 'match' is the list of matched URLs (e.g. match1 or match2 from the question)
if match:
    responses = {}
    counter = 0
    for url in match:
        counter += 1
        # store each response under a numbered key: 'response1', 'response2', ...
        responses["response" + str(counter)] = requests.get(url)
    print(responses)
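To then parse each stored response with BeautifulSoup, as the question intends, a small sketch (assuming the responses dict built above and the imports from the question) might be:

for name, response in responses.items():
    # build a soup from each fetched page; 'lxml' matches the parser used in the question
    soup = BeautifulSoup(response.content, features='lxml')
    print(name, soup.title)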