I have some text (strings) that I scraped. Now I want to get only the matched links and send GET requests to those links. Here is sample code.
[nordway, nord.download, and ul are sample domains]
texts = """
https://ul.com/123
https://nord.download/view/11
https://nord.download/view/22
http://nordway.com/view/1
http://nordway.com/view/2
http://nordway.com/view/3
http://nordway.com/view/4
http://nordway.com/view/5
"""
import requests
from bs4 import BeautifulSoup
import re
match1 = re.findall(r'(https://nord.download/?\S+)', texts)
match2 = re.findall(r'http://nordway.com/?\S+', texts)
print(match1)
print(match2)
Output:
print(match1)
['https://nord.download/view/11', 'https://nord.download/view/22']
print(match2)
['http://nordway.com/view/1', 'http://nordway.com/view/2', 'http://nordway.com/view/3', 'http://nordway.com/view/4', 'http://nordway.com/view/5']
But I want to grab those links conditionally: when nord.download links are found, grab or print those; if nord.download links are not available, fall back to the nordway.com links; otherwise the result should be 'Not Found'.
I could not figure out how to chain those two regexes as a fallback (or exception handling, or whatever the right approach is), or how to get the links for both domains when both are available.
And at the end, I want to use those links, or a single link, to request the page data (using requests).
page = requests.get(match1)
soup = BeautifulSoup(page.content, features='lxml')
All links are used only as examples; none of them are valid.
If I have made a mistake, pardon me, and please help me fix it. Thanks.
CodePudding user response:
You can't use match1 directly with requests.get() because it is a list of URLs, not a single URL. Instead, you need a for loop to iterate over each URL that you matched:
for url in match1:
response = requests.get(url)
print(response)
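If you also want the fallback behaviour described in the question (use nord.download links when present, otherwise nordway.com, otherwise report 'Not Found'), a minimal sketch along those lines, assuming the same texts variable, imports, and match1/match2 regex results from the question, could be:

# prefer nord.download matches; fall back to nordway.com; otherwise report 'Not Found'
chosen = match1 if match1 else match2
if chosen:
    for url in chosen:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, features='lxml')  # parse the page as in the question
        print(url, response.status_code)
else:
    print('Not Found')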
CodePudding user response:
# 'match' is the list of matched URLs (e.g. match1 or match2 from the question)
if match:
    responses = {}
    counter = 0
    for url in match:
        counter += 1
        # store each response under a numbered key: 'response1', 'response2', ...
        responses["response" + str(counter)] = requests.get(url)
    print(responses)
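To then parse each stored response with BeautifulSoup, as the question intends, a small sketch (assuming the responses dict built above and the imports from the question) might be:

for name, response in responses.items():
    # build a soup from each fetched page; 'lxml' matches the parser used in the question
    soup = BeautifulSoup(response.content, features='lxml')
    print(name, soup.title)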