Extract a particular link present in each of the considered web pages

Time:07-27

I'm having trouble extracting a particular link from each of the web pages I'm considering.

Consider, for example, the following websites:

I would like to know whether there is a uniform way to extract the WEBSITE field (above the map) shown in the table on the left of each page. For the cases above, I would like to extract the links:

There are no unique tags to refer to, which makes extraction difficult. I thought of a solution using a CSS selector, but it doesn't seem to work. For the first link I have:

from bs4 import BeautifulSoup
import requests

url = "https://lefooding.com/en/restaurants/ezkia"

res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
data = soup.find("div", {"class": "e-rowContent"})  # returns None if no such class exists
print(data)

but there is no trace of the link I need here. Does anyone know of a possible solution?
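To rule out a wrong class name, it can help to test the lookup against a stripped-down snippet first; the HTML below is a made-up illustration, not the real page markup. When the class is absent, `find` simply returns `None`:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the real page
html = "<div class='pageGuide__infos'><a href='https://example.com'>site</a></div>"
soup = BeautifulSoup(html, "html.parser")

# Class not present in the snippet -> find returns None
print(soup.find("div", {"class": "e-rowContent"}))  # None
# Class present -> a Tag is returned
print(soup.find("div", {"class": "pageGuide__infos"}) is not None)  # True
```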

CodePudding user response:

Try this:

import requests
from bs4 import BeautifulSoup

urls = [
    "https://lefooding.com/en/restaurants/ezkia",
    "https://lefooding.com/en/restaurants/tekes",
]

with requests.Session() as s:
    for url in urls:
        soup = BeautifulSoup(s.get(url).text, "lxml")
        # The WEBSITE field is the last link inside the info block
        website = soup.select(".pageGuide__infos a")[-1]
        print(website.get_text(strip=True))

Output:

https://www.ezkia-restaurant.fr
https://www.tekesrestaurant.com/
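On these pages the anchor's visible text is itself the URL, but in general the `href` attribute is the safer thing to extract. A minimal sketch against a made-up snippet mimicking the `.pageGuide__infos` block (the real pages may contain more links):

```python
from bs4 import BeautifulSoup

# Hypothetical markup illustrating the structure; not the real page HTML
html = """
<div class="pageGuide__infos">
  <a href="tel:+33123456789">phone</a>
  <a href="https://www.ezkia-restaurant.fr">https://www.ezkia-restaurant.fr</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
link = soup.select(".pageGuide__infos a")[-1]  # last <a> in the block

print(link["href"])               # the href attribute
print(link.get_text(strip=True))  # the visible text
```

If the two ever differ, `link["href"]` is the actual destination of the link.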