I am trying to get the fax number, but I get None. This is the page link: https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille/3?view=entry
import re
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
base_url = 'https://www.avocats-lille.com/'
url = 'https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries'
driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
driver.get(url)
soup = BeautifulSoup(driver.page_source, "html.parser")
tra = soup.find_all('h2', class_='title')
# Collect the link to each entry page from the listing
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = base_url + link['href']
        productlinks.append(comp)
# Visit each entry page and look for the fax number
for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    try:
        Fax = t.group(1) if (t := re.search(r'Fax ([\d\s]*)', soup.select_one('.address .contact p').text)) else None
    except:
        Fax = ''
CodePudding user response:
Are you sure the link in your question is correct? The entry without a fax number is https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry
You never return or print your variable Fax, so fix that and you will get your result whenever a fax number is available:
for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    try:
        Fax = t.group(1) if (t := re.search(r'Fax ([\d\s]*)', soup.select_one('.address .contact p').text)) else None
    except:
        Fax = ''
    print(Fax, link)
Output:
None https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry
03 28 53 52 45 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/3?view=entry
03 20 02 44 19 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/1938?view=entry
03 20 74 69 79 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/4?view=entry
03 20 31 21 76 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2243?view=entry
03 74 09 65 01 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/5?view=entry
None https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2298?view=entry
...
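The double slash in the URLs above comes from joining base_url, which ends in /, with an href that starts with /. The site evidently accepts it, but a minimal sketch of a cleaner join with urllib.parse.urljoin could look like this; build_entry_url is a hypothetical helper name, not part of the original code:

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.avocats-lille.com/'

def build_entry_url(href):
    """Join a site-relative href onto BASE_URL without doubling the slash."""
    return urljoin(BASE_URL, href)

print(build_entry_url('/fr/annuaire/avocats-du-tableau-au-barreau-de-lille/3?view=entry'))
```

In the loop that fills productlinks, this would replace the string concatenation.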
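In addition, you do not need the try/except block, because the conditional expression already handles the case where no fax is found (this assumes the .address .contact p element exists on every entry page; otherwise select_one returns None and .text raises an AttributeError):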
for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    Fax = t.group(1) if (t := re.search(r'Fax ([\d\s]*)', soup.select_one('.address .contact p').text)) else ''
    print(Fax)
Output:
...
03 28 53 52 45
03 20 02 44 19
03 20 74 69 79
03 20 31 21 76
03 74 09 65 01
...
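If some entry pages lack the contact paragraph entirely, the lookup can be moved into a small helper that tolerates missing text. This is a minimal sketch, assuming the contact text looks like "Fax 03 28 53 52 45"; extract_fax and FAX_PATTERN are hypothetical names, not part of the original code, and the pattern is a slight variation on the one above that leaves trailing whitespace out of the match:

```python
import re

# Assumed format: the word "Fax" followed by groups of digits separated by spaces.
FAX_PATTERN = re.compile(r'Fax\s*([\d\s]*\d)')

def extract_fax(contact_text):
    """Return the fax number found in contact_text, or '' if there is none."""
    if not contact_text:
        # Covers None (element missing) and empty strings
        return ''
    match = FAX_PATTERN.search(contact_text)
    return match.group(1) if match else ''

# Possible use inside the loop:
#   node = soup.select_one('.address .contact p')
#   Fax = extract_fax(node.text if node else None)
```

Keeping the regular expression separate from the Selenium and BeautifulSoup calls also makes it possible to check the pattern against sample strings without opening a browser.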