Trying to get Fax number using beautifulsoup

Time:08-20

I am trying to get the fax number, but I only get None. This is the page link: https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille/3?view=entry

import re
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep

headers ={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
base_url='https://www.avocats-lille.com/'
url = 'https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries'
# Selenium 4 takes the driver path through a Service object;
# the raw string keeps the Windows backslashes literal
driver = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\chromedriver.exe"))
driver.get(url)
soup = BeautifulSoup(driver.page_source, "html.parser")
# Collect the entry links found inside <h2 class="title"> elements
tra = soup.find_all('h2',class_='title')
productlinks=[]
for links in tra:
    for link in links.find_all('a',href=True):
        comp=base_url + link['href']
        productlinks.append(comp)

for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    try:
        Fax=t.group(1) if (t:=re.search(r'Fax ([\d\s]*)', soup.select_one('.address  .contact p').text)) else None
    except AttributeError:
        # select_one() returned None: no contact paragraph on this page
        Fax=''
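The Fax= line packs re.search, an assignment expression (:=, Python 3.8+) and a conditional expression into one statement. Unrolled into a small function, with an added .strip() and run against hypothetical contact strings (the real text of the site's .contact p paragraph is an assumption here), it behaves roughly like this:

```python
import re

# Same pattern as in the question, written as a raw string so "\d" and "\s"
# are passed to the regex engine instead of being read as string escapes.
FAX_PATTERN = re.compile(r'Fax ([\d\s]*)')


def find_fax(text):
    """Return the digits after 'Fax ', or None when the label is absent."""
    match = FAX_PATTERN.search(text)  # None if 'Fax ' never appears
    if match:
        return match.group(1).strip()  # drop trailing spaces/newlines
    return None


# Hypothetical paragraph texts; the real page layout may differ.
with_fax = "Tél 03 00 00 00 01\nFax 03 00 00 00 02\n"
without_fax = "Tél 03 00 00 00 01\n"

print(find_fax(with_fax))     # 03 00 00 00 02
print(find_fax(without_fax))  # None
```

The greedy [\d\s]* also swallows the newline after the number, which is why the sketch strips the captured group.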

Answer:

Are you sure the link in your question is correct? The page without a fax number is https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry

You never return or print the variable Fax. Fix that and you will see the result wherever a fax number is available:

for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    try:
        Fax=t.group(1) if (t:=re.search(r'Fax ([\d\s]*)', soup.select_one('.address  .contact p').text)) else None
    except AttributeError:
        Fax=''
    # Show the result for each page
    print(Fax,link)

Output:

None https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry
03 28 53 52 45 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/3?view=entry
03 20 02 44 19 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/1938?view=entry
03 20 74 69 79 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/4?view=entry
03 20 31 21 76 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2243?view=entry
03 74 09 65 01 https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/5?view=entry
None https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2298?view=entry
...
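Printing is enough for a quick check, but if you want to reuse the numbers it is usually easier to return them. A minimal sketch, assuming the contact-paragraph texts have already been fetched; the pages dict and the example.com URLs below are hypothetical stand-ins for what the scraper would read from driver.page_source:

```python
import re

FAX_PATTERN = re.compile(r'Fax ([\d\s]*)')


def collect_faxes(pages):
    """Map each link to its fax number, or None if the page has none.

    `pages` maps a URL to the text of its contact paragraph; in the real
    scraper that text would come from soup.select_one(...).text.
    """
    results = {}
    for link, text in pages.items():
        match = FAX_PATTERN.search(text)
        results[link] = match.group(1).strip() if match else None
    return results


# Hypothetical sample data, not taken from the live site.
pages = {
    "https://example.com/entry/1": "Tél 03 00 00 00 01",
    "https://example.com/entry/2": "Tél 03 00 00 00 01\nFax 03 00 00 00 02",
}

for link, fax in collect_faxes(pages).items():
    print(fax, link)
```

Returning a dict keeps the link and the fax number together, so the results can be written to CSV or filtered later instead of only appearing on the console.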

As an aside, you do not need the try/except block to handle pages without a fax number: the conditional expression already falls back to '' when the regex finds no match.

for link in productlinks:
    driver.get(link)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    Fax=t.group(1) if (t:=re.search(r'Fax ([\d\s]*)', soup.select_one('.address  .contact p').text)) else ''

    print(Fax)

Output:

...

03 28 53 52 45
03 20 02 44 19
03 20 74 69 79
03 20 31 21 76
03 74 09 65 01

...
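One caveat: the conditional expression only covers pages where the contact paragraph exists but has no Fax label. If select_one('.address .contact p') finds nothing it returns None, and .text then raises AttributeError. A small guard handles both cases without a bare except; the helper name fax_or_empty is hypothetical, and the sample strings are made up:

```python
import re

FAX_PATTERN = re.compile(r'Fax ([\d\s]*)')


def fax_or_empty(paragraph_text):
    """Return the fax number, or '' when the paragraph or the label is missing.

    Pass None when select_one() found no element, e.g.:
        p = soup.select_one('.address .contact p')
        fax = fax_or_empty(p.text if p is not None else None)
    """
    if paragraph_text is None:  # element not found on the page
        return ''
    match = FAX_PATTERN.search(paragraph_text)
    return match.group(1).strip() if match else ''  # no 'Fax' label


print(repr(fax_or_empty(None)))                    # ''
print(repr(fax_or_empty("Tél 03 00 00 00 01")))    # ''
print(repr(fax_or_empty("Fax 03 00 00 00 02\n")))  # '03 00 00 00 02'
```

Checking for None explicitly keeps genuine bugs (a typo in the selector, a changed page layout) from being silently swallowed the way a bare except would.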