How to extract information from page

Time:10-12

I am trying to extract the name and phone number from this page:

from selenium import webdriver
# location of chromedriver.exe
browser = webdriver.Chrome("C:\Program Files (x86)\chromedriver.exe")

browser.get("https://www.houzz.com/professionals/general-contractor")

for title in browser.find_elements_by_xpath('//span[@class="mlm header-5 text-unbold"]'):
    title.click()
    name=browser.find_elements_by_xpath('//h1[@class="mwxddt-0 jIujVr"]')
    print(name)

CodePudding user response:

Use a loop for this: locate each name by its index, and increase the index by 1 on every iteration.

Also, scroll to each element so that it is inside the viewport before Selenium reads or clicks it.

Code :

# raw string so the backslashes in the Windows path are not treated as escapes
browser = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")

browser.maximize_window()
browser.implicitly_wait(30)

browser.get("https://www.houzz.com/professionals/general-contractor")
size = browser.find_elements(By.XPATH, "//span[@itemprop='name']")
j = 1
for i in range(len(size)):
    element =  browser.find_element(By.XPATH, f"(//span[@itemprop='name'])[{j}]")
    browser.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.text)
    j = j + 1

Output :

Capital Remodeling
SOD Home Group
Innovative Construction Inc.
Baron Construction & Remodeling Co.
Luxe Remodel
California Home Builders & Remodeling Inc.
Sneller Custom Homes and Remodeling, LLC
123 Remodeling Inc.
Professional builders & Remodeling, Inc
Rudloff Custom Builders
LAR Construction & Remodeling
Erie Construction Mid West
Regal Construction & Remodeling Inc.
Mr. & Mrs. Construction & Remodeling
Bailey Remodeling and Construction LLC
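
The index-based loop above can also be written without the manual counter. Below is a minimal sketch, not the original answer's code: `indexed_xpath` and `collect_texts` are hypothetical helper names, and `BY_XPATH` is just the string value of Selenium's `By.XPATH`, so the helpers accept any driver-like object. XPath positions are 1-based, which is why the range starts at 1. Re-locating each element by position on every iteration, instead of reusing the list from `find_elements`, avoids holding references that may go stale if the page changes.

```python
BY_XPATH = "xpath"  # assumption for portability: same value as selenium's By.XPATH


def indexed_xpath(base_xpath, position):
    """Return an XPath that selects the position-th match of base_xpath (1-based)."""
    if position < 1:
        raise ValueError("XPath positions start at 1")
    return f"({base_xpath})[{position}]"


def collect_texts(driver, base_xpath):
    """Re-locate each match by position, scroll it into view, and return its text."""
    count = len(driver.find_elements(BY_XPATH, base_xpath))
    texts = []
    for position in range(1, count + 1):
        # Look the element up again on every pass instead of reusing an old reference.
        element = driver.find_element(BY_XPATH, indexed_xpath(base_xpath, position))
        driver.execute_script("arguments[0].scrollIntoView(true);", element)
        texts.append(element.text)
    return texts
```

With the `browser` object from the code above, this would be called as `names = collect_texts(browser, "//span[@itemprop='name']")`.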

Update 1 :

browser = webdriver.Chrome(driver_path)  # driver_path: location of chromedriver.exe
browser.maximize_window()
browser.implicitly_wait(30)
wait = WebDriverWait(browser, 30)
browser.get("https://www.houzz.com/professionals/general-contractor")
size = browser.find_elements(By.XPATH, "//span[@itemprop='name']")
j = 1
for i in range(len(size)):
    element =  browser.find_element(By.XPATH, f"(//span[@itemprop='name'])[{j}]")
    browser.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.text)
    browser.execute_script("arguments[0].click();", element)
    wait.until(EC.element_to_be_clickable((By.XPATH, "//button[@data-component='Pro Phone Link']"))).click()
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//a[@data-component='Call Pro']"))).text)
    #wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "Website"))).click()
    browser.execute_script("window.history.go(-1)")
    j = j + 1

Imports :

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC