For example this is the main page link
This is code
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from time import sleep
PATH="C:\Program Files (x86)\chromedriver.exe"
driver =webdriver.Chrome(PATH)
driver.get('https://www.nationalhardwareshow.com/en-us/attend/exhibitor-list.html')
data = []
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
hrefs = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[@class='text-center-mobile wrap-word']//ancestor::a[1]")))[:5]]
windows_before = driver.current_window_handle
for href in hrefs:
driver.execute_script("window.open('" href "');")
WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
windows_after = driver.window_handles
new_window = [x for x in windows_after if x != windows_before][0]
driver.switch_to.window(new_window)
data.append(WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-word']"))).text)
driver.close()
driver.switch_to.window(windows_before)
print(data)
CodePudding user response:
Your current problem is invalid XPath //div[contains(@class,'company-info']//h3)]
- wrong parentheses usage. You need to use //div[contains(@class,'company-info')]//h3
instead.
However, if you want to scrape data from each company entry on page then your clicking links approach is not good.
Try to implement the following:
get
href
attribute of every link. Since not all links initially displayed on page you need to trigger all possible XHRs, so createcount
variable to get current links count and do in while loop:- execute send
END
hardkey to scroll page down - try to wait until current links count >
count
. If True - re-definecount
with new value. If Exception - break the loop (there are no more links remain to load) - get
href
of all link nodes//div[@]//a
- execute send
in
for
loop navigate to each link withdriver.get(<URL>)
scrape data
CodePudding user response:
With in the 2022 EXHIBITOR LIST webpage to click() on each link to scrape you can collect the href attributes and open them in the adjascent tab as follows:
Code Block (sample for first 5 entries):
driver.get('https://www.nationalhardwareshow.com/en-us/attend/exhibitor-list.html') data = [] WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click() hrefs = [my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[@class='text-center-mobile wrap-word']//ancestor::a[1]")))[:5]] windows_before = driver.current_window_handle for href in hrefs: driver.execute_script("window.open('" href "');") WebDriverWait(driver, 20).until(EC.number_of_windows_to_be(2)) windows_after = driver.window_handles new_window = [x for x in windows_after if x != windows_before][0] driver.switch_to.window(new_window) data.append(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h1[@class='wrap-word']"))).text) driver.close() driver.switch_to.window(windows_before) print(data)
Console Output:
['10X Innovations - Swift ULV', '21st Century Inc', '3V Snap Ring LLC.', 'A-ipower Corp', 'A.A.C. Forearm Forklift Inc']