Python Selenium scraping of a website returns nothing

Time:03-22

I previously used the code below to download some titles from the website, but it has started returning nothing, without raising any errors.

driver.get('http://www.szse.cn/disclosure/bond/notice/index.html')
wait = WebDriverWait(driver, 30)
datefield_st = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='input-group-wrap form-control dropdown-btn']/input[1]")))
datefield_st.click()
s1 = driver.find_element_by_class_name('input-left')
s1.send_keys("2022-3-7")
s2 = driver.find_element_by_class_name('input-right')
s2.send_keys("2022-3-21")
driver.find_element_by_id("query-btn").click()
links=[link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//a[@attachformat][.//span[contains(text(),'募集说明书' and not(contains(text(),'摘要'))]]")))]
titles=[title.text for title in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='pull-left title-text multiline' and contains(text(), '募集说明书' and not(contains(text(),'摘要'))]//parent::a")))]
dates=[date.text for date in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='pull-left title-text multiline' and contains(text(), '募集说明书' and not(contains(text(),'摘要'))]//ancestor::td//following-sibling::td")))]
print(link,title,date)

CodePudding user response:

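# Imports the snippet relies on (not shown in the question).
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # assumption: any WebDriver instance works here
driver.get('http://www.szse.cn/disclosure/bond/notice/index.html')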
wait = WebDriverWait(driver, 30)
datefield_st = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='input-group-wrap form-control dropdown-btn']/input[1]")))
datefield_st.click()
s1 = driver.find_element_by_class_name('input-left')
s1.send_keys("2022-3-7")
s2 = driver.find_element_by_class_name('input-right')
s2.send_keys("2022-3-21")
driver.find_element_by_id("query-btn").click()
links=[link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.XPATH,"//a[@attachformat][.//span[contains(text(),'募集说明书') and not(contains(text(),'摘要'))]]")))]
titles=[title.text for title in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='pull-left title-text multiline' and contains(text(), '募集说明书') and not(contains(text(),'摘要'))]//parent::a")))]
dates=[date.text for date in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='pull-left title-text multiline' and contains(text(), '募集说明书') and not(contains(text(),'摘要'))]//ancestor::td//following-sibling::td")))]
print(links,titles,dates)

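The XPath expressions were missing the closing parenthesis after contains(text(),'募集说明书'), and the print call used the loop variable names (link, title, date) instead of the list names (links, titles, dates).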
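A missing parenthesis inside a long XPath string is easy to overlook, so a quick sanity check before handing the expression to Selenium may help. The sketch below is only an illustration: `parens_balanced` is a hypothetical helper, not part of Selenium, and it only counts parentheses outside quoted literals rather than validating the XPath fully.

```python
def parens_balanced(xpath: str) -> bool:
    """Return True if parentheses outside string literals are balanced."""
    depth = 0
    quote = None  # quote character of the literal we are currently inside, if any
    for ch in xpath:
        if quote:
            # Inside a quoted literal: ignore everything until the closing quote.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0 and quote is None


# The XPath from the question (missing a ')') and the one from the answer.
broken = "//a[@attachformat][.//span[contains(text(),'募集说明书' and not(contains(text(),'摘要'))]]"
fixed = "//a[@attachformat][.//span[contains(text(),'募集说明书') and not(contains(text(),'摘要'))]]"

print(parens_balanced(broken))  # False
print(parens_balanced(fixed))   # True
```

Running this check on each locator string before calling wait.until would flag the question's expressions without needing to load the page.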
