Python - Simple webscrape not pulling

Time:10-20

My code opens a webpage and is supposed to pull each row of information from it, but it comes back blank.

Expected output: the title of each row is printed.

Currently, nothing is printed.

import time
import requests
from selenium import webdriver
driver = webdriver.Chrome()
bracket=[]
url='https://www.sabcs.org/Program/Poster-Sessions/Poster-Session-1'
driver.get(url)
time.sleep(3)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
r=requests.get(url)
page_source=r.content


each_field=driver.find_elements_by_xpath(".//tr[@class='normaltext']")
for item in each_field:
    print(item.text)

Answer:

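The table sits inside an <iframe>, and Selenium only searches the document it is currently focused on, so you need to switch to that frame before looking for rows. The requests.get call isn't needed either, since its result is never used. Also, I'd just use pandas here to parse the table.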
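A quick way to check whether a page embeds its content in a frame, before involving Selenium at all, is to list the <iframe> tags in the outer HTML. The following is only a minimal sketch using the standard library; `IframeFinder`, `find_iframe_sources`, and the sample HTML with its example.com URL are hypothetical and not taken from the real page.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so this also matches <IFRAME>.
        if tag == "iframe":
            self.sources.append(dict(attrs).get("src"))


def find_iframe_sources(html):
    """Return the src of each iframe found in the given HTML string."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.sources


# Hypothetical outer page: the table rows are NOT in this HTML,
# only a frame that points at the document containing them.
sample_html = """
<html><body>
  <h1>Poster Session</h1>
  <iframe src="https://example.com/embedded-table"></iframe>
</body></html>
"""

print(find_iframe_sources(sample_html))
```

On a real page you could pass `requests.get(url).text` instead of the sample string. If the list is non-empty, the missing rows may well live in one of those frames, which is why the driver has to switch into the frame before it can find them.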

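from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url='https://www.sabcs.org/Program/Poster-Sessions/Poster-Session-1'
driver.get(url)
# Switch into the last <iframe> on the page, which holds the table.
# (find_elements_by_xpath was removed in Selenium 4; use find_elements with By.XPATH.)
driver.switch_to.frame(driver.find_elements(By.XPATH, ".//iframe")[-1])
# page_source now refers to the frame; newer pandas expects a file-like object.
df = pd.read_html(StringIO(driver.page_source))[0]
driver.quit()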

Output:

print(df)
                                                     0                                                  1
0                                                  NaN                                                NaN
1    Poster Session 1 – Wednesday, December 8, 2021...  Poster Session 1 – Wednesday, December 8, 2021...
2                                                  NaN                                                NaN
3                                                  NaN                Axillary Staging and Sentinel Nodes
4                                             P1-01-01  Prospective ultrasonographic surveillance stud...
..                                                 ...                                                ...
279                                           P1-24-04  Spatially resolved cell type heterogeneity unc...
280                                           P1-24-05  Breast conserving surgery for non-metastatic i...
281                                           P1-24-06  Risk factor modeled microenvironment effects l...
282                                           P1-24-07  Management trends and outcomes assessment for ...
283                                                NaN                                                NaN

[284 rows x 2 columns]
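Since the goal was to print the title of each row, the DataFrame still needs filtering: the output above mixes blank rows, a heading row, and section names in with the posters. The sketch below assumes that real poster rows are the ones whose first column looks like an ID such as P1-01-01. The small `df` built here is a hypothetical stand-in for the scraped table, and its titles are placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: column 0 holds poster IDs,
# column 1 holds titles. Blank, heading, and section rows have no ID.
df = pd.DataFrame(
    {
        0: [None, "Poster Session 1", None, "P1-01-01", "P1-01-02"],
        1: [
            None,
            "Poster Session 1",
            "Axillary Staging and Sentinel Nodes",
            "Example title A",
            "Example title B",
        ],
    }
)

# Keep only rows whose first column is a poster ID like P1-01-01.
# na=False treats missing values as non-matches.
is_poster = df[0].str.fullmatch(r"P\d+-\d+-\d+", na=False)
titles = df.loc[is_poster, 1].tolist()

for title in titles:
    print(title)
```

With the real table, the same two filtering lines could be applied to the `df` returned by `pd.read_html`, provided the ID pattern assumed here holds for every poster row.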