I am able to scrape individual fields off a website, but would like to map the title to the time.
The fields "have their own class, so I am struggling on how to map the time to the title.
A dictionary would work, but how would i structure/format this dictionary so that it stores values on a line by line basis?
url for reference - https://ash.confex.com/ash/2021/webprogram/STUDIO.html
expected output:
9:00 AM-9:30 AM, Defining Race, Ethnicity, and Genetic Ancestry
11:00 AM-11:30 AM, Definitions of Structural Racism
etc
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
import time
driver.get('https://ash.confex.com/ash/2021/webprogram/STUDIO.html')
time.sleep(3)
page_source = driver.page_source
soup=BeautifulSoup(page_source,'html.parser')
productlist=soup.find_all('div',class_='itemtitle')
for item in productlist:
for eachLine in item.find_all('a',href=True):
title=eachLine.text
print(title)
times=driver.find_elements_by_class_name("time")
for t in times:
print(t.text)
CodePudding user response:
Selenium is an overkill here. Website didn't use any dynamic content, so you can scrape it with Python requests
and BeautifulSoup
. Here is a code how to achieve it. You need to query productlist
and times
separately and then iterate using indexes to be able to get both items at once. I put in range()
length of an productlist
because I assuming that both productlist
and times
will have equal length.
import requests
from bs4 import BeautifulSoup
url = 'https://ash.confex.com/ash/2021/webprogram/STUDIO.html'
res = requests.get(url)
soup = BeautifulSoup(res.content,'html.parser')
productlist = soup.select('div.itemtitle > a')
times = soup.select('.time')
for iterator in range(len(productlist)):
row = times[iterator].text ", " productlist[iterator].text
print(row)
Note: soup.select()
gather items by css.