How to store values together after scraping

Time:11-07

I am able to scrape individual fields off a website, but would like to map the title to the time.

The fields each have their own class, so I am struggling with how to map the time to the title.

A dictionary would work, but how would I structure this dictionary so that it stores the values row by row?

url for reference - https://ash.confex.com/ash/2021/webprogram/STUDIO.html

expected output:

9:00 AM-9:30 AM, Defining Race, Ethnicity, and Genetic Ancestry

11:00 AM-11:30 AM, Definitions of Structural Racism

etc

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://ash.confex.com/ash/2021/webprogram/STUDIO.html')
time.sleep(3)  # crude wait for the page to finish loading
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')

# Print every session title
productlist = soup.find_all('div', class_='itemtitle')
for item in productlist:
    for eachLine in item.find_all('a', href=True):
        title = eachLine.text
        print(title)

# Print every time slot (separately from the titles, which is the problem)
times = driver.find_elements(By.CLASS_NAME, "time")
for t in times:
    print(t.text)

CodePudding user response:

Selenium is overkill here. The page does not rely on dynamic content, so you can scrape it with Python's requests and BeautifulSoup. The code below shows one way to do it. Query productlist and times separately, then iterate by index so that you get both items at once. I pass the length of productlist to range() because I am assuming that productlist and times have the same length.

import requests
from bs4 import BeautifulSoup

url = 'https://ash.confex.com/ash/2021/webprogram/STUDIO.html'

res = requests.get(url)
soup = BeautifulSoup(res.content,'html.parser')

productlist = soup.select('div.itemtitle > a')
times = soup.select('.time')

for iterator in range(len(productlist)):
    row = times[iterator].text + ", " + productlist[iterator].text
    print(row)

Note: soup.select() gathers elements using CSS selectors.
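
Since the question also asks about a dictionary, here is a minimal sketch of how the two lists could be paired once their text has been extracted. It uses zip() rather than index-based iteration, and it raises an error if the lengths differ instead of silently assuming they match. The function name pair_rows and the hard-coded sample lists (taken from the expected output in the question) are only illustrative; on the real page the lists would come from the two soup.select() calls.

```python
def pair_rows(times, titles):
    """Pair each time slot with the title at the same position."""
    # The pairing is purely positional, so a length mismatch means
    # the rows would be misaligned; fail loudly instead.
    if len(times) != len(titles):
        raise ValueError(
            f"got {len(times)} times but {len(titles)} titles"
        )
    return list(zip(times, titles))


# Sample data standing in for the scraped text. With BeautifulSoup,
# the lists could be built like this (assumption, same selectors as above):
#   times  = [t.get_text(strip=True) for t in soup.select('.time')]
#   titles = [a.get_text(strip=True) for a in soup.select('div.itemtitle > a')]
times = ["9:00 AM-9:30 AM", "11:00 AM-11:30 AM"]
titles = [
    "Defining Race, Ethnicity, and Genetic Ancestry",
    "Definitions of Structural Racism",
]

rows = pair_rows(times, titles)

# One "time, title" line per row, as in the expected output.
for time_slot, title in rows:
    print(f"{time_slot}, {title}")

# Dictionary mapping each title to its time slot.
schedule = {title: time_slot for time_slot, title in rows}
```

Keying the dictionary by title only works if every title is unique. If two sessions share a title, the later one overwrites the earlier one, in which case the list of (time, title) tuples in rows is the safer structure.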
