How to web-scrape an old-school website that uses frames

I am trying to scrape a government site that uses a frameset. Here is the URL: https://lakecounty.in.gov/departments/voters/election-results-c/2022GeneralElectionResults/index.htm

I've tried using splinter/selenium:

import time
from splinter import Browser

url = "https://lakecounty.in.gov/departments/voters/election-results-c/2022GeneralElectionResults/index.htm"

browser = Browser()  # assumed setup; the original snippet did not show how browser was created
browser.visit(url)
time.sleep(10)  # wait for the page to load

full_xpath_frame = '/html/frameset/frameset/frame[2]'
tree = browser.find_by_xpath(full_xpath_frame)

for i in tree:
    print(i.text)

It just returns an empty string. I've also tried using the requests library:


import requests
from lxml import html

url = "https://lakecounty.in.gov/departments/voters/election-results-c/2022GeneralElectionResults/index.htm"

# get response object
response = requests.get(url)
 
# get byte string
data = response.content
print(data)

And it returns this:

b"<html>\r\n<head>\r\n<meta http-equiv='Content-Type'\r\ncontent='text/html; charset=iso-8859-1'>\r\n<title>Lake_ County Election Results</title>\r\n</head>\r\n<FRAMESET rows='20%, *'>\r\n<FRAME src='titlebar.htm' scrolling='no'>\r\n<FRAMESET cols='20%, *'>\r\n<FRAME src='menu.htm'>\r\n<FRAME src='Lake_ElecSumm_all.htm' name='reports'>\r\n</FRAMESET>\r\n</FRAMESET>\r\n<body>\r\n</body>\r\n</html>\r\n"

I've also tried using Beautiful Soup and it gave me the same thing. Is there another Python library I can use to get the data that's inside the second table?

Thank you for any feedback.

CodePudding user response:

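The index page contains only the frameset; the content of each frame is a separate document referenced by the frame's src attribute. So you could read the frames and their src values from the index page:

BeautifulSoup(r.text, 'html.parser').select('frame')[1].get('src')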
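
To illustrate the idea without any network access, here is a minimal sketch that pulls the frame sources out of the frameset markup printed in the question and resolves them against the index URL. It uses only the standard library (html.parser and urllib.parse) instead of Beautiful Soup; the names FrameCollector and frame_urls are made up for this example, and the markup is copied from the question with whitespace simplified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX_URL = ("https://lakecounty.in.gov/departments/voters/election-results-c/"
             "2022GeneralElectionResults/index.htm")

# Frameset markup as printed in the question (whitespace simplified).
FRAMESET_HTML = """
<html><head><title>Lake_ County Election Results</title></head>
<FRAMESET rows='20%, *'>
<FRAME src='titlebar.htm' scrolling='no'>
<FRAMESET cols='20%, *'>
<FRAME src='menu.htm'>
<FRAME src='Lake_ElecSumm_all.htm' name='reports'>
</FRAMESET>
</FRAMESET>
</html>
"""


class FrameCollector(HTMLParser):
    """Hypothetical helper: records the src attribute of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <FRAME> arrives as "frame".
        if tag == "frame":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_url, html_text):
    """Return the absolute URL of every frame found in html_text."""
    collector = FrameCollector()
    collector.feed(html_text)
    # Frame sources are relative, so resolve them against the page URL.
    return [urljoin(page_url, src) for src in collector.sources]


urls = frame_urls(INDEX_URL, FRAMESET_HTML)
for u in urls:
    print(u)
```

Each printed URL can then be fetched with requests.get like any ordinary page. In a live script, the text of the index response would take the place of FRAMESET_HTML.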

or go directly to menu.htm and follow the links it contains:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

r = requests.get('https://lakecounty.in.gov/departments/voters/election-results-c/2022GeneralElectionResults/menu.htm')

# resolve every href against the URL of menu.htm
link_list = [urljoin(r.url, a.get('href')) for a in BeautifulSoup(r.text, 'html.parser').select('a[href]')]

for link in link_list[:1]:
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html.parser')
    ###...scrape what is needed
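
What the final scraping step looks like depends on the markup of the report pages, which is not shown in this thread. As a rough sketch only, assuming the results sit in ordinary HTML table rows, the cells could be collected like this. The sample markup, the candidate names and the vote counts are placeholders invented for the example, not data from the site, and TableRows and parse_rows are hypothetical names. Only the standard library is used.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Hypothetical helper: collects cell text from <td>/<th>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        # Store the finished cell in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()  # flush a cell that was never closed
            self.rows.append(self._row)
            self._row = None


def parse_rows(html_text):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    return parser.rows


# Placeholder markup; the real report page may be structured differently.
SAMPLE_HTML = """
<table>
<tr><th>Candidate</th><th>Votes</th></tr>
<tr><td>Candidate A</td><td>1,234</td></tr>
<tr><td>Candidate B</td><td>987</td></tr>
</table>
"""

rows = parse_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

If the report pages turn out to use preformatted text or nested tables instead, the parsing step would need to be adapted accordingly.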