I'm trying very hard to make a web-scraping bot that retrieves my grades every hour. I have already coded the part that logs in to the website, but I can't figure out how to extract just the grade with bs4; instead I end up getting most of the page.
# Importing all modules
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from bs4 import BeautifulSoup
# Opening myHIES through webdriver (Selenium 4 takes the driver path via a Service object)
driver = webdriver.Chrome(service=Service("chromedriver.exe"))
driver.get("https://hies.myschoolapp.com/app#login")
time.sleep(2.5)
# Logging in to myHIES then going on algebra grade page
driver.find_element(By.ID, "Username").send_keys("myemail")
driver.find_element(By.ID, "nextBtn").click()
time.sleep(4)
driver.find_element(By.ID, "i0118").send_keys("mypassword")
driver.find_element(By.ID, "idSIButton9").click()
time.sleep(2)
driver.find_element(By.ID, "idSIButton9").click()
print("*Breathes lightly* WE'RE IN, BABY!")
time.sleep(3.0)
driver.find_element(By.CSS_SELECTOR, "div#showHideGrade > div > label > span").click()
time.sleep(1.3)
driver.find_element(By.XPATH, '//*[@id="coursesContainer"]/div[1]/div[4]/a[1]').click()
print("handing off to bs4")
# Handing off manipulated page to bs4
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')
print("handed off to bs4")
for tag in soup.find_all():
    print(tag.text)
print("should have printed tag text")
And this is the HTML I am attempting to extract from:
<div > <div > <h1> 69.00<span >%</span> </h1> <h6>marking period</h6> </div> <div > <h1>69.00<span >%</span></h1> <h6>year</h6> </div> </div>
The code I'm trying to use to extract it is the find_all() loop at the end of the script above.
Answer:
You need to specify which tag to find. Otherwise find_all() returns every tag on the page, and because each parent tag's .text includes the text of all its children, you end up printing most of the page. In your case, the text you are looking for is in an h1 tag, so pass "h1" to find_all():
for tag in soup.find_all("h1"):
    print(tag.text)
If you wish to read more about find_all(), see the Beautiful Soup documentation.
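As a minimal sketch, assuming the fragment posted in the question is representative of the real page, you can check this approach without logging in by parsing that fragment directly. It swaps lxml for Python's built-in "html.parser" so no extra install is needed, and uses get_text(strip=True), which trims the whitespace around each grade that tag.text would keep.

```python
from bs4 import BeautifulSoup

# The HTML fragment from the question (its attributes were stripped in the post,
# so this sketch relies only on tag names).
html = """
<div > <div > <h1> 69.00<span >%</span> </h1> <h6>marking period</h6> </div>
<div > <h1>69.00<span >%</span></h1> <h6>year</h6> </div> </div>
"""

# "html.parser" ships with Python; lxml would work the same way here.
soup = BeautifulSoup(html, "html.parser")

# find_all("h1") returns only the <h1> elements. get_text(strip=True) strips each
# text piece ("69.00" and the nested "%") and joins them, dropping the padding.
grades = [tag.get_text(strip=True) for tag in soup.find_all("h1")]
print(grades)
```

On the live page you would pass driver.page_source instead of the hard-coded fragment.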
Answer:
If the provided HTML section is part of your soup, you could try this:
....
# Container div assumed to hold both grade boxes
main_div = soup.find('div', {'class': 'col-md-2'})
data_tags = main_div.find_all('h1')   # grade values
data_notes = main_div.find_all('h6')  # labels: "marking period", "year"
out_dct = {}
for i in range(2):
    # Drop spaces and tabs, then split on any newlines left by the page's formatting
    grades = data_tags[i].text.replace(' ', '').replace('\t', '').split('\n')
    notes = data_notes[i].text.replace('\t', '').split('\n')
    out_dct['grade_' + str(i)] = grades
    out_dct['grade_' + str(i)].append(notes[0])
print(out_dct)
''' Result:
{'grade_0': ['69.00%', 'marking period'], 'grade_1': ['69.00%', 'year']}
'''
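A variation that doesn't depend on the 'col-md-2' class (which isn't visible in the HTML posted in the question) is to key each grade by its label. This minimal sketch assumes, as in the posted fragment, that each h6 label follows the h1 it describes inside the same div.

```python
from bs4 import BeautifulSoup

# The HTML fragment from the question; on the live page use driver.page_source.
html = """
<div > <div > <h1> 69.00<span >%</span> </h1> <h6>marking period</h6> </div>
<div > <h1>69.00<span >%</span></h1> <h6>year</h6> </div> </div>
"""

soup = BeautifulSoup(html, "html.parser")

grades = {}
for label in soup.find_all("h6"):
    # Assumption: the matching <h1> is an earlier sibling in the same parent div.
    value = label.find_previous_sibling("h1")
    if value is not None:
        grades[label.get_text(strip=True)] = value.get_text(strip=True)

print(grades)
```

Keying by the label text keeps each grade attached to its period even if the page later adds or reorders boxes, which the range(2) index loop would not handle.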