I am trying to scrape a website to print out events with their time and date. Here is my code:
from bs4 import BeautifulSoup

with open('events.html', 'r', encoding='utf-8') as html_file:
    content = html_file.read()

soup = BeautifulSoup(content, 'lxml')
free_slot = soup.find_all('tr', class_='views-field views-field-title')
for slot in free_slot:
    event_name = slot.a.text
    event_time = slot.time.text
    print(event_name)
    print(event_time)
events.html contains this:
Bystander Intervention: Live Workshop Glasnevin Campus Solas Room, The U Student Support & Development February 15, 13:00 - February 15, 13:50
The HTML is from this website: https://www.dcu.ie/students/events
When I try to run the code, it just returns '[]'.
Answer:
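What happens?
The ResultSet is empty because there is no <tr> element with the classes you pass to find_all(). find_all('tr', class_=...) only matches <tr> elements whose own class attribute contains that value; classes on elements inside the row do not count.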
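A minimal, self-contained way to see this: the snippet below uses hypothetical, simplified markup (an assumption, not the actual DCU page source) in which the classes sit on a <td> cell rather than on the <tr>. Filtering <tr> elements by that class finds nothing, while filtering the cells does. It uses the built-in html.parser, so only bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the class sits on the <td>, not the <tr>.
# This is an assumption for illustration, not the real page's HTML.
html = """
<table>
  <tr>
    <td class="views-field views-field-title"><a href="#">Critical writing</a></td>
    <td class="views-field views-field-date"><time>February 15, 13:00</time></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# No <tr> carries this class, so the ResultSet is empty.
rows = soup.find_all("tr", class_="views-field views-field-title")
print(rows)  # []

# Matching the element that actually carries the class finds the cell.
cells = soup.find_all("td", class_="views-field-title")
print([cell.get_text(strip=True) for cell in cells])  # ['Critical writing']
```

Note that class_ with a single class name (views-field-title) matches any element whose class list contains it, so you do not need to spell out the full class string.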
How to fix?
Remove the class filter from your find_all() and iterate over all rows:
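free_slot = soup.find_all('tr')
for slot in free_slot:
    print(slot)  # print the raw row to inspect its markup
    # Rows without a link or a <time> tag (e.g. a header row) would make
    # .a.text / .time.text raise AttributeError, so skip them.
    if slot.a is None or slot.time is None:
        continue
    event_name = slot.a.text
    event_time = slot.time.text
    print(event_name)
    print(event_time)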
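If you want the events as data rather than printed lines, the same loop can be wrapped in a function that returns (name, time) pairs. This is a sketch: extract_events and the sample markup are hypothetical, and it assumes, as the question's code does, that each event row holds one <a> for the title and one <time> for the date.

```python
from bs4 import BeautifulSoup


def extract_events(html):
    """Return (name, time) pairs from table rows that contain both an <a> and a <time>."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for row in soup.find_all("tr"):
        link = row.find("a")
        time_tag = row.find("time")
        # Skip header rows and any other rows missing either tag.
        if link is None or time_tag is None:
            continue
        events.append((link.get_text(strip=True), time_tag.get_text(strip=True)))
    return events


# Hypothetical sample markup standing in for events.html.
sample = """
<table>
  <tr><th>Title</th><th>Event date</th></tr>
  <tr><td><a href="#">Critical writing</a></td><td><time>February 15, 13:00</time></td></tr>
</table>
"""

for name, when in extract_events(sample):
    print(name)
    print(when)
```

With the real file you would pass the content read from events.html instead of sample.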
How to scrape the table?
You can do it with BeautifulSoup, but to get the contents of a table it is much simpler to use pandas' built-in read_html, which will do the job for you:
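import pandas as pd
# read_html returns a list of DataFrames, one per <table> found on the page;
# [0] takes the first one, which here is the events table.
pd.read_html('https://www.dcu.ie/students/events')[0]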
Output
Unnamed: 0 | Campus | Venue | Department | Event date |
---|---|---|---|---|
Bystander Intervention: Live Workshop | Glasnevin Campus | Solas Room, The U | Student Support & Development | February 15, 13:00 - February 15, 13:50 |
Emotional Intelligence: Ways to Ease Stress and Anxiety - Session 2 | Online | Online via Zoom | Student Support & Development | February 15, 13:00 - February 15, 14:00 |
Critical writing | Online | Online via Zoom | Student Learning | February 15, 13:00 - February 15, 14:00 |
Skills Session: Ace your Interview Skills | Online | Online | Careers Service | February 15, 13:00 - February 15, 13:50 |
Bystander Intervention: Live Workshop | St Patrick's Campus | B108, Auditorium | Student Support & Development | February 15, 17:00 - February 15, 17:50 |
Bystander Intervention: Live Workshop | Glasnevin Campus | Cuilin Room, The U | Student Support & Development | February 15, 18:00 - February 15, 18:50 |
How to Survive a Technical Interview with Microsoft | Online | Online | Careers Service | February 16, 10:00 - February 16, 11:00 |
Going Global Job Seach Training Session | Online | Virtual | Careers Service | February 16, 10:00 - February 16, 11:00 |
Informative session and a Q&A on the Vodafone Ireland Summer Internship Programme 2022. | Online | Online | Careers Service | February 16, 12:00 - February 16, 13:00 |
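The resulting DataFrame can then be tidied up. The sketch below rebuilds two rows of the output above by hand (so it runs without network access or an HTML parser), renames the first column, which apparently has no header in the HTML and is therefore called "Unnamed: 0" by pandas, and splits Event date on the " - " separator into Start and End columns. The new column names are hypothetical, and the split assumes every Event date value has the form "start - end".

```python
import pandas as pd

# Two rows copied from the read_html output above, rebuilt by hand so the
# example runs offline; the live page returns more rows.
df = pd.DataFrame(
    {
        "Unnamed: 0": ["Critical writing", "Bystander Intervention: Live Workshop"],
        "Campus": ["Online", "St Patrick's Campus"],
        "Venue": ["Online via Zoom", "B108, Auditorium"],
        "Department": ["Student Learning", "Student Support & Development"],
        "Event date": [
            "February 15, 13:00 - February 15, 14:00",
            "February 15, 17:00 - February 15, 17:50",
        ],
    }
)

# Give the header-less title column a readable name.
df = df.rename(columns={"Unnamed: 0": "Event"})

# Split "start - end" into two new columns (at most one split per value).
df[["Start", "End"]] = df["Event date"].str.split(" - ", n=1, expand=True)

for event, start in zip(df["Event"], df["Start"]):
    print(f"{event}: {start}")
```

On the live data you would apply the same rename and split to the DataFrame returned by pd.read_html(...)[0].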