I'm trying to web-scrape a timetable for a project and find the free times on it. When I run this code I get the following error: "'charmap' codec can't decode byte 0x8d in position 32078: character maps to <undefined>". Here is my code:
url = "https://opentimetable.dcu.ie/"
response = requests.get(url)
with open('webpage.html', 'r') as html_file:
    content = html_file.read()
Any help appreciated
CodePudding user response:
You are not getting anything useful because this data is rendered dynamically. The page sends separate queries with the parameters you select, so the timetable will not come back from a simple, static request.
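As for the error message itself, it most likely comes from open() rather than from the scraping. This is a guess based on the message, assuming you are on Windows and the saved page is UTF-8: without an encoding argument, open() falls back to the system default (often cp1252 on Windows), and byte 0x8d has no character in that code page. A minimal sketch that reproduces and fixes this, using a made-up page written to a temporary file:

```python
import os
import tempfile

# Hypothetical stand-in for the saved page: "č" is stored as bytes C4 8D in UTF-8
html = "<p>Rozvrh: č</p>"

path = os.path.join(tempfile.mkdtemp(), "webpage.html")
with open(path, "w", encoding="utf-8") as html_file:
    html_file.write(html)

# Decoding as cp1252 (a common Windows default) fails on byte 0x8d
try:
    with open(path, "r", encoding="cp1252") as html_file:
        html_file.read()
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print(err)

# Naming the encoding explicitly reads the file back intact
with open(path, "r", encoding="utf-8") as html_file:
    content = html_file.read()
```

That only gets the file read, though. The timetable data still is not in that HTML.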
However, there is an API that lets you search by different categories. The category with the fewest unique values was "Location", so I went with that.
The code below grabs all the location IDs, then feeds them into the filter to find what is booked, and when, at each location.
The resulting table has the start and end times (and so the day) of each booking. I will leave it to you to parse that info to find when a day or time is open. What I would do is have Python build a list of all the booking start and end times for each day, sort it, and then look for the gaps where nothing overlaps.
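The gap search described above can be sketched roughly like this. It is only an illustration: free_slots is a hypothetical helper, and the 08:00 to 22:00 window is an assumption borrowed from the "All Day" time period in the request payload. The sample bookings are three of the AHC.ODG01 rows for 2022-02-10 from the output.

```python
from datetime import datetime

def free_slots(bookings, day_start, day_end):
    """Return the (start, end) gaps between bookings inside [day_start, day_end]."""
    gaps = []
    cursor = day_start  # earliest moment not yet known to be booked
    for start, end in sorted(bookings):
        if start > cursor:
            # Nothing is booked between the cursor and this booking
            gaps.append((cursor, min(start, day_end)))
        # Overlapping bookings only ever push the cursor forward
        cursor = max(cursor, end)
        if cursor >= day_end:
            break
    if cursor < day_end:
        gaps.append((cursor, day_end))
    return gaps

day = (2022, 2, 10)
bookings = [
    (datetime(*day, 17, 0), datetime(*day, 19, 0)),
    (datetime(*day, 10, 30), datetime(*day, 13, 0)),
    (datetime(*day, 15, 0), datetime(*day, 16, 0)),
]
gaps = free_slots(bookings, datetime(*day, 8, 0), datetime(*day, 22, 0))
for start, end in gaps:
    print(start.strftime('%H:%M'), '-', end.strftime('%H:%M'))
```

You would run this once per location and day, so the bookings need grouping before they are passed in.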
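import requests
import pandas as pd

# Use a session so the cookies set by the first request are kept
s = requests.Session()
url = "https://opentimetable.dcu.ie/broker/api/categoryTypeOptions"
s.get(url)

# Build a Cookie header string from the session cookies
cookies = s.cookies.get_dict()
cookieStr = ''
for k, v in cookies.items():
    cookieStr += f'{k}={v};'
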
headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9,en-GB;q=0.8',
    'Authorization': 'basic T64Mdy7m[',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json',
    'Cookie': cookieStr,
    'Host': 'opentimetable.dcu.ie',
    'Referer': 'https://opentimetable.dcu.ie/',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-origin',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

# First page of locations; the response reports how many pages there are
url = 'https://opentimetable.dcu.ie/broker/api/CategoryTypes/1e042cb1-547d-41d4-ae93-a1f2c3d34538/Categories/Filter?pageNumber=1'
payload = '[{"Identity": "6359fd0c-1bbe-496a-8998-4fefc5cd18de","Values": ["null"]}]'
jsonData = s.post(url, headers=headers, data=payload).json()
totalPages = jsonData['TotalPages']
print('Page: 1 of %s' %totalPages)
locationList = jsonData['Results']

# The remaining pages are appended to the same list
for page in range(2, totalPages + 1):
    print('Page: %s of %s' %(page,totalPages))
    url = 'https://opentimetable.dcu.ie/broker/api/CategoryTypes/1e042cb1-547d-41d4-ae93-a1f2c3d34538/Categories/Filter?pageNumber=%s' %(page)
    jsonData = s.post(url, headers=headers, data=payload).json()
    locationList += jsonData['Results']

# Keep only the location IDs
locationList = [x['Identity'] for x in locationList]

def update_payload(listOfLocations):
    # Stand-ins for the JSON literals; they are sent as the strings "true"/"false"
    true = "true"
    false = "false"
    payload = {
        "ViewOptions": {
            "Days": [
                {"Name": "Monday", "DayOfWeek": 1, "IsDefault": true},
                {"Name": "Tuesday", "DayOfWeek": 2, "IsDefault": true},
                {"Name": "Wednesday", "DayOfWeek": 3, "IsDefault": true},
                {"Name": "Thursday", "DayOfWeek": 4, "IsDefault": true},
                {"Name": "Friday", "DayOfWeek": 5, "IsDefault": true}
            ],
            "Weeks": [
                {"WeekNumber": 21, "WeekLabel": "21", "FirstDayInWeek": "2022-02-07T00:00:00.000Z"}
            ],
            "TimePeriods": [
                {"Description": "All Day", "StartTime": "08:00", "EndTime": "22:00", "IsDefault": true}
            ],
            "DatePeriods": [
                {
                    "Description": "This Week",
                    "StartDateTime": "2021-09-20T00:00:00.000Z",
                    "EndDateTime": "2022-09-20T00:00:00.000Z",
                    "IsDefault": true,
                    "IsThisWeek": true,
                    "IsNextWeek": false,
                    "Type": "ThisWeek"
                }
            ],
            "LegendItems": [],
            "InstitutionConfig": {},
            "DateConfig": {
                "FirstDayInWeek": 1,
                "StartDate": "2021-09-20T00:00:00+00:00",
                "EndDate": "2022-09-20T00:00:00+00:00"
            },
            "AllDays": [
                {"Name": "Monday", "DayOfWeek": 1, "IsDefault": true},
                {"Name": "Tuesday", "DayOfWeek": 2, "IsDefault": true},
                {"Name": "Wednesday", "DayOfWeek": 3, "IsDefault": true},
                {"Name": "Thursday", "DayOfWeek": 4, "IsDefault": true},
                {"Name": "Friday", "DayOfWeek": 5, "IsDefault": true},
                {"Name": "Saturday", "DayOfWeek": 6, "IsDefault": false},
                {"Name": "Sunday", "DayOfWeek": 0, "IsDefault": false}
            ]
        },
        # The locations whose events are being requested
        "CategoryIdentities": listOfLocations
    }
    return payload

# Query the events endpoint in chunks of 20 locations at a time
x = 20
final_list = lambda test_list, x: [test_list[i:i + x] for i in range(0, len(test_list), x)]
locationChunks = final_list(locationList, x)

locationBooked = []
for count, listOfLocations in enumerate(locationChunks, start=1):
    print('%s of %s' %(count, len(locationChunks)))
    payload = update_payload(listOfLocations)
    url = 'https://opentimetable.dcu.ie/broker/api/categoryTypes/1e042cb1-547d-41d4-ae93-a1f2c3d34538/categories/events/filter'
    response = s.post(url, headers=headers, json=payload).json()
    for each in response:
        # Collect the events of every location that has at least one booking
        if len(each['CategoryEvents']) > 0:
            locationBooked += each['CategoryEvents']

df = pd.DataFrame(locationBooked)

Output: First 5 rows of 2936
print(df.head().to_string())
EventIdentity HostKey Description EndDateTime EventType IsPublished Location Owner StartDateTime IsDeleted LastModified ExtraProperties UserManuallyAddedEvent StatusIdentity Status StatusBackgroundColor Name Identity
0 026bce78-1cce-9354-43b2-720b96ba9e03 2122#SPLUSCD8040 None 2022-02-10T19:00:00+00:00 Booking True AHC.ODG01 b8cf1f5a-9687-4440-86b8-13da2c69fa62 2022-02-10T17:00:00+00:00 False 2022-01-17T17:27:51.9494682+00:00 [{'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '18-26', 'Rank': 3}] False b48c85d4-19aa-4b19-87a6-63a5c6d2e630 None None Booking - AFU 21-22 (Grainne Reddy) 43c03d98-1c80-4ab5-a47a-19db340ab179
1 5dfe3e92-439f-d7b2-1aaf-384328680d90 2122#SPLUS42C975 Contemporary Irish Society 2022-02-10T13:00:00+00:00 Booking True AHC.ODG01 b8cf1f5a-9687-4440-86b8-13da2c69fa62 2022-02-10T10:30:00+00:00 False 2022-01-12T14:31:15.0526265+00:00 [{'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '21', 'Rank': 3}] False b48c85d4-19aa-4b19-87a6-63a5c6d2e630 None None Booking - Boston University 21-22 (Sean Harrington) d259d6bd-6849-42a4-b28c-d6849b2623c1
2 bd60b7f0-633e-b90a-79e6-29fceb4c2ea5 2122ED1009[2]OC/L5/01 RE Cert 2022-02-10T16:00:00+00:00 On Campus True AHC.ODG01 b8cf1f5a-9687-4440-86b8-13da2c69fa62 2022-02-10T15:00:00+00:00 False 2021-11-03T10:03:42.4098645+00:00 [{'Name': 'Module Name', 'DisplayName': 'Module Name', 'Value': 'ED1009[0] Religions, Ethics & Moral Values (CIC)', 'Rank': 1}, {'Name': 'Staff Member', 'DisplayName': 'Staff Member', 'Value': 'Wilkinson J', 'Rank': 2}, {'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '17-22, 24-26', 'Rank': 3}] False b48c85d4-19aa-4b19-87a6-63a5c6d2e630 None None ED1009[2]OC/L5/01 9589b2ef-17d7-4372-8870-5749c3ae6c37
3 bc161577-4c45-0ce9-a9c0-25fc942b12c0 2122#SPLUS57140A Teacher as a Reflective Practitioner (School Placement)** 2022-02-09T10:00:00+00:00 On Campus True AHC.ODG01 b8cf1f5a-9687-4440-86b8-13da2c69fa62 2022-02-09T09:00:00+00:00 False 2021-10-28T10:35:59.4486955+00:00 [{'Name': 'Module Name', 'DisplayName': 'Module Name', 'Value': 'ED1024[0] Teacher as a Reflective Practitioner (SP)', 'Rank': 1}, {'Name': 'Staff Member', 'DisplayName': 'Staff Member', 'Value': 'Lodge A', 'Rank': 2}, {'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '2-11, 17-22, 24-26', 'Rank': 3}] False b48c85d4-19aa-4b19-87a6-63a5c6d2e630 None None ED1024[0]OC/T1/07 d213893f-3e0a-4a01-a0d9-7dac1d873627
4 a804c1a7-ef87-d8f0-16be-502d396699d6 2122ED2009[2]L1/01 Religions, Ethics, Morals and Values Education (REMV) 2022-02-11T15:00:00+00:00 On Campus True AHC.ODG01 b8cf1f5a-9687-4440-86b8-13da2c69fa62 2022-02-11T14:00:00+00:00 False 2021-09-21T16:50:21.8417239+00:00 [{'Name': 'Module Name', 'DisplayName': 'Module Name', 'Value': 'ED2009[0] Religious, Ethics, Morals & Values Education', 'Rank': 1}, {'Name': 'Staff Member', 'DisplayName': 'Staff Member', 'Value': 'Lodge A, Wilkinson J', 'Rank': 2}, {'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '17-22, 24-25', 'Rank': 3}] False b48c85d4-19aa-4b19-87a6-63a5c6d2e630 None None ED2009[2]OC/L1/01 6facf5d3-6cb3-45f3-a9c1-3311a74691e7
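
To get from rows like these to per-room, per-day bookings, one option is to group the records by location and date first. This is a rough sketch, assuming the timestamps are ISO 8601 strings with a +00:00 offset. The function group_by_location_and_day and the trimmed-down events list are made up for illustration and keep only the three fields that matter here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records shaped like the rows above, reduced to three fields
events = [
    {"Location": "AHC.ODG01", "StartDateTime": "2022-02-10T17:00:00+00:00",
     "EndDateTime": "2022-02-10T19:00:00+00:00"},
    {"Location": "AHC.ODG01", "StartDateTime": "2022-02-10T10:30:00+00:00",
     "EndDateTime": "2022-02-10T13:00:00+00:00"},
    {"Location": "AHC.ODG01", "StartDateTime": "2022-02-09T09:00:00+00:00",
     "EndDateTime": "2022-02-09T10:00:00+00:00"},
]

def group_by_location_and_day(records):
    """Map (location, date) to a sorted list of (start, end) datetimes."""
    grouped = defaultdict(list)
    for record in records:
        start = datetime.fromisoformat(record["StartDateTime"])
        end = datetime.fromisoformat(record["EndDateTime"])
        grouped[(record["Location"], start.date())].append((start, end))
    for intervals in grouped.values():
        intervals.sort()  # earliest booking first
    return dict(grouped)

schedule = group_by_location_and_day(events)
for (location, day), intervals in sorted(schedule.items()):
    print(location, day, len(intervals), 'booking(s)')
```

With the real data, passing locationBooked in place of events should work, since each record carries these three keys. Each sorted list can then be scanned for gaps.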