web-scraping college timetable keeps giving me 'charmap' error

Time:02-16

I'm trying to web-scrape a timetable for a project and find the free times on it. When I run this code I get the following error: "'charmap' codec can't decode byte 0x8d in position 32078: character maps to <undefined>". Here is my code:

import requests

url = "https://opentimetable.dcu.ie/"
response = requests.get(url)

with open('webpage.html', 'r') as html_file:
    content = html_file.read()

Any help appreciated

CodePudding user response:

You are not getting the timetable data because it is rendered dynamically. The page loads it through separate requests that depend on the parameters you select, so it will not come back from a simple, static request to the home page.
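As for the 'charmap' error itself: it is raised by open(), not by requests. When no encoding is given, open() falls back to the platform default, which on many Windows setups is cp1252, and byte 0x8d has no character assigned in cp1252. A minimal sketch of the usual fix, assuming the saved page is actually UTF-8 (the sample string and temporary file below are made up for illustration):

```python
import os
import tempfile

# Hypothetical sample page: U+010D is encoded in UTF-8 as the bytes C4 8D,
# so the file contains the same 0x8d byte named in the error message.
sample_html = "<p>\u010d</p>"

path = os.path.join(tempfile.mkdtemp(), "webpage.html")
with open(path, "w", encoding="utf-8") as html_file:
    html_file.write(sample_html)

# Reading the file as cp1252 reproduces the 'charmap' error.
try:
    with open(path, "r", encoding="cp1252") as html_file:
        html_file.read()
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print(err)

# Naming the encoding explicitly avoids depending on the platform default.
with open(path, "r", encoding="utf-8") as html_file:
    content = html_file.read()
```

If the file turns out not to be UTF-8, the same idea applies with whatever encoding it was saved in. Fixing this only lets you read the HTML; it does not make the timetable data appear in it.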

The site does expose an API that lets you search by different categories. The category with the fewest unique values was "Location", so I went with that.

The code below first collects all the location IDs, then feeds them into the events filter to find what is booked, and when, at each location.

The resulting table has the start and end times (including the day) of every booking. I will leave it to you to parse that information to find when a day or time is free. What I would do is have Python build a list of all the start and end times for each location, sort it, and then look for the gaps where no bookings overlap.

import requests
import pandas as pd

s = requests.Session()

url = "https://opentimetable.dcu.ie/broker/api/categoryTypeOptions"
s.get(url)
cookies = s.cookies.get_dict()

# Build a Cookie header string from the session cookies
cookieStr = ''
for k, v in cookies.items():
    cookieStr += f'{k}={v};'

headers = {
'Accept': 'application/json, text/plain, */*',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,en-GB;q=0.8',
'Authorization': 'basic T64Mdy7m[',
'Connection': 'keep-alive',
'Content-Type': 'application/json',
'Cookie': cookieStr,
'Host': 'opentimetable.dcu.ie',
'Referer': 'https://opentimetable.dcu.ie/',
'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'Sec-Fetch-Dest': 'empty',
'Sec-Fetch-Mode': 'cors',
'Sec-Fetch-Site': 'same-origin',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

# Request the first page of categories; the response also reports the total page count
url = 'https://opentimetable.dcu.ie/broker/api/CategoryTypes/1e042cb1-547d-41d4-ae93-a1f2c3d34538/Categories/Filter?pageNumber=1'
payload = '[{"Identity": "6359fd0c-1bbe-496a-8998-4fefc5cd18de","Values": ["null"]}]'
jsonData = s.post(url, headers=headers, data=payload).json()

totalPages = jsonData['TotalPages']

print('Page: 1 of %s' %totalPages)
locationList = jsonData['Results']

# Fetch the remaining pages and accumulate their results
for page in range(2, totalPages + 1):
    print('Page: %s of %s' %(page,totalPages))
    url = 'https://opentimetable.dcu.ie/broker/api/CategoryTypes/1e042cb1-547d-41d4-ae93-a1f2c3d34538/Categories/Filter?pageNumber=%s' %(page)
    jsonData = s.post(url, headers=headers, data=payload).json()
    locationList += jsonData['Results']

# Keep only the ID of each location
locationList = [x['Identity'] for x in locationList]


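def update_payload(listOfLocations):
    # The payload below uses the JSON literals true/false; these aliases let it
    # be written that way in Python. Note that they are sent as the strings
    # "true"/"false", not as JSON booleans.
    true = "true"
    false = "false"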
    
    payload = {
      "ViewOptions": {
        "Days": [
          {
            "Name": "Monday",
            "DayOfWeek": 1,
            "IsDefault": true
          },
          {
            "Name": "Tuesday",
            "DayOfWeek": 2,
            "IsDefault": true
          },
          {
            "Name": "Wednesday",
            "DayOfWeek": 3,
            "IsDefault": true
          },
          {
            "Name": "Thursday",
            "DayOfWeek": 4,
            "IsDefault": true
          },
          {
            "Name": "Friday",
            "DayOfWeek": 5,
            "IsDefault": true
          }
        ],
        "Weeks": [
          {
            "WeekNumber": 21,
            "WeekLabel": "21",
            "FirstDayInWeek": "2022-02-07T00:00:00.000Z"
          }
        ],
        "TimePeriods": [
          {
            "Description": "All Day",
            "StartTime": "08:00",
            "EndTime": "22:00",
            "IsDefault": true
          }
        ],
        "DatePeriods": [
          {
            "Description": "This Week",
            "StartDateTime": "2021-09-20T00:00:00.000Z",
            "EndDateTime": "2022-09-20T00:00:00.000Z",
            "IsDefault": true,
            "IsThisWeek": true,
            "IsNextWeek": false,
            "Type": "ThisWeek"
          }
        ],
        "LegendItems": [],
        "InstitutionConfig": {},
        "DateConfig": {
          "FirstDayInWeek": 1,
          "StartDate": "2021-09-20T00:00:00+00:00",
          "EndDate": "2022-09-20T00:00:00+00:00"
        },
        "AllDays": [
          {
            "Name": "Monday",
            "DayOfWeek": 1,
            "IsDefault": true
          },
          {
            "Name": "Tuesday",
            "DayOfWeek": 2,
            "IsDefault": true
          },
          {
            "Name": "Wednesday",
            "DayOfWeek": 3,
            "IsDefault": true
          },
          {
            "Name": "Thursday",
            "DayOfWeek": 4,
            "IsDefault": true
          },
          {
            "Name": "Friday",
            "DayOfWeek": 5,
            "IsDefault": true
          },
          {
            "Name": "Saturday",
            "DayOfWeek": 6,
            "IsDefault": false
          },
          {
            "Name": "Sunday",
            "DayOfWeek": 0,
            "IsDefault": false
          }
        ]
      },
      "CategoryIdentities": listOfLocations
      
    }

    return payload
    

    
# Split the location IDs into chunks of 20 so each request stays small
x = 20
final_list = lambda test_list, x: [test_list[i:i + x] for i in range(0, len(test_list), x)]
locationChunks = final_list(locationList, x)


locationBooked = [] 
for count, listOfLocations in enumerate(locationChunks, start=1):
    print('%s of %s' %(count, len(locationChunks)))
    payload = update_payload(listOfLocations)

    url = 'https://opentimetable.dcu.ie/broker/api/categoryTypes/1e042cb1-547d-41d4-ae93-a1f2c3d34538/categories/events/filter'

    response = s.post(url, headers=headers, json=payload).json()
    
    # Collect the events of every location that has at least one booking
    for each in response:
        if len(each['CategoryEvents']) > 0:
            locationBooked += each['CategoryEvents']

df = pd.DataFrame(locationBooked)

Output: First 5 rows of 2936

print(df.head().to_string())
                          EventIdentity                HostKey                                                Description                EndDateTime  EventType  IsPublished   Location                                 Owner              StartDateTime  IsDeleted                       LastModified                                                                                                                                                                                                                                                                                                                                                  ExtraProperties  UserManuallyAddedEvent                        StatusIdentity Status StatusBackgroundColor                                                 Name                              Identity
0  026bce78-1cce-9354-43b2-720b96ba9e03       2122#SPLUSCD8040                                                       None  2022-02-10T19:00:00+00:00    Booking         True  AHC.ODG01  b8cf1f5a-9687-4440-86b8-13da2c69fa62  2022-02-10T17:00:00+00:00      False  2022-01-17T17:27:51.9494682+00:00                                                                                                                                                                                                                                                   [{'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '18-26', 'Rank': 3}]                   False  b48c85d4-19aa-4b19-87a6-63a5c6d2e630   None                  None                  Booking - AFU 21-22 (Grainne Reddy)  43c03d98-1c80-4ab5-a47a-19db340ab179
1  5dfe3e92-439f-d7b2-1aaf-384328680d90       2122#SPLUS42C975                                 Contemporary Irish Society  2022-02-10T13:00:00+00:00    Booking         True  AHC.ODG01  b8cf1f5a-9687-4440-86b8-13da2c69fa62  2022-02-10T10:30:00+00:00      False  2022-01-12T14:31:15.0526265+00:00                                                                                                                                                                                                                                                      [{'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '21', 'Rank': 3}]                   False  b48c85d4-19aa-4b19-87a6-63a5c6d2e630   None                  None  Booking - Boston University 21-22 (Sean Harrington)  d259d6bd-6849-42a4-b28c-d6849b2623c1
2  bd60b7f0-633e-b90a-79e6-29fceb4c2ea5  2122ED1009[2]OC/L5/01                                                    RE Cert  2022-02-10T16:00:00+00:00  On Campus         True  AHC.ODG01  b8cf1f5a-9687-4440-86b8-13da2c69fa62  2022-02-10T15:00:00+00:00      False  2021-11-03T10:03:42.4098645+00:00                 [{'Name': 'Module Name', 'DisplayName': 'Module Name', 'Value': 'ED1009[0] Religions, Ethics & Moral Values (CIC)', 'Rank': 1}, {'Name': 'Staff Member', 'DisplayName': 'Staff Member', 'Value': 'Wilkinson J', 'Rank': 2}, {'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '17-22, 24-26', 'Rank': 3}]                   False  b48c85d4-19aa-4b19-87a6-63a5c6d2e630   None                  None                                    ED1009[2]OC/L5/01  9589b2ef-17d7-4372-8870-5749c3ae6c37
3  bc161577-4c45-0ce9-a9c0-25fc942b12c0       2122#SPLUS57140A  Teacher as a Reflective Practitioner (School Placement)**  2022-02-09T10:00:00+00:00  On Campus         True  AHC.ODG01  b8cf1f5a-9687-4440-86b8-13da2c69fa62  2022-02-09T09:00:00+00:00      False  2021-10-28T10:35:59.4486955+00:00            [{'Name': 'Module Name', 'DisplayName': 'Module Name', 'Value': 'ED1024[0] Teacher as a Reflective Practitioner (SP)', 'Rank': 1}, {'Name': 'Staff Member', 'DisplayName': 'Staff Member', 'Value': 'Lodge A', 'Rank': 2}, {'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '2-11, 17-22, 24-26', 'Rank': 3}]                   False  b48c85d4-19aa-4b19-87a6-63a5c6d2e630   None                  None                                    ED1024[0]OC/T1/07  d213893f-3e0a-4a01-a0d9-7dac1d873627
4  a804c1a7-ef87-d8f0-16be-502d396699d6     2122ED2009[2]L1/01      Religions, Ethics, Morals and Values Education (REMV)  2022-02-11T15:00:00+00:00  On Campus         True  AHC.ODG01  b8cf1f5a-9687-4440-86b8-13da2c69fa62  2022-02-11T14:00:00+00:00      False  2021-09-21T16:50:21.8417239+00:00  [{'Name': 'Module Name', 'DisplayName': 'Module Name', 'Value': 'ED2009[0] Religious, Ethics, Morals & Values Education', 'Rank': 1}, {'Name': 'Staff Member', 'DisplayName': 'Staff Member', 'Value': 'Lodge A, Wilkinson J', 'Rank': 2}, {'Name': 'Activity.TeachingWeekPattern_PatternAsArray', 'DisplayName': 'Weeks', 'Value': '17-22, 24-25', 'Rank': 3}]                   False  b48c85d4-19aa-4b19-87a6-63a5c6d2e630   None                  None                                    ED2009[2]OC/L1/01  6facf5d3-6cb3-45f3-a9c1-3311a74691e7
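
The gap-finding step described earlier (sort the bookings, then look for the stretches nothing covers) could be sketched roughly as follows. This is only an illustration, not part of the code above: free_slots is a hypothetical helper, the three rows are the 2022-02-10 bookings for AHC.ODG01 from the output, the timestamps are assumed to be ISO 8601 strings with a +00:00 offset, and the 08:00 to 22:00 window is taken from the "All Day" time period in the payload.

```python
from collections import defaultdict
from datetime import datetime


def free_slots(booked, open_from, open_until):
    """Return the (start, end) gaps not covered by any booking in the window."""
    gaps = []
    cursor = open_from  # latest point known to be covered so far
    for start, end in sorted(booked):
        if start > cursor:
            gaps.append((cursor, min(start, open_until)))
        cursor = max(cursor, end)  # overlapping bookings just extend the cover
        if cursor >= open_until:
            break
    if cursor < open_until:
        gaps.append((cursor, open_until))
    return gaps


# Sample rows reduced to the three columns the sketch needs.
rows = [
    {"Location": "AHC.ODG01", "StartDateTime": "2022-02-10T17:00:00+00:00",
     "EndDateTime": "2022-02-10T19:00:00+00:00"},
    {"Location": "AHC.ODG01", "StartDateTime": "2022-02-10T10:30:00+00:00",
     "EndDateTime": "2022-02-10T13:00:00+00:00"},
    {"Location": "AHC.ODG01", "StartDateTime": "2022-02-10T15:00:00+00:00",
     "EndDateTime": "2022-02-10T16:00:00+00:00"},
]

# Group the booked intervals by location.
by_location = defaultdict(list)
for row in rows:
    by_location[row["Location"]].append(
        (datetime.fromisoformat(row["StartDateTime"]),
         datetime.fromisoformat(row["EndDateTime"]))
    )

open_from = datetime.fromisoformat("2022-02-10T08:00:00+00:00")
open_until = datetime.fromisoformat("2022-02-10T22:00:00+00:00")

for location, booked in by_location.items():
    for start, end in free_slots(booked, open_from, open_until):
        print(location, start.strftime("%H:%M"), "-", end.strftime("%H:%M"))
```

To run it on the real data you would build rows from the DataFrame instead, for example with df.to_dict('records'), and repeat the window for each day you care about.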