List index out of range when writing to a document with selenium

Time:04-03

I am trying to write university names, department names and ratings to a file from https://www.whatuni.com/university-course-reviews/?pageno=14. It goes well until I reach a post without a department name, at which point I get this error:

file.write(user_name[k].text + ";" + uni_names[k].text + ";" + department[k].text + ";" + date_posted[k].text +
IndexError: list index out of range

Here is the code I use. I believe I need to somehow write null or a space when the department doesn't exist. I tried an if not / else check, but it didn't work for me. I would appreciate any help. Thank you.

for i in range(20):
    try:
        driver.refresh()
        uni_names = driver.find_elements_by_xpath('//div[@]/h2/a')
        department_names = driver.find_elements_by_xpath('//div[@]/h3/a')
        user_name = driver.find_elements_by_xpath('//div[@]')
        date_posted = driver.find_elements_by_xpath('//div[@]')
        uni_rev = driver.find_elements_by_xpath('(//div[@]/div[@]/p)')
        uni_rating = driver.find_elements_by_xpath('(//div[@]/div[@]/span[starts-with(@class,"ml5")])')
        job_prospects = driver.find_elements_by_xpath('//span[text()="Job Prospects"]/following-sibling::span')
        course_and_lecturers = driver.find_elements_by_xpath('//span[text()="Course and Lecturers"]/following-sibling::span')
        if not course_and_lecturers:
            lecturers = "None"
        else:
            lecturers = course_and_lecturers

        uni_facilities = driver.find_elements_by_xpath('//span[text()= "Facilities" or "Uni Facilities"]/following-sibling::span')
        if not uni_facilities:
            facilities = "None"
        else:
            facilities = uni_facilities

        student_support = driver.find_elements_by_xpath('//span[text()="Student Support"]/following-sibling::span')
        if not student_support:
            support = "None"
        else:
            support = student_support

        with open('uni_scraping.csv', 'a') as file:
            for k in range(len(uni_names)):
                if not department_names:
                    department = "None"
                else:
                    department = department_names
                    file.write(user_name[k].text + ";" + uni_names[k].text + ";" + department[k].text + ";" + date_posted[k].text +
                               ";" + uni_rating[k].get_attribute("class") + ";" + job_prospects[k].get_attribute("class") +
                               ";" + lecturers[k].get_attribute("class") + ";" + facilities[k].get_attribute("class") +
                               ";" + support[k].get_attribute("class") + ";" + uni_rev[k].text + "\n")
            next_page = driver.find_element_by_class_name('mr0')
            next_page.click()
            file.close()
    except exceptions.StaleElementReferenceException as e:
        print('e')
        pass
driver.close()

CodePudding user response:

You had the right instinct when you tried if not department_names, but that check only catches an empty list. In your case the list is not empty, it is too short: because some universities have no department, department_names ends up shorter than uni_names.

As a result, in your loop for k in range(len(uni_names)):, department[k].text will not always be the department of the university with the same index, and at some point k will reach or exceed the length of the department list. That is why department[k] raises the IndexError.
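
To make the mismatch concrete, here is a minimal sketch with plain Python lists standing in for the scraped element lists. The university and department values are made up for illustration; only the variable names follow the question.

```python
# Toy data standing in for the scraped element lists (hypothetical values).
uni_names = ["Uni A", "Uni B", "Uni C"]
# "Uni B" has no department, so a page-wide query returns only two items.
department_names = ["Physics", "History"]


def pair_by_index(unis, departments):
    """Pair items by position, the way the loop in the question does."""
    rows = []
    for k in range(len(unis)):
        # Fails as soon as k reaches len(departments).
        rows.append((unis[k], departments[k]))
    return rows


try:
    pair_by_index(uni_names, department_names)
    error = None
except IndexError as exc:
    error = exc

print(repr(error))

# Even before the crash the pairing is wrong: position 1 matches
# "Uni B" with "History", which really belongs to "Uni C".
misaligned = list(zip(uni_names, department_names))
print(misaligned)
```

Note that the non-empty check passes here (department_names has two items), which is why the if not test in the question never takes the "None" branch.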

I don't know the most efficient way around this, but I think you could grab a larger element that holds the full details of each review (the whole rlst_wrap, for example) and then search inside it for that review's details (with a regexp, for example). That way you would know when there is no department, and avoid the issue.
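
A rough sketch of that idea follows. To keep it runnable without a browser it swaps Selenium for the standard library's xml.etree.ElementTree and parses a small inline snippet. The markup is an assumption: the class names rlst_wrap and rev_name appear in this thread, but the structure around them is invented, and the real page may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one wrapper div per review.
PAGE = """
<div>
  <div class="rlst_wrap"><h2>Uni A</h2><h3>Physics</h3><div class="rev_name">Ann</div></div>
  <div class="rlst_wrap"><h2>Uni B</h2><div class="rev_name">Bob</div></div>
  <div class="rlst_wrap"><h2>Uni C</h2><h3>History</h3><div class="rev_name">Cy</div></div>
</div>
"""


def extract_posts(markup):
    """Read each review from its own wrapper so fields cannot drift apart."""
    root = ET.fromstring(markup.strip())
    posts = []
    for wrap in root.findall(".//div[@class='rlst_wrap']"):
        department = wrap.find("h3")  # None when this review has no department
        posts.append({
            "uni_name": wrap.find("h2").text,
            "department": department.text if department is not None else "",
            "user_name": wrap.find("div[@class='rev_name']").text,
        })
    return posts


posts = extract_posts(PAGE)
for post in posts:
    print(post)
```

Because every lookup is scoped to one wrapper, a missing h3 only affects that review, and the result always has one entry per review.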

CodePudding user response:

Thank you Vimizen for the answer. I did what you suggested and it worked for me. I wrote something like this.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get("https://www.whatuni.com/university-course-reviews/?pageno=14")

posts = []

driver.refresh()
# Each matched div wraps one review, so the lookups below are scoped to a single post.
post_elements = driver.find_elements_by_xpath('//div[@]')
for post_element in post_elements:
    uni_name = post_element.find_element_by_tag_name('h2')
    try:
        # A post without a department has no h3, which raises NoSuchElementException.
        department = post_element.find_element_by_tag_name('h3').text
    except NoSuchElementException:
        department = "aaaaaaaa"  # placeholder written when the department is missing
    user_name = post_element.find_element_by_class_name('rev_name')
    postdict = {
        "uni_name": uni_name.text,
        "department": department,
        "user_name": user_name.text
    }
    posts.append(postdict)


print(posts)
driver.close()
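
Since the original goal was a semicolon-separated file, here is one possible way to write such a list of dicts with the standard csv module. It is only a sketch: the sample rows are made up, write_posts is a hypothetical helper name, and an in-memory buffer stands in for the real file.

```python
import csv
import io

# Sample rows in the same shape as the posts list (hypothetical values).
sample_posts = [
    {"uni_name": "Uni A", "department": "Physics", "user_name": "Ann"},
    {"uni_name": "Uni B", "department": "", "user_name": "Bob"},
]


def write_posts(posts, handle):
    """Write one semicolon-separated row per post, preceded by a header row."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["user_name", "uni_name", "department"],
        delimiter=";",
    )
    writer.writeheader()
    writer.writerows(posts)


buffer = io.StringIO()  # stands in for a real file handle
write_posts(sample_posts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, something like open('uni_scraping.csv', 'a', newline='') could replace the buffer. The csv module also quotes fields that contain a semicolon or a line break, which plain string concatenation does not.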

Best
