Crawling issue: page source only contains "Loading..."

Time:07-05

I tried the following code, but I only get "Loading..." instead of the page content.

My code is


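    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from pyquery import PyQuery as pq  # assumed import; pq is the usual alias

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 20)  # created but never used
    driver.get("https://www.college.upenn.edu/majors-list")
    #print(driver.title)
    td5 = pq(driver.page_source)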

The output is like this

Penn List of College Majors\nLoading...  List of College Majors\nLoading... List of College Majors\nLoading... 

I need to get the College Major list, please help me.

I have already tried PyQuery and Selenium, but both failed.


Answer:

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 30)
    driver.get("https://www.college.upenn.edu/majors-list")
    # The majors list is rendered inside an iframe: wait for it, then switch into it.
    wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@title='Fission Embed']")))
    # Wait until the major headers are present, then collect their text.
    elems = [x.text for x in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".major_header")))]
    print(elems)

  1. The element you want is inside an iframe, so switch to it first.
  2. The 63 majors are in the elements with the class major_header. Wait for all of them to be present and read their text. Each header also contains an extra span, but it has no text.

Output:

['Africana Studies', 'Ancient History', 'Anthropology', 'Architecture', 'Asian American Studies (minor)', 'Biochemistry', 'Biology', 'Biophysics', 'Chemistry', 'Cinema and Media Studies', 'Classical Studies', 'Cognitive Science', 'Communication', 'Comparative Literature', 'Criminology', 'Design', 'Digital Humanities', 'Earth Science', 'East Asian Languages and Civilizations', 'Economics', 'Engineering Major', 'English', 'Environmental Studies', 'Fine Arts', 'French and Francophone Studies', "Gender, Sexuality and Women's Studies", 'German', 'Health and Societies', 'Hispanic Studies', 'History', 'History of Art', 'Huntsman Program in International Studies and Business', 'Individualized Major', 'International Relations', 'Italian Studies', 'Jewish Studies', 'Latin American and Latinx Studies', 'Linguistics', 'Logic, Information and Computation', 'Mathematical Economics', 'Mathematics', 'Modern Middle Eastern Studies', 'Music', 'Near Eastern Languages and Civilizations', 'Neuroscience', 'Nutrition Science', 'Philosophy', 'Philosophy, Politics and Economics', 'Physics and Astronomy', 'Political Science', 'Psychology', 'Religious Studies', 'Romance Languages Dual Major', 'Russian and East European Studies', 'Science, Technology and Society', 'Sociology', 'South Asia Studies', 'Theatre Arts', 'Urban Studies', 'Vagelos Integrated Program in Energy Research', 'Vagelos Program in Life Sciences and Management', 'Vagelos Scholars Program in Molecular Life Sciences', 'Visual Studies']

Imports:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
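
If you are not sure which iframe to switch to, one way to check is to list the iframe tags in the outer page source before writing the XPath. The sketch below uses only the standard library and does not need a browser. The `IframeFinder` class, the `list_iframes` helper, and the `SAMPLE_PAGE` string are hypothetical names made up for this illustration, and the `src` value is invented; only the title 'Fission Embed' comes from the answer above. In practice you would pass `driver.page_source` instead of the sample string.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the attributes of every <iframe> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase and attrs as (name, value) pairs.
        if tag == "iframe":
            self.iframes.append(dict(attrs))


def list_iframes(html):
    """Return one dict of attributes per iframe found in the given HTML."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.iframes


# Hypothetical stand-in for driver.page_source; the markup and src are assumptions.
SAMPLE_PAGE = """
<html><body>
  <h1>List of College Majors</h1>
  <div>Loading...</div>
  <iframe title="Fission Embed" src="https://example.com/embed"></iframe>
</body></html>
"""

for frame in list_iframes(SAMPLE_PAGE):
    print(frame.get("title"), frame.get("src"))
```

Any title or src printed here can be used in the locator passed to `EC.frame_to_be_available_and_switch_to_it`. The content inside the iframe is a separate document, which is why it never shows up in the outer page source.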