Use Selenium to click on items in a UL one by one and scrape some information

Time:11-26

I'm just practicing scraping with Selenium.

[Image: inspect element]

What I would like to do is go through each item in the unordered list:

href

First, get every list item:

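from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# driver and wait (a WebDriverWait) already exist; wait for the <ul>, then collect the <li> elements inside it
wait.until(EC.presence_of_element_located((By.XPATH, "//*[@id='main_content']/ul")))
ul_element = driver.find_element(By.XPATH, "//*[@id='main_content']/ul")
all_li_element = ul_element.find_elements(By.CSS_SELECTOR, "li")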

Then, after I have the list items, I want to go to each one and scrape some data.

Is there a better way? The way I'm thinking about it, it will turn into a nested list.

CodePudding user response:

This can probably be done much faster without opening all those links, but not with Selenium. Selenium imitates human GUI actions, so, just as a human would, to scrape all that data you need to open each link and read the data on the page that opens. It can be done more cleanly and quickly via API calls, or with BeautifulSoup and similar tools. But again, these are not Selenium UI approaches.
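To illustrate the non-Selenium route, here is a minimal sketch that pulls the links out of list items by parsing HTML directly. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs; the class name ListLinkParser and the sample markup are made up for the example, and on a real page the HTML would come from an HTTP response or from driver.page_source.

```python
from html.parser import HTMLParser


class ListLinkParser(HTMLParser):
    """Collect the href of every <a> tag that sits inside an <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0  # greater than 0 while the parser is inside an <li>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


# Made-up markup standing in for the real page source (an assumption).
SAMPLE_HTML = """
<div id="main_content">
  <ul>
    <li><a href="/items/1">First</a></li>
    <li><a href="/items/2">Second</a></li>
  </ul>
</div>
<a href="/about">About</a>
"""

parser = ListLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.links)  # only the two links inside <li> elements are kept
```

Each collected link could then be fetched and parsed the same way, with no browser involved.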

CodePudding user response:

"the way I'm thinking about it, it will turn into a nested list"

Unless you concatenate all the details into one string [which I would not advise], or you only need one detail, whatever data structure holds the output will need some nesting: since you have a set of details for each item in a list, there has to be at least one level of nesting [list of items -> set of details].

However, it's not that complicated - you can just make a list of dictionaries with the details you want from each card [including, and especially, the link], and then go through that list of dictionaries and add to each dictionary by going to the link and scraping the rest of the info.
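The two passes described above can be sketched roughly as follows. The function names collect_cards and add_details, the "title" key, and the assumption that every list item contains an <a> tag are illustrative guesses, not something taken from the actual page. Collecting the links first also means nothing depends on the original li elements after the browser has navigated away, when they would have gone stale.

```python
def collect_cards(li_elements):
    """First pass: build one dictionary per list item, including its link."""
    cards = []
    for li in li_elements:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR;
        # assumes each <li> holds an <a> (an assumption about the page).
        link = li.find_element("css selector", "a")
        cards.append({
            "title": link.text.strip(),
            "link": link.get_attribute("href"),
        })
    return cards


def add_details(cards, scrape_detail_page):
    """Second pass: visit each link and merge the extra details into its dict.

    scrape_detail_page is a caller-supplied function that takes a URL and
    returns a dictionary of details scraped from that page.
    """
    for card in cards:
        card.update(scrape_detail_page(card["link"]))
    return cards
```

With Selenium, this would be called as collect_cards(all_li_element), followed by add_details with a function that does driver.get(url) and reads whatever fields are needed from the opened page. The result is a flat list of dictionaries, which is the single level of nesting mentioned earlier.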


First, I like to separate out the actual detail-extraction into a function that I can reuse.

[This is a simplified version of another function I often use when scraping; if interested, crunchyCards.csv
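A reusable detail-extraction helper of the kind described above might look something like this. It is only a minimal sketch, not the answerer's original function: the name extract_details, the selectors mapping, and the example selectors in the docstring are all hypothetical.

```python
def extract_details(container, selectors):
    """Return {name: text or None} for each CSS selector looked up in container.

    container: anything with a Selenium-style find_elements(by, value) method,
               such as a WebElement or the driver itself.
    selectors: hypothetical mapping, e.g. {"title": "h2", "price": ".price"}.
    """
    details = {}
    for name, css in selectors.items():
        # find_elements returns an empty list when nothing matches, so a
        # missing field becomes None instead of raising an exception.
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        matches = container.find_elements("css selector", css)
        details[name] = matches[0].text.strip() if matches else None
    return details
```

The same function can then be applied to each list item on the index page and to each opened detail page, changing only the selectors mapping.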
