I've tried scouring the bowels of the internet for an answer to this particular puzzle; however, I have not had too much luck with getting insight into this specific situation.
So, I am currently trying to scrape the last four or so pages of last.fm entries for "Jazz Metal" (see the URL).
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
options.headless = True
driver = webdriver.Firefox(options = options)
driver.get('https://www.last.fm/tag/jazz metal/artists?page=20')
super_list = []
wait = WebDriverWait(driver, 10)
while True:
    try:
        entries = wait.until(
            EC.presence_of_element_located((By.CLASS_NAME, 'grid-items-section'))
        )
        grid = driver.find_element(By.CLASS_NAME, 'grid-items-section')
        grid_children = grid.find_elements(By.TAG_NAME, 'li')
        super_list.append(grid_children)
        pagination = wait.until(
            EC.presence_of_element_located((By.CLASS_NAME, 'pagination-next'))
        )
        pagination.click()
    except:
        break
The thing is, super_list.append(grid_children) is not very helpful: once the while loop ends and I'm working with super_list outside of that scope, I can no longer read the .text attribute to get the contents, and I'm left with a list that's nearly indecipherable to a human:
<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="11b49c8e-eec7-45f2-9e2a-e2034b93077a", element="ffe29b8e-5b65-4df3-985e-68e501e3a546")>
But if I change super_list.append(grid_children) to super_list.append([entry.text for entry in grid_children]), the entire cookie crumbles. What gives? Also, if I remove super_list.append(grid_children) entirely, then it visits every page (yes, as it currently stands, it doesn't even visit the last page)!
The plot thickens: if I include
finally:
    driver.quit()
then only the first page is visited. Can somebody please help me with this black magic?
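One way to investigate behaviour like this is to stop the bare except: from swallowing the error. The following is a minimal sketch in plain Python rather than Selenium; scrape_pages and fake_fetch are hypothetical names invented for illustration, and the RuntimeError merely stands in for whatever exception the real loop hits. It records why the loop ended instead of silently breaking:

```python
def scrape_pages(fetch_page, max_pages=10):
    """Collect results page by page and report why the loop stopped."""
    results = []
    stop_reason = None
    for page in range(max_pages):
        try:
            results.append(fetch_page(page))
        except Exception as exc:
            # Assumption: any Exception ends the scrape, but unlike a bare
            # except we keep its type and message for later inspection.
            stop_reason = f"{type(exc).__name__}: {exc}"
            break
    return results, stop_reason


def fake_fetch(page):
    """Hypothetical page fetcher that fails on the third page (index 2)."""
    if page == 2:
        raise RuntimeError("element went stale")
    return [f"artist {page}"]


results, stop_reason = scrape_pages(fake_fetch)
print(results)
print(stop_reason)
```

Printing the caught exception in the same way inside the Selenium loop would probably show whether a timeout, a stale element, or something else is ending the scrape.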
CodePudding user response:
Recognize that super_list is a 2D list (a list of lists, one inner list per page). To read .text, you need to use a 2D index. Try printing something at the end like
print(super_list[-1][-1].text)
Now .text should work normally.
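To make the indexing concrete, here is a small sketch that does not need a browser. FakeElement is a hypothetical stand-in for a Selenium WebElement (only its .text attribute is modelled), and the artist names are placeholders, not scraped data:

```python
from dataclasses import dataclass
from itertools import chain


@dataclass
class FakeElement:
    """Hypothetical stand-in for a WebElement; only .text is modelled."""
    text: str


# One inner list per scraped page, mirroring super_list.append(grid_children).
super_list = [
    [FakeElement("Artist A"), FakeElement("Artist B")],
    [FakeElement("Artist C")],
]

# A single element needs two indices: the page first, then the entry.
last_entry_text = super_list[-1][-1].text

# Flatten the pages to read the text of every entry in order.
all_texts = [entry.text for entry in chain.from_iterable(super_list)]

print(last_entry_text)
print(all_texts)
```

This only illustrates the indexing. With real Selenium elements, references collected on earlier pages may raise StaleElementReferenceException once the browser has navigated away, so reading .text while the page is still loaded may be the safer route.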
CodePudding user response:
Welp, I've officially given up on Selenium. I'm gonna go back to requests-html. Sorry to disappoint anybody who came here looking for a solution.