I have a simple example of the problem. I run driver.find_elements(By.ID, "thumbnail") and it works. I then click on a random element and scrape the info again in a loop, but from the second iteration on I always get exactly the same results:
driver.get("https://www.somepage.com")
time.sleep(7)
items = []
for i in range(3):
    print("LOOP #: " + str(i))
    random_number = random.randint(1, 5)
    items = driver.find_elements(By.ID, "thumbnail")
    url = i.get_attribute("href")
    print(str(url))
    items[random_number].click()
    time.sleep(100)
OUTPUT
LOOP #: 0
URL 1
URL 2
URL 3
URL 4
LOOP #: 1
URL 1
URL 2
URL 3
URL 4
LOOP #: 2
URL 1
URL 2
URL 3
URL 4
The second loop should print different URLs, since find_elements(By.ID, "thumbnail") still applies after the click. I don't know what I'm doing wrong. I even tried adding items.clear() at the end of the loop, with the same result.
CodePudding user response:
The answer below pertains to YouTube, since it was given as an example when asked.
When YouTube opens, it has many elements with the id thumbnail. The strategy is to iterate in a range of 3 and, in each iteration, collect all the elements with the id thumbnail, select a random one, fetch its href, and then click on it. The question then is how to continue the iteration. There are two options: (1) after the click, select one of the thumbnails from the side pane, or (2) click on the homepage (YouTube icon) and repeat the process.
I went with the second option, and here is the code for it:
driver.get('https://www.youtube.com/')
for i in range(3):
    print("LOOP #: " + str(i))
    time.sleep(10)
    items = driver.find_elements(By.ID, "thumbnail")
    # In the question, the attribute is fetched from i, which is the loop counter (an int), not an element, so that did not work for me.
    # Instead, I pick a random element from items, print its href, click it, then click the YouTube home icon and repeat.
    rand = random.choice(items)
    print(rand.get_attribute('href'))
    rand.click()
    time.sleep(3)
    driver.find_element(By.XPATH, "(//*[@title='YouTube Home'])[1]").click()
driver.quit()
Output:
LOOP #: 0
https://www.youtube.com/watch?v=YIKz49-aGas
LOOP #: 1
https://www.youtube.com/watch?v=51Qs0Ej2RUc
LOOP #: 2
https://www.youtube.com/watch?v=OeShsZPOP-s
Process finished with exit code 0
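One thing the loop above does not guard against: an element with the id thumbnail may carry no href, in which case get_attribute('href') returns None. Here is a minimal sketch of a picker that skips such elements. The names pick_random_link and FakeElement are hypothetical and used only for illustration; FakeElement stands in for a Selenium WebElement so the snippet runs without a browser.

```python
import random


def pick_random_link(elements, rng=random):
    """Return (element, href) for a random element that has an href, or None."""
    candidates = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip elements whose href is missing or empty
            candidates.append((element, href))
    if not candidates:
        return None
    return rng.choice(candidates)


class FakeElement:
    """Stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


elements = [
    FakeElement(None),
    FakeElement("https://example.com/a"),
    FakeElement(None),
]
picked = pick_random_link(elements)
print(picked[1])  # prints https://example.com/a, the only element with an href
```

With a real driver, the same function could be called as pick_random_link(driver.find_elements(By.ID, "thumbnail")). Like random.choice in the answer, it never indexes past the end of the list, which random.randint(1, 5) in the question can do when fewer than six elements are found.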
Note: You may replace time.sleep with a more robust explicit wait such as WebDriverWait. That said, YouTube is a Google property, so its element attributes are heavily randomized and locators tend to get flaky. Also, the bot may get detected if it sends too many requests.
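As a rough sketch of what replacing time.sleep with an explicit wait could look like: the helper below asks a WebDriverWait to keep polling until at least one element with the id thumbnail is found. The names wait_for_thumbnails and demo are hypothetical, and the demo assumes Chrome with a matching driver is available. The string "id" is the value of By.ID.

```python
def wait_for_thumbnails(wait):
    """Block until at least one element with id 'thumbnail' exists; return them all.

    `wait` is expected to be a selenium WebDriverWait. Its until() keeps calling
    the given function until the result is truthy, and an empty list is falsy.
    """
    return wait.until(lambda driver: driver.find_elements("id", "thumbnail"))


def demo():
    # Selenium is imported here so the helper above can be used without it.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are installed
    try:
        driver.get("https://www.youtube.com/")
        # Waits up to 20 seconds instead of always sleeping a fixed time.
        items = wait_for_thumbnails(WebDriverWait(driver, 20))
        print(len(items))
    finally:
        driver.quit()
```

Calling demo() opens a browser and prints how many thumbnails were found. If none appear within the timeout, until() raises a TimeoutException rather than returning an empty list.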
UPDATED ANSWER:
Updated to click on the right-pane thumbnails after clicking on a front-page thumbnail:
driver.maximize_window()
driver.get('https://www.youtube.com/')
time.sleep(10)
items = driver.find_elements(By.ID, "thumbnail")
rand = random.choice(items)
print(rand.get_attribute('href'))
rand.click()
time.sleep(3)
for i in range(3):
    print(f"Loop#: {i}")
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "movie_player")))
    yt_left_pane_items = driver.find_elements(By.XPATH, "//*[@id='items']//*[@id='thumbnail']")
    rand_left_pane = random.choice(yt_left_pane_items)
    print(rand_left_pane.get_attribute('href'))
    rand_left_pane.click()
    time.sleep(5)
driver.quit()
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
Output:
https://www.youtube.com/watch?v=9YSbflKeOZQ
Loop#: 0
https://www.youtube.com/watch?v=ENOEgKeI_D0
Loop#: 1
https://www.youtube.com/watch?v=PeByUAhHXqs
Loop#: 2
https://www.youtube.com/watch?v=GUHfY84weMw
Process finished with exit code 0