I have tried everything to get this task done, but nothing works. This is the code I have written for web scraping. The website is somewhat hierarchical, and I am using Selenium so that I can see each click fetching the data.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from webdriver_manager.chrome import ChromeDriverManager

ITEM = []
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://www.google.com")
search_url = "https://northladder.com/en/ae/electronics/laptop/dell/g-series/v1/5e2c888401c848ac4f695c87"
driver.get(search_url)
clickable = driver.find_elements(By.XPATH, "//div[@class='select-box-main align-items-center']/label/div[@class='select-item m-1']")
n = len(clickable)
for i in range(n):
    items = []
    items.append(clickable[i].text)
    clickable[i].click()
    time.sleep(2)
    generation = driver.find_elements(By.XPATH, "//label/div[@class='select-item m-1']")
    g1 = len(generation)
    for g in range(g1):
        gen = []
        gen.append(generation[g].text)
        generation[g].click()
        time.sleep(2)
        ram = driver.find_elements(By.XPATH, "//label/div[@class='select-item m-1']")
        r1 = len(ram)
        for r in range(r1):
            ram_list = []
            ram_list.append(ram[r].text)
            gen.append(ram_list)
            ram[r].click()
            time.sleep(2)
            model = driver.find_elements(By.XPATH, "//label/div[@class='select-item m-1']")
            m1 = len(model)
            for m in range(m1):
                model_list = []
                model_list.append(model[m].text)
                ram_list.append(model_list)
                model[m].click()
                screensize = driver.find_elements(By.XPATH, "//label/div[@class='select-item m-1']")
                s1 = len(screensize)
                for s in range(s1):
                    screen_list = []
                    screen_list.append(screensize[s].text)
                    ram_list.append(screen_list)
                    screensize[s].click()
                    memory = driver.find_elements(By.XPATH, "//label/div[@class='select-item m-1']")
                    mm1 = len(memory)
                    for mm in range(mm1):
                        memory_list = []
                        memory_list.append(memory[mm].text)
                        screen_list.append(memory_list)
    ITEM.append(items)
print(ITEM)
This is the error:
Traceback (most recent call last):
selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
(Session info: chrome=102.0.5005.115)
Please suggest something.
CodePudding user response:
Selenium doesn't give you the real objects, only references to objects in the browser's memory. When you click(), the page loads new objects, which changes what is in the browser's memory, and the references you got earlier can no longer reach the previous objects.
If you click links that lead to new pages, then you have to get all the links as strings first and use get(url) instead of click().
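A minimal sketch of that first case, assuming the clickable items are ordinary links with an href attribute. The XPath "//a" and the names collect_hrefs, visit_all and handle_page are only illustrative and not taken from the page in the question. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH.

```python
def collect_hrefs(driver, xpath="//a"):
    """Read every link target as a plain string before navigating anywhere."""
    links = driver.find_elements("xpath", xpath)
    hrefs = []
    for link in links:
        href = link.get_attribute("href")
        if href:  # skip anchors without a target
            hrefs.append(href)
    # Strings stay valid after navigation; element references do not.
    return hrefs


def visit_all(driver, urls, handle_page):
    """Open each URL with get() and let handle_page(driver) read the page."""
    results = []
    for url in urls:
        driver.get(url)
        results.append(handle_page(driver))
    return results
```

You would call collect_hrefs(driver) once on the listing page and then pass the returned list to visit_all(), so no element reference has to survive a page load.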
In other situations you may need to call find_elements() again before every click() (to get correct references again) and use an index to get the next element.
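As a rough sketch of this "find again, then index" pattern (the name click_each and the after_click hook are my own, not part of Selenium), it could look like this:

```python
def click_each(driver, xpath, after_click=None):
    """Click every element matching xpath, re-locating the list before each click."""
    texts = []
    count = len(driver.find_elements("xpath", xpath))
    for index in range(count):
        # Fresh lookup: references from before the previous click may be stale.
        elements = driver.find_elements("xpath", xpath)
        if index >= len(elements):
            break  # the page now shows fewer matches than before
        element = elements[index]
        texts.append(element.text)
        element.click()
        if after_click is not None:
            after_click(driver)  # e.g. wait, or read data that the click revealed
    return texts
```

The index is the only thing carried from one iteration to the next, so a click can never leave you holding an outdated reference.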
I see another problem: in all the for-loops you use the same absolute XPath "//label/div[@class='select-item m-1']", so it always searches all elements, starting at the processors. You should use a relative XPath to search only the generations, or only the RAM options, etc. You could use print() to see what you really get with the current XPath.
In this code I always first find all the sections with labels
sections = driver.find_elements(By.XPATH, '//div[@]')
and then I use a relative XPath (starting with a dot: .//) to search only in the selected section
generation = sections[1].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
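To see the difference without a browser, here is a small sketch using Python's standard xml.etree.ElementTree on a made-up fragment. The class name "section" and the item texts are assumptions for this example only, and the real page's markup differs. ElementTree supports only a subset of XPath, but the leading-dot form behaves like it does in Selenium: it searches below the element you call it on.

```python
import xml.etree.ElementTree as ET

HTML = """
<body>
  <div class="section">
    <label><div class="select-item m-1">Core i7</div></label>
    <label><div class="select-item m-1">Core i5</div></label>
  </div>
  <div class="section">
    <label><div class="select-item m-1">10th Gen</div></label>
  </div>
</body>
"""

ITEM_XPATH = ".//label/div[@class='select-item m-1']"

root = ET.fromstring(HTML.strip())
sections = root.findall(".//div[@class='section']")

# Searching from the document root returns items from every section.
all_items = [e.text for e in root.findall(ITEM_XPATH)]

# Searching from one section returns only that section's items.
generation_items = [e.text for e in sections[1].findall(ITEM_XPATH)]

print(all_items)         # ['Core i7', 'Core i5', '10th Gen']
print(generation_items)  # ['10th Gen']
```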
EDIT:
When an item is clicked, the page scrolls down and hides the top sections (processor, generation), and Chrome then has trouble clicking these hidden elements. So I used driver.execute_script("arguments[0].click()", processor[i]) instead of processor[i].click()
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
#from webdriver_manager.firefox import GeckoDriverManager
import time

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
#driver = webdriver.Firefox(service=Service(GeckoDriverManager().install()))
search_url = "https://northladder.com/en/ae/electronics/laptop/dell/g-series/v1/5e2c888401c848ac4f695c87"
driver.get(search_url)
time.sleep(3)
ALL_ITEMS = []
item = ["", "", "", "", "", ""]
#item = {"processor": "", "generation": "", ...}
# ---
sections = driver.find_elements(By.XPATH, '//div[@]')
processor = sections[0].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
print('processor:', len(processor), '|', [x.text for x in processor])
for i in range(len(processor)):
    item[0] = processor[i].text
    #processor[i].click()
    driver.execute_script("arguments[0].click()", processor[i])
    time.sleep(2)
    sections = driver.find_elements(By.XPATH, '//div[@]')
    generation = sections[1].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
    print('generation:', len(generation), '|', [x.text for x in generation])
    for g in range(len(generation)):
        item[1] = generation[g].text
        #generation[g].click()
        driver.execute_script("arguments[0].click()", generation[g])
        time.sleep(2)
        sections = driver.find_elements(By.XPATH, '//div[@]')
        ram = sections[2].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
        print('ram:', len(ram), '|', [x.text for x in ram])
        for r in range(len(ram)):
            item[2] = ram[r].text
            #ram[r].click()
            driver.execute_script("arguments[0].click()", ram[r])
            time.sleep(2)
            sections = driver.find_elements(By.XPATH, '//div[@]')
            model = sections[3].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
            print('model:', len(model), '|', [x.text for x in model])
            for m in range(len(model)):
                item[3] = model[m].text
                #model[m].click()
                driver.execute_script("arguments[0].click()", model[m])
                time.sleep(2)
                sections = driver.find_elements(By.XPATH, '//div[@]')
                screensize = sections[4].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
                print('screensize:', len(screensize), '|', [x.text for x in screensize])
                for s in range(len(screensize)):
                    item[4] = screensize[s].text
                    #screensize[s].click()
                    driver.execute_script("arguments[0].click()", screensize[s])
                    time.sleep(2)
                    sections = driver.find_elements(By.XPATH, '//div[@]')
                    drive = sections[5].find_elements(By.XPATH, ".//label/div[@class='select-item m-1']")
                    print('drive:', len(drive), '|', [x.text for x in drive])
                    for d in range(len(drive)):
                        item[5] = drive[d].text
                        ALL_ITEMS.append(item.copy())  # duplicate `item` because I will use the same list to get new results
                        print(item)

for item in ALL_ITEMS:
    print(item)
Result:
['Core i7', '10th Gen', '32 GB', 'G7', '17 Inch', '1 TB SSD']
['Core i7', '10th Gen', '16 GB', 'G5', '17 Inch', '512 GB SSD']
['Core i7', '10th Gen', '16 GB', 'G3 3500', '15 Inch', '1 TB 256 GB SSD']
['Core i7', '9th Gen', '16 GB', 'G7 E1297', '15 Inch', '1 TB 256 GB SSD']
['Core i7', '9th Gen', '16 GB', 'G5 5590', '15 Inch', '1 TB 512 GB SSD']
['Core i7', '9th Gen', '16 GB', 'G5 5590', '15 Inch', '1 TB 256 GB']
['Core i7', '9th Gen', '16 GB', 'G5 5590', '15 Inch', '1 TB']
['Core i7', '9th Gen', '16 GB', 'G3 3590', '15 Inch', '512 GB']
['Core i7', '9th Gen', '16 GB', 'G3 3590', '15 Inch', '1 TB 256 GB SSD']
['Core i7', '8th Gen', '16 GB', 'G5 5587', '15 Inch', '1 TB 256 GB SSD']
['Core i7', '8th Gen', '16 GB', 'G3 3579', '15 Inch', '256 GB']
['Core i7', '8th Gen', '16 GB', 'G3 3579', '15 Inch', '1 TB 512 GB SSD']
['Core i7', '8th Gen', '8 GB', 'G3 3579', '15 Inch', '1 TB']
['Core i5', '10th Gen', '16 GB', 'G3 3500', '15 Inch', '1 TB 256 GB SSD']
['Core i5', '10th Gen', '8 GB', 'G3 3500', '15 Inch', '256 GB SSD']
['Core i5', '8th Gen', '8 GB', 'G3 3579', '15 Inch', '1 TB']
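The six nested loops all follow the same pattern, so they could probably be folded into one recursive function. This is only a sketch under assumptions: walk, section_xpath, depth and pause are names I made up, and section_xpath stands for whatever XPath selects the sections on the real page. It keeps the same behaviour as the loops: every level except the last is clicked with JavaScript, and the last level is only recorded.

```python
import time

ITEM_XPATH = ".//label/div[@class='select-item m-1']"


def walk(driver, section_xpath, depth, item=None, results=None, level=0, pause=0.0):
    """Collect every combination by clicking one option per section, level by level."""
    if item is None:
        item = [""] * depth
    if results is None:
        results = []

    # Re-locate the sections on every call, since earlier clicks changed the page.
    sections = driver.find_elements("xpath", section_xpath)
    if level >= len(sections):
        return results
    options = sections[level].find_elements("xpath", ITEM_XPATH)

    for option in options:
        item[level] = option.text
        if level == depth - 1:
            results.append(item.copy())  # last level: record it, nothing to click
            continue
        # JavaScript click, so sections scrolled out of view still respond.
        driver.execute_script("arguments[0].click()", option)
        time.sleep(pause)
        walk(driver, section_xpath, depth, item, results, level + 1, pause)
    return results
```

With a real driver this would be called as walk(driver, section_xpath, depth=6, pause=2) to match the six sections and the two-second waits used in the loops.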
CodePudding user response:
DOM Issue:
This probably has to do with the element not being attached to the DOM. See the Selenium docs on this error:
A common technique used for simulating a tabbed UI in a web app is to prepare DIVs for each tab, but only attach one at a time, storing the rest in variables. In this case, it's entirely possible that your code might have a reference to an element that is no longer attached to the DOM (that is, that has an ancestor which is document.documentElement).
If WebDriver throws a stale element exception in this case, even though the element still exists, the reference is lost. You should discard the current reference you hold and replace it, possibly by locating the element again once it is attached to the DOM.
Your .click() calls change the DOM structure of the webpage. Make sure you locate your elements again after every DOM change, so that the references you hold belong to the current DOM and you can interact with them.
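One way to "discard the reference and locate the element again" is a small retry helper. This is a sketch, not something Selenium provides: the name with_fresh_element and its parameters are hypothetical, and the exception type is passed in so the helper itself does not depend on Selenium.

```python
def with_fresh_element(locate, action, stale_errors, attempts=3):
    """Run action(locate()), locating the element again if the reference went stale."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        element = locate()  # always a new lookup, never a cached reference
        try:
            return action(element)
        except stale_errors as error:
            last_error = error  # reference was invalidated; look it up again
    raise last_error
```

With Selenium, stale_errors would be StaleElementReferenceException from selenium.common.exceptions, locate could be something like lambda: driver.find_elements(By.XPATH, xpath)[i], and action could be lambda el: el.click().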