Python Scraper - Chrome Driver & Selenium - XPath element ID that keeps changing on refresh?

Time:01-16

How do you deal with XPath element ID that keeps changing on refresh?

I have a Python scraper that someone wrote for me a couple of months ago. For some reason, it stopped working correctly a couple of days ago. All dropdown menu selections still work fine. The script runs as it should until it needs to enter the "Begin Date" and "End Date" specified in an input spreadsheet file (input.xlsx).

I am using the correct Chrome Driver version (108) for the Chrome Browser version (108), and all necessary modules are installed. The scraper stopped working on all of my computers, so I'm guessing something on the website changed around the start of the new year, although it looks visually identical. I inspected the date input field, and its ID seems to change on each refresh.

1st load: input id="idbb"

2nd load: input id="id55"

3rd load & on: another input id.

How would you deal with a variable element like this? Am I even looking in the correct place? The person who wrote the script is MIA, so I'm trying to figure out the issue myself. Any help is appreciated!

WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, "//input[@id='id58']"))).clear()
time.sleep(.5)

WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, "//input[@id='id58']"))).send_keys(row[0])
time.sleep(.5)

WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, "//input[@id='id5a']"))).clear()
time.sleep(.5)
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, "//input[@id='id5a']"))).send_keys(row[1])
time.sleep(.5)

# input("All filters applied")


30sec video running the script, failing to input the dates, then crashing. https://drive.google.com/file/d/1NIYJKKkahAeYt7GZvTU6dlbf6RMpLiYo/view?usp=share_link

Website being scraped: https://www.masscourts.org/

  • Tried re-installing Chrome Driver, no difference.
  • Tried running the script on different computers, same issue.

CodePudding user response:

If the ID keeps changing, you can try to reach the element without using the ID at all.

In the inspector window, right-click the element and choose Copy > Copy full XPath. This gives a longer, absolute XPath that locates the element purely by tag names and positions, such as:

/html/body/div[3]/div[2]/div/div[1]/div[3]/div[1]/div/div[2]/div[1]/ul/li[2]

Because this path contains no IDs or classes, the values that keep changing in your case are not used. Note, however, that this solution is not very robust: any change to the page layout can break the script.
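A somewhat sturdier variation on the same idea is to anchor on something that stays the same between loads, such as the visible label next to the field. The sketch below is only an illustration: it assumes each date input follows a label element whose text is "Begin Date" or "End Date", which I have not verified against the actual page, and the helper names xpath_for_labelled_input and fill_field are made up for this example. Check the real markup in the inspector and adjust the XPath accordingly.

```python
def xpath_for_labelled_input(label_text):
    """Build an XPath for the first <input> that follows a <label> with this visible text.

    Assumption: the page pairs each field with a <label>; verify this in the inspector.
    """
    if '"' in label_text:
        raise ValueError("label text must not contain double quotes")
    return f'//label[normalize-space()="{label_text}"]/following::input[1]'


def fill_field(driver, xpath, value, timeout=15):
    """Wait until the field is clickable, then clear it and type the value."""
    # Selenium is imported here so the XPath helper above works without it installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    field = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    field.clear()
    field.send_keys(value)


# Hypothetical label texts; replace them with whatever the page actually shows.
BEGIN_DATE_XPATH = xpath_for_labelled_input("Begin Date")
END_DATE_XPATH = xpath_for_labelled_input("End Date")
```

In the script from the question, the four WebDriverWait blocks could then become fill_field(driver, BEGIN_DATE_XPATH, row[0]) and fill_field(driver, END_DATE_XPATH, row[1]). If the inputs turn out to have a stable attribute such as name, an XPath on that attribute would work the same way.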
