I am scraping a page to collect URLs, and I then want to use those URLs to scrape more information. I'd like to avoid copying and pasting them by hand, but I cannot work out how to make get() work with the object that holds them. The first part of my code works perfectly well, but when I reach the part that tries to open the URL I get the following error message:
Traceback (most recent call last):
  File "/Users/rcastong/Desktop/imgs/try-creating-object-url.py", line 61, in <module>
    driver4.get(urlworks2)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)
Here is part of the code:

# this part works well
for number, item in enumerate(imgs2, 1):
    # print('---', number, '---')
    img_url = item.get_attribute("href")
    if not img_url:
        print("none")
    else:
        print('"' + img_url + '",')

# the error happens on driver4.get(urlworks2)
for i in range(0,30):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    driver4 = webdriver.Chrome()
    driver4.get(urlworks2)

def check_exists_by_xpath(xpath):
    try:
        WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
    except TimeoutException:
        return False
    return True

imgsrc2 = WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, "//p[@data-testid='artistName']/ancestor::a[contains(@class,'ChildrenLink')]")))
for number, item in enumerate(imgsrc2, 1):
    # print('---', number, '---')
    artisturls = item.get_attribute("href")
    if not artisturls:
        print("none")
    else:
        print('"' + artisturls + '",')
Answer:
This error message...
Traceback (most recent call last):
.
driver4.get(urlworks2)
.
self.execute(Command.GET, {'url': url})
.
self.error_handler.check_response(response)
.
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
(Session info: chrome=98.0.4758.109)
...implies that the URL passed as an argument to get() was invalid.
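One way to catch this before the browser does is to check each value before calling get(). The following is only a minimal sketch: the helper name looks_like_absolute_url is hypothetical, and it assumes that a usable URL must have an http or https scheme and a host, which is stricter than what the driver itself may accept.

```python
from urllib.parse import urlparse


def looks_like_absolute_url(value):
    """Return True if value is a string with an http(s) scheme and a host.

    Hypothetical helper for illustration; not part of Selenium.
    """
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A full URL passes the check; a single character does not.
print(looks_like_absolute_url("https://www.google.com/"))
print(looks_like_absolute_url("h"))
```

Printing or asserting on the value just before driver4.get(urlworks2) would show straight away whether a whole URL or only a fragment of one is being passed.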
Deep Dive
Within the first for loop, item.get_attribute("href") returns a URL string, and img_url is overwritten at every iteration. So img_url ends up holding only the most recent href as a single string, not a list of URLs as you assumed. As a result, in the second for loop img_url[i] indexes the individual characters of that string, and passing a single character to get() raises InvalidArgumentException: Message: invalid argument.
Demonstration
As an example, the following lines of code:
img_url = 'https://www.google.com/'
for i in range(0,5):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    print(urlworks2)
prints:
h
t
t
p
s
Solution
Declare an empty list img_url in the global scope before the loop and append each href to it, so that you can iterate over the list later:
img_url = []
for number, item in enumerate(imgs2, 1):
    img_url.append(item.get_attribute("href"))
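Putting the two loops together, the overall flow might look like the sketch below. FakeElement and FakeDriver are hypothetical stand-ins so that the example runs without a browser; in the real script they would be the elements in imgs2 and a webdriver.Chrome() instance. The helper names collect_hrefs and visit_each are also assumptions made for illustration. The sketch skips elements without an href and iterates over the list directly rather than over range(0,30), which avoids an IndexError when fewer than 30 links are found.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


class FakeDriver:
    """Hypothetical stand-in for webdriver.Chrome that records visited URLs."""

    def __init__(self):
        self.visited = []

    def get(self, url):
        self.visited.append(url)


def collect_hrefs(elements):
    """Return the href of every element that has one, as a list."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip elements whose href is missing or empty
            hrefs.append(href)
    return hrefs


def visit_each(driver, urls):
    """Open every URL in turn, reusing a single driver."""
    for url in urls:
        driver.get(url)


imgs2 = [
    FakeElement("https://example.com/a"),
    FakeElement(None),
    FakeElement("https://example.com/b"),
]
img_urls = collect_hrefs(imgs2)
driver = FakeDriver()
visit_each(driver, img_urls)
print(driver.visited)
```

Reusing one driver instead of calling webdriver.Chrome() inside the loop is a design choice in this sketch, not something the original code requires; it simply avoids opening a new browser window for every URL.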