selenium.common.exceptions.InvalidArgumentException: Message: invalid argument while iterating throu

Time:02-28

I am scraping a page to collect URLs and then using those URLs to scrape more information. I'd like to avoid copying and pasting them every time, but I cannot figure out how to make get() work with the object. The first part of my code works perfectly well, but when I reach the part that tries to load the URL I get the following error message:

Traceback (most recent call last):
  File "/Users/rcastong/Desktop/imgs/try-creating-object-url.py", line 61, in <module>
    driver4.get(urlworks2) 
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/Users/rcastong/Library/Python/3.9/lib/python/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

Here is part of the code:

  #this part works well    
    for number, item in enumerate(imgs2, 1):
            # print('---', number, '---')
        
            img_url = item.get_attribute("href")
            if not img_url:
                print("none")
            else:
                print('"' + img_url + '",')
        
  # the error happens on driver4.get(urlworks2)        
        for i in range(0,30):
            urlworks = img_url[i]
            urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
            driver4 = webdriver.Chrome()
            driver4.get(urlworks2) 
            def check_exists_by_xpath(xpath):
                try:
                    WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, xpath)))
                except TimeoutException:
                    return False
                return True
            
            imgsrc2 = WebDriverWait(driver3,55).until(EC.presence_of_all_elements_located((By.XPATH, "//p[@data-testid='artistName']/ancestor::a[contains(@class,'ChildrenLink')]")))                                                                                                                 
            for number, item in enumerate(imgsrc2, 1):
                # print('---', number, '---')
                artisturls = item.get_attribute("href")
                if not artisturls:
                    print("none")
                else:
                    print('"' + artisturls + '",')

Answer:

This error message...

Traceback (most recent call last):
  .
    driver4.get(urlworks2) 
  .
    self.execute(Command.GET, {'url': url})
  .
    self.error_handler.check_response(response)
  .
selenium.common.exceptions.InvalidArgumentException: Message: invalid argument
  (Session info: chrome=98.0.4758.109)

...implies that the URL passed as an argument to get() was invalid.


Deep Dive

Within the first for loop, item.get_attribute("href") returns a URL string, and img_url is overwritten at every iteration. So img_url ends up holding a single string (the last href), not a list of URLs as you assumed. As a result, when the second for loop indexes into img_url, each img_url[i] is a single character of that string, and passing that character to get() raises InvalidArgumentException: Message: invalid argument.
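One way to catch this kind of mistake before it reaches the browser is to check the value just before calling get(). The sketch below is only an illustration: looks_like_url is a hypothetical helper (not part of Selenium), and it assumes that only http and https URLs are expected.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value is a string with an http(s) scheme and a host.

    Hypothetical helper for illustration; not part of Selenium.
    """
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


img_url = "https://www.google.com/"  # a single string, as in the question
print(looks_like_url(img_url[0]))    # img_url[0] is the one character 'h'
print(looks_like_url(img_url))       # the whole URL
```

A check like this placed right before driver4.get(urlworks2) would flag the single-character values instead of letting the driver reject them.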


Demonstration

As an example, the following lines of code:

img_url = 'https://www.google.com/'
for i in range(0,5):
    urlworks = img_url[i]
    urlworks2 = urlworks.encode('ascii', 'ignore').decode('unicode_escape')
    print(urlworks2)

prints:

h
t
t
p
s

Solution

Declare an empty list img_url in the global scope, before the first loop, and append each href to it, so you can iterate over the list later.

img_url = []
for number, item in enumerate(imgs2, 1):
    img_url.append(item.get_attribute("href"))
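Putting the two loops together, a minimal sketch might look like the following. The function names collect_hrefs and visit_all are hypothetical, and the code assumes only that the elements expose get_attribute() and the driver exposes get(), as Selenium's WebElement and WebDriver do. It also skips elements without an href and iterates over the list directly rather than over range(0,30), so it does not fail when fewer than 30 links are found.

```python
def collect_hrefs(elements):
    """Return the non-empty href values of the given elements as a list."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip anchors that have no href
            hrefs.append(href)
    return hrefs


def visit_all(driver, urls):
    """Load each URL in turn with one driver and return the URLs visited."""
    visited = []
    for url in urls:  # iterate over the list itself, one whole URL at a time
        driver.get(url)
        visited.append(url)
    return visited
```

With Selenium this could be called as visit_all(driver4, collect_hrefs(imgs2)), creating driver4 once before the loop instead of on every iteration. Note also that the question's second loop navigates with driver4 but waits on driver3; if that is unintentional, using the same driver for both avoids waiting on the wrong page.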
