Python Web Scraping error - Reading from JSON- IndexError: list index out of range - how do I ignore

Time:04-21

I am performing web scraping via Python \ Selenium \ Chrome headless driver. I am reading the results from JSON - here is my code:

CustId=500
while (CustId<=510):
  
  print(CustId)

  # Part 1: Customer REST call:
  urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
  driver.get(urlg)

  soup = BeautifulSoup(driver.page_source,"lxml")

  dict_from_json = json.loads(soup.find("body").text)
  # print(dict_from_json)

  #try:
 
  CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])

  # Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])

  writefunction()

  CustId = CustId + 1

The issue is that 'addressDisplayName' is sometimes present in the result set and sometimes not. If it's not, the code fails with the error:

IndexError: list index out of range

Which makes sense, as it doesn't exist. How do I ignore this though - so if 'addressDisplayName' doesn't exist just continue with the loop? I've tried using a TRY but the code still stops executing.

CodePudding user response:

If you get an IndexError for index 0, the list itself is empty. So the failure is one step earlier in the path than 'addressDisplayName'; if only that key were missing from the dict, you would get a KeyError instead.
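To see the difference between the two errors, here is a minimal sketch. The two payloads are made up for illustration; only the key names come from the question, and `lookup` is a hypothetical helper.

```python
# Hypothetical payloads; only the key names come from the question.
empty_set = {"customerShowCommand": {"customerAddressShowCommandSet": []}}
no_name = {"customerShowCommand": {"customerAddressShowCommandSet": [{}]}}


def lookup(data):
    """Return the looked-up value, or the name of the exception it raised."""
    try:
        return data["customerShowCommand"]["customerAddressShowCommandSet"][0]["addressDisplayName"]
    except (IndexError, KeyError) as exc:
        return type(exc).__name__


print(lookup(empty_set))  # IndexError: the list is empty
print(lookup(no_name))    # KeyError: the list has an item, but the key is absent
```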

You can check if the list has elements:

if dict_from_json['customerShowCommand']['customerAddressShowCommandSet']:
    # get the data
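A slightly fuller sketch of the same check, wrapped in a helper. The name `first_address_name` and the `default` parameter are assumptions for illustration, and it assumes the nested values are dicts and lists as in the question.

```python
def first_address_name(data, default=None):
    """Return the first address's display name, or `default` if it is missing."""
    # .get() avoids a KeyError when a level is absent; "or []" also covers None.
    addresses = data.get("customerShowCommand", {}).get("customerAddressShowCommandSet") or []
    if not addresses:
        # Empty list: indexing [0] here is what raises the IndexError.
        return default
    return addresses[0].get("addressDisplayName", default)


# Hypothetical payloads for illustration.
with_address = {"customerShowCommand": {"customerAddressShowCommandSet": [{"addressDisplayName": "Home"}]}}
without_address = {"customerShowCommand": {"customerAddressShowCommandSet": []}}

print(first_address_name(with_address))            # Home
print(first_address_name(without_address, "NaN"))  # NaN
```

In the loop from the question, this could stand in for the direct lookup, so a missing address yields the default rather than an exception.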

Otherwise you can indeed use try..except:

try:
    # get the data
except (IndexError, KeyError):
    # handle missing data
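As a rough sketch of how that could look inside a loop, with `continue` skipping customers that have no address. The `responses` dict is made-up stand-in data; in the question each value would come from `json.loads(...)` on the page body.

```python
# Hypothetical responses keyed by customer id, standing in for the REST results.
responses = {
    500: {"customerShowCommand": {"customerAddressShowCommandSet": [{"addressDisplayName": "Home"}]}},
    501: {"customerShowCommand": {"customerAddressShowCommandSet": []}},
    502: {"customerShowCommand": {}},
}

found = {}
for cust_id, data in responses.items():
    try:
        addr = data["customerShowCommand"]["customerAddressShowCommandSet"][0]["addressDisplayName"]
    except (IndexError, KeyError):
        # No address for this customer: skip it and move on to the next id.
        continue
    found[cust_id] = addr

print(found)  # {500: 'Home'}
```

Note that the tuple needs parentheses: `except (IndexError, KeyError):`. Without them it is a syntax error in Python 3.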

CodePudding user response:

A try..except block should resolve your issue.

CustId=500
while (CustId<=510):
  
  print(CustId)

  # Part 1: Customer REST call:
  urlg = f'https://mywebsite/customerRest/show/?id={CustId}'
  driver.get(urlg)

  soup = BeautifulSoup(driver.page_source,"lxml")

  dict_from_json = json.loads(soup.find("body").text)
  # print(dict_from_json)

  
 
  CustID = (dict_from_json['customerAddressCreateCommand']['customerId'])
  try:
      Addr = (dict_from_json['customerShowCommand']['customerAddressShowCommandSet'][0]['addressDisplayName'])

  except (IndexError, KeyError):
      Addr = "NaN"

  CustId = CustId + 1