Web scraping table with missing attributes via Python Selenium and Pandas

Time:08-24

I'm scraping a table from a website, but I run into empty cells along the way. The try-except block below ends up messing up the resulting data. I also don't want to drop the whole row, because the rest of the information is still relevant even when some attribute is missing.

try:
    for i in range(10):
        data = {'ID': IDs[i].get_attribute('textContent'),
                'holder': holder[i].get_attribute('textContent'),
                'view': view[i].get_attribute('textContent'),
                'material': material[i].get_attribute('textContent'),
                'Addons': addOns[i].get_attribute('textContent'),
                'link': link[i].get_attribute('href')}
        list.append(data)
except:
    print('Error')

Any ideas?

CodePudding user response:

What you can do is put all the element lists whose attributes you want to read into a dictionary, like this:

objects = {"IDs": IDs, "holder": holder, "view": view, "material": material, ...}

Then you can iterate through this dictionary and, if a specific attribute does not exist, simply store an empty string as the value for that key. Something like this:

the_keys = list(objects.keys())
rows = []  # collect the results here; naming this variable `list` would shadow the built-in used above
for i in range(len(objects["IDs"])):  # I assume the ID field will never be empty,
    # so making a for loop like this is better since you iterate only through
    # existing objects
    data = {}

    for j in range(len(objects)):
        try:
            data[the_keys[j]] = objects[the_keys[j]][i].get_attribute('textContent')
        except Exception as e:
            print("Exception: {}".format(e))
            data[the_keys[j]] = ""  # this means we had an exception
            # it is better to catch the specific exception that is thrown
            # when the attribute of the element does not exist but I don't know what it is
    rows.append(data)

I don't know if this code works since I didn't try it, but it should give you an overall idea of how to solve your problem.
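
One way to try the idea without opening a browser is to run it against stand-in elements. The sketch below is only an illustration: FakeElement and safe_attribute are hypothetical names, not part of Selenium, and the data is made up. It assumes the failure is an IndexError, which Python raises when a column list is shorter than the ID list, and it also treats a None result as missing, since Selenium's get_attribute returns None when the attribute is absent.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        # Mirror Selenium: return None when the attribute is not present.
        return self._attributes.get(name)


def safe_attribute(elements, index, name, default=""):
    """Return elements[index].get_attribute(name), or default if unavailable."""
    try:
        value = elements[index].get_attribute(name)
    except IndexError:
        # This column list has no element at this index.
        return default
    # An absent attribute comes back as None; replace it with the default.
    return default if value is None else value


# Made-up data: three IDs, but only two 'view' cells were found.
objects = {
    "IDs": [FakeElement(textContent="1"),
            FakeElement(textContent="2"),
            FakeElement(textContent="3")],
    "holder": [FakeElement(textContent="A"),
               FakeElement(textContent="B"),
               FakeElement(textContent="C")],
    "view": [FakeElement(textContent="front"),
             FakeElement(textContent="side")],
}

rows = []
for i in range(len(objects["IDs"])):
    rows.append({key: safe_attribute(elements, i, "textContent")
                 for key, elements in objects.items()})

print(rows[2])  # {'IDs': '3', 'holder': 'C', 'view': ''}
```

Keep in mind that this only pads the end of a short column. If a cell is missing in the middle of the table, the later values in that list move up by one row, so the lists need to line up by row for the result to be right.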

If you have any questions, doubts, or concerns please ask away.

Edit: To read a different attribute for some object, like the href, you can simply add an if statement that checks the key. I also realized you can loop over the keys and values of the objects dictionary directly instead of accessing each key and value by index. You could change the inner loop to this:

for key, value in objects.items():
    try:
        if key == "link":
            data[key] = value[i].get_attribute("href")
        else:
            data[key] = value[i].get_attribute("textContent")
    except Exception as e:
        print("Error: ", e)
        data[key] = ""
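
If more columns end up needing their own attribute, the if statement could be swapped for a small lookup table. This is just a sketch under assumptions: ATTRIBUTE_FOR, DEFAULT_ATTRIBUTE, build_rows and FakeElement are hypothetical names made up for the example, the data is invented, and FakeElement only imitates the get_attribute call of a Selenium element.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, **attributes):
        self._attributes = attributes

    def get_attribute(self, name):
        # Mirror Selenium: return None when the attribute is not present.
        return self._attributes.get(name)


# Hypothetical lookup: which attribute to read for each column.
ATTRIBUTE_FOR = {"link": "href"}
DEFAULT_ATTRIBUTE = "textContent"


def build_rows(objects, id_key="IDs"):
    """Build one dict per row, using "" wherever a value is missing."""
    rows = []
    for i in range(len(objects[id_key])):
        data = {}
        for key, elements in objects.items():
            attribute = ATTRIBUTE_FOR.get(key, DEFAULT_ATTRIBUTE)
            try:
                # `or ""` turns a None (absent attribute) into an empty string.
                data[key] = elements[i].get_attribute(attribute) or ""
            except IndexError:
                # This column has no element for row i.
                data[key] = ""
        rows.append(data)
    return rows


# Made-up data: two IDs, but only one link was found.
objects = {
    "IDs": [FakeElement(textContent="1"), FakeElement(textContent="2")],
    "link": [FakeElement(href="https://example.com/1")],
}
rows = build_rows(objects)
print(rows)
```

Since rows is a plain list of dicts, pandas.DataFrame(rows) should turn it into a table with one column per key, and the missing cells show up as empty strings.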