I have a Python script that scrapes data from web pages with Selenium. On some of the pages, certain elements are not present, and this throws a NoSuchElementException.
How do I return a null value when the element is not present? I tried appending "or None", but it still throws the error. Also, the elements following this one depend on the presence of the first one, as shown below:
metadata = driver.find_element(By.PARTIAL_LINK_TEXT, 'Complete metadata on ') or None
metadata_url = metadata.get_attribute('href') or None
dataset_id = metadata_url.split('metadata/')[1] or None
output_dict['datasets'].append({'title': dataset_title, 'url': dataset_link, 'metadata_url': metadata_url})
The element that is missing from some pages is metadata.
I'm looking to populate the metadata_url field as null.
Please assist with this.
CodePudding user response:
This code:
var = function_call(param) or None
runs the function, takes its return value, evaluates that value as a boolean (see truthiness in Python), and if it is falsy, sets the variable to None instead.
However, the function (find_element, here) doesn't return a falsy value when it finds nothing; it raises a NoSuchElementException.
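The difference can be seen without Selenium at all. The following is a minimal sketch in plain Python; returns_empty and raises_lookup_error are hypothetical stand-ins, not Selenium functions, and LookupError is used only as an example exception type.

```python
def returns_empty():
    # Hypothetical stand-in for a call that returns a falsy value.
    return ""


def raises_lookup_error():
    # Hypothetical stand-in for a call that raises instead of returning,
    # which is how find_element behaves when nothing matches.
    raise LookupError("nothing found")


# A falsy return value is replaced by None.
value = returns_empty() or None

# An exception propagates before "or None" is ever evaluated,
# so it has to be caught explicitly.
exception_seen = False
try:
    other = raises_lookup_error() or None
except LookupError:
    exception_seen = True
    other = None
```

In the first case the fallback works because there is a return value to test. In the second there is no return value, so the fallback never gets a chance to run.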
So you need a try/except block in your code instead of the "or None":
from selenium.common.exceptions import NoSuchElementException

try:
    metadata = driver.find_element(By.PARTIAL_LINK_TEXT, 'Complete metadata on ')
    # If we are at this line, then find_element found something, and we
    # can set values for our url and dataset id
    metadata_url = metadata.get_attribute('href')  # this will be None if there's no href attribute
    dataset_id = metadata_url.split('metadata/')[1]
except NoSuchElementException:
    metadata_url = None
    dataset_id = None
If the element is found but has no href attribute, metadata_url is None and you will need to handle that case yourself, because metadata_url.split will not work: it raises AttributeError: 'NoneType' object has no attribute 'split'. That error is not caught by the except clause above.
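One way to handle that case is to move the string handling into a small helper that tolerates None. This is only a sketch: extract_dataset_id is a hypothetical name, the example URL is made up, and it assumes the dataset id is whatever follows the first 'metadata/' in the URL.

```python
def extract_dataset_id(metadata_url, marker="metadata/"):
    """Return the text after the first `marker` in the URL, or None.

    None is returned when the URL itself is None or when the marker
    does not occur in it, so the caller never hits an AttributeError
    or an IndexError.
    """
    if metadata_url is None:
        return None
    # partition returns (before, separator, after); the separator is an
    # empty string when the marker was not found.
    _, found, tail = metadata_url.partition(marker)
    return tail if found else None


# Hypothetical example URL, for illustration only.
example_id = extract_dataset_id("https://example.org/metadata/abc123")
missing_id = extract_dataset_id(None)
```

With a helper like this, the line in the try block could become dataset_id = extract_dataset_id(metadata_url), which yields None instead of raising when there is no href.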
CodePudding user response:
I guess you're trying to use JS syntax in Python. Instead, you'll have to check whether the element exists first:
if not metadata_url:
    return None
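A check like that can also be done before any exception is raised, by using find_elements (plural), which returns an empty list when nothing matches. The sketch below assumes a Selenium 4 style driver; find_metadata_url is a hypothetical helper name, and the locator strategy is passed in as a parameter so the function itself needs no imports.

```python
def find_metadata_url(driver, by, link_text="Complete metadata on "):
    """Return the href of the first matching link, or None.

    `driver` is assumed to be a Selenium WebDriver and `by` a locator
    strategy such as By.PARTIAL_LINK_TEXT.
    """
    # find_elements returns a list, which is empty when nothing matches,
    # so no exception handling is needed for the missing-element case.
    matches = driver.find_elements(by, link_text)
    if not matches:
        return None
    # get_attribute returns None when the element has no href.
    return matches[0].get_attribute("href")


# Intended usage, assuming an existing driver:
#   from selenium.webdriver.common.by import By
#   metadata_url = find_metadata_url(driver, By.PARTIAL_LINK_TEXT)
```

Compared with try/except, this keeps the missing-element case in ordinary control flow, at the cost of collecting every matching element rather than stopping at the first.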