From my understanding of OOP in Python, if there is no attribute named xyz on an object a, then invoking a.xyz raises AttributeError.
But in BeautifulSoup, if we access any arbitrary attribute on an object of type Tag, we always get some output.
For instance,
>>> from bs4 import BeautifulSoup
>>> import requests
>>> html = requests.get("https://wwww.bing.com").text
>>> tag = BeautifulSoup(html, 'html5lib')
>>> print(tag.title) # makes sense
<title>Bing</title>
>>> print(tag.no_such_attrib) # should throw AttributeError
None
Here, it is implied that tag_obj.anything.something gets executed as tag_obj.find("anything").find("something"). But I just can't imagine which kind of construct transforms the former form into the latter.
CodePudding user response:
No imagination necessary. We can just look at the source: (abbreviated by me)
class Tag(PageElement):
    ...
    def __getattr__(self, tag):
        """Calling tag.subtag is the same as calling tag.find(name="subtag")"""
        if not tag.startswith("__") and not tag == "contents":
            return self.find(tag)
        raise AttributeError("'%s' object has no attribute '%s'" % (self.__class__, tag))
See the Python data model documentation for more information about attribute access.
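To see the same mechanism without installing anything, here is a minimal sketch of a tree node that hands unknown attributes to a find method. The Node class and its find method are hypothetical stand-ins written for illustration; they are not BeautifulSoup's actual implementation.

```python
class Node:
    """Toy tree node; a hypothetical stand-in for bs4's Tag."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        """Return the first descendant called `name`, or None (depth-first)."""
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def __getattr__(self, attr):
        # Only reached when the normal lookup (instance dict, class, bases) fails.
        if not attr.startswith("__"):
            return self.find(attr)
        raise AttributeError(attr)


doc = Node("html", [Node("head", [Node("title")]), Node("body")])
print(doc.head.title.name)   # title
print(doc.no_such_attrib)    # None
```

Note that chaining past a missing node still fails: doc.no_such_attrib.title raises AttributeError, because the first lookup returned None.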
Here is another very simple illustration of how you can override attribute access to get None instead of an AttributeError when an attribute does not exist on an object:
class Foo:
    def __getattr__(self, item: str):
        return self.__dict__.get(item)

if __name__ == "__main__":
    foo = Foo()
    foo.bar = 1
    print(foo.bar)  # 1
    print(foo.baz)  # None
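This makes use of dict.get, which defaults to None. Note that __getattr__ is only called when the normal lookup fails, so foo.bar never reaches it.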
In short: Attribute access is a method call. Always. Though not always via the same method.
CodePudding user response:
Are you familiar with the getattr(obj, "no_such_attrib", "xxx") form? xxx can be anything: None, an empty dict, a default value. Even another function call. It needs no complicated __getattr__ method, and you can vary what you fall back to at the point of call.
So yes, the default can be find_something(). Because it is an ordinary argument, it is evaluated before getattr runs, so it gets called even when the attribute exists. If that is undesirable, boolean short-circuiting helps:
x = getattr(obj, "no_such_attrib", None) or find_something()
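A quick way to check the difference is to count the calls. In this sketch, find_something and Obj are hypothetical placeholders; the point is only that a default passed to getattr is evaluated before getattr runs, while the right-hand side of `or` is evaluated only when the left-hand side is falsy.

```python
call_count = 0


def find_something():
    """Hypothetical expensive fallback; counts how often it runs."""
    global call_count
    call_count += 1
    return "fallback"


class Obj:
    present = "value"


obj = Obj()

# Eager: the default is computed even though the attribute exists.
a = getattr(obj, "present", find_something())
print(a, call_count)   # value 1

# Lazy: the attribute is truthy, so the fallback is never called.
b = getattr(obj, "present", None) or find_something()
print(b, call_count)   # value 1

# The attribute is missing, so the fallback runs.
c = getattr(obj, "no_such_attrib", None) or find_something()
print(c, call_count)   # fallback 2
```

One caveat: `or` also falls through when the attribute exists but is falsy (0, an empty string, an empty list), so this idiom only suits attributes whose valid values are truthy.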
Within a predefined set of classes, __getattr__ is likely doing the job on the provider's side, as stated in DF's answer. As a user, or just to avoid edge-case work on your own classes for particular internal uses, getattr is low effort (and avoids the recursion errors that often bite you in __getattr__):
def __getattr__(self, attrname):
    "sloppy __getattr__ loves recursion errors :-("
    if attrname == "missing_attribute":
        return self.another_missing_attribute
    elif attrname == "another_missing_attribute":
        return self.missing_attribute
    else:
        raise AttributeError(attrname)
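To watch that trap spring, the method can be dropped into a throwaway class. This is only a sketch: Sloppy, Plain and hits_recursion_limit are names invented here. Each missing name asks for the other until Python gives up with RecursionError, whereas a class without __getattr__ simply lets the caller supply a default.

```python
class Sloppy:
    """Throwaway class (hypothetical) hosting the mutually recursive lookups."""

    def __getattr__(self, attrname):
        # Each branch asks for another missing name, re-entering __getattr__.
        if attrname == "missing_attribute":
            return self.another_missing_attribute
        elif attrname == "another_missing_attribute":
            return self.missing_attribute
        raise AttributeError(attrname)


class Plain:
    """No __getattr__ at all; the caller supplies the fallback."""


def hits_recursion_limit(obj):
    """Return True if reading obj.missing_attribute ends in RecursionError."""
    try:
        obj.missing_attribute
    except RecursionError:
        return True
    except AttributeError:
        return False
    return False


print(hits_recursion_limit(Sloppy()))   # True
print(hits_recursion_limit(Plain()))    # False
print(getattr(Plain(), "missing_attribute", "default"))   # default
```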
Note: similarly, dict subclasses can implement __missing__.
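As a rough sketch of that hook: dict's item lookup (d[key]) calls __missing__ on a subclass when the key is absent. DefaultingDict is a hypothetical name chosen for this example.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass that returns None for unknown keys."""

    def __missing__(self, key):
        # Called by d[key] when key is absent; nothing is stored in the dict.
        return None


d = DefaultingDict(bar=1)
print(d["bar"])            # 1
print(d["baz"])            # None
print("baz" in d)          # False
print(d.get("baz", "x"))   # x
```

The last two lines show the limits of the hook: only subscript access goes through __missing__, so membership tests and dict.get behave as usual.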