Add "collection" of attributes directly to top level of a class

Time:11-30

I am trying to capture (S3) logs in a structured way. I am capturing the access-related elements with this type of tuple:

from datetime import datetime
from typing import NamedTuple

class _Access(NamedTuple):
    time: datetime
    ip: str
    actor: str
    request_id: str
    action: str
    key: str
    request_uri: str
    status: int
    error_code: str

I then have a class that uses this named tuple as follows (edited just down to relevant code):

class Logs:
    def __init__(self, log: str):
        raw_logs = match(S3_LOG_REGEX, log)
        if raw_logs is None:
            raise FormatError(log)
        logs = raw_logs.groups()
        timestamp = datetime.strptime(logs[2], "%d/%b/%Y:%H:%M:%S %z")
        http_status = int(logs[9])
        access = _Access(
            timestamp,
            logs[3],
            logs[4],
            logs[5],
            logs[6],
            logs[7],
            logs[8],
            http_status,
            logs[10],
        )
        self.access = access

The problem is that it is too verbose when I now want to use it:

>>> log_struct = Logs(raw_log)
>>> log_struct.access.action # I don't want to have to add `access`

As mentioned, I'd rather be able to write something like this:

>>> log_struct = Logs(raw_log)
>>> log_struct.action

But I still want to have this clean named tuple called _Access. How can I make everything from access available at the top level?

Specifically, I have this line:

        self.access = access

which is giving me that extra "layer" that I don't want. I'd like to "unpack" it somehow, similar to how the star in *args unpacks arguments, but I'm not sure how to unpack the tuple in this case.

Answer:

What you really need for your use case is an alternative constructor for your NamedTuple subclass that parses a log-entry string into the respective fields. You can write one as a class method that calls the __new__ method with the arguments parsed from the input string.

Using just the ip and action fields as a simplified example:

from typing import NamedTuple

class Logs(NamedTuple):
    ip: str
    action: str

    @classmethod
    def parse(cls, log: str) -> 'Logs':
        return cls.__new__(cls, *log.split())

log_struct = Logs.parse('192.168.1.1 GET')
print(log_struct)
print(log_struct.ip)
print(log_struct.action)

This outputs:

Logs(ip='192.168.1.1', action='GET')
192.168.1.1
GET
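The same class method is also a natural place for the type conversions done in the question's __init__ (parsing the timestamp, converting the status to int). The sketch below assumes a hypothetical space-separated input of timestamp, IP, action and status rather than the real S3 format, since S3_LOG_REGEX is not shown. It calls cls(...), which for a NamedTuple does the same thing as calling cls.__new__ directly.

```python
from datetime import datetime
from typing import NamedTuple


class Logs(NamedTuple):
    time: datetime
    ip: str
    action: str
    status: int

    @classmethod
    def parse(cls, log: str) -> "Logs":
        # Hypothetical format: "<ISO-8601 timestamp> <ip> <action> <status>"
        raw_time, ip, action, raw_status = log.split()
        return cls(
            datetime.fromisoformat(raw_time),  # str -> datetime
            ip,
            action,
            int(raw_status),  # str -> int
        )


log_struct = Logs.parse("2022-11-30T12:00:00 192.168.1.1 GET 200")
print(log_struct.status, log_struct.time.year)  # 200 2022
```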

Answer:

I agree with @blhsing and recommend that solution. It assumes, however, that you don't need to store any extra attributes alongside the named tuple's fields (say, the raw log value).
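If the only extra value needed is the raw log line, one option is to make it one more field of the tuple, so the single-class approach still works. A minimal sketch reusing the simplified ip/action example; the field name raw is hypothetical.

```python
from typing import NamedTuple


class Logs(NamedTuple):
    ip: str
    action: str
    raw: str  # the unparsed log line, kept alongside the parsed fields

    @classmethod
    def parse(cls, log: str) -> "Logs":
        # Parsed fields positionally, then the original string as `raw`.
        return cls(*log.split(), raw=log)


log_struct = Logs.parse("192.168.1.1 GET")
print(log_struct.action)  # GET
print(log_struct.raw)     # 192.168.1.1 GET
```

One consequence of this design: raw now takes part in tuple equality, iteration and len(), which may or may not be what you want.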

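If you really need the object to remain composed, another way to expose the fields of the _Access tuple is to define a __getattr__ method on Logs. For classes, this hook is documented in the Python data model (object.__getattr__); PEP 562 adds the same hook for modules, and its description of the module-level version explains the lookup order: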

The __getattr__ function at the module level should accept one argument which is the name of an attribute and return the computed value or raise an AttributeError:

def __getattr__(name: str) -> Any: ...

If an attribute is not found on a module object through the normal lookup (i.e. object.__getattribute__), then __getattr__ is searched in the module __dict__ before raising an AttributeError. If found, it is called with the attribute name and the result is returned. Looking up a name as a module global will bypass module __getattr__. This is intentional, otherwise calling __getattr__ for builtins will significantly harm performance.
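To see the module-level behaviour the quote describes on its own, here is a small sketch that creates a throwaway module with types.ModuleType. The module name demo and the attribute answer are invented for illustration; this is separate from the class-level hook that Logs needs.

```python
import types

# Create an empty module object at runtime (normally this would be a .py file).
demo = types.ModuleType("demo")


def _module_getattr(name: str) -> int:
    # Only reached when normal lookup on the module fails (PEP 562).
    if name == "answer":
        return 42
    raise AttributeError(f"module 'demo' has no attribute {name!r}")


# Storing __getattr__ in the module's __dict__ enables the fallback.
demo.__getattr__ = _module_getattr
demo.existing = "found normally"

print(demo.existing)  # found normally (no fallback involved)
print(demo.answer)    # 42 (computed by __getattr__)
```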

Applied to your Logs class, it looks like this:

from typing import NamedTuple, Any


class _Access(NamedTuple):
    foo: str
    bar: str


class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

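    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        # Refuse to delegate "access" itself, otherwise an instance created
        # without __init__ (e.g. by copy or pickle) would recurse forever.
        if name == "access":
            raise AttributeError(name)
        return getattr(self.access, name)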

When you request an attribute that a Logs instance does not have, normal lookup fails and Python calls __getattr__, which forwards the lookup to the access attribute. This means you can write code like this:

logs = Logs("fizz buzz")
print(f"{logs.log=}, {logs.foo=}, {logs.bar=}")

which prints:

logs.log='fizz buzz', logs.foo='fizz', logs.bar='buzz'
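One pitfall worth guarding against: __getattr__ can run before __init__ has set self.access, for example when copy.copy or pickle recreate an instance without calling __init__. If __getattr__ reads self.access unconditionally, that lookup itself falls back to __getattr__ and recurses until a RecursionError. A sketch with the same simplified _Access (foo, bar) that refuses to delegate the name access, plus a check that copying works:

```python
import copy
from typing import Any, NamedTuple


class _Access(NamedTuple):
    foo: str
    bar: str


class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    def __getattr__(self, name: str) -> Any:
        # If `access` itself is missing (instance not initialised yet),
        # fail fast instead of recursing into __getattr__ again.
        if name == "access":
            raise AttributeError(name)
        return getattr(self.access, name)


logs = Logs("fizz buzz")
clone = copy.copy(logs)  # works: the guard stops the recursion
print(clone.foo)  # fizz
print(hasattr(logs, "missing"))  # False: the tuple raises AttributeError
```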

Note that most static analyzers and autocompletion engines will not carry the _Access field types through to Logs: with __getattr__ annotated as returning Any, logs.foo is typed as Any, and a misspelled name such as logs.fo is not flagged. To me, that is reason enough not to do this and to keep the more verbose access pattern you describe in your question.
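A possible middle ground, if you want to keep __getattr__, is to declare the delegated names as class-level annotations without values. Such annotations create no class attribute, so at runtime the lookup still falls through to __getattr__, while static checkers such as mypy and pyright should treat logs.foo as str. This is a sketch of that idea, not a standard recipe; the catch is that the names must be kept in sync with _Access by hand.

```python
from typing import Any, NamedTuple


class _Access(NamedTuple):
    foo: str
    bar: str


class Logs:
    # Annotation-only declarations: visible to type checkers, but they
    # create no attributes, so runtime lookups still reach __getattr__.
    foo: str
    bar: str

    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    def __getattr__(self, name: str) -> Any:
        if name == "access":  # avoid recursion before __init__ has run
            raise AttributeError(name)
        return getattr(self.access, name)


logs = Logs("fizz buzz")
print(logs.foo.upper())  # FIZZ
```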

If you still really need this and want to remain type-safe, I would add properties to the Logs class that fetch from the _Access object:

class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    @property
    def foo(self) -> str:
        return self.access.foo

    @property
    def bar(self) -> str:
        return self.access.bar

This avoids the type-safety issues and, depending on how much code you write that uses Logs instances, can still cut down dramatically on boilerplate elsewhere.
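A quick check of the property-based version, restating the simplified classes so the snippet runs on its own. Because the properties define no setter, the delegated attributes are also read-only, which mirrors the immutability of the underlying named tuple (the exact AttributeError message differs between Python versions).

```python
from typing import NamedTuple


class _Access(NamedTuple):
    foo: str
    bar: str


class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    @property
    def foo(self) -> str:
        return self.access.foo

    @property
    def bar(self) -> str:
        return self.access.bar


logs = Logs("fizz buzz")
print(logs.foo, logs.bar)  # fizz buzz

try:
    logs.foo = "other"  # no setter defined, so assignment fails
except AttributeError as exc:
    print("read-only:", exc)
```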
