Home > Net >  How can we write a `__getitem__` method which accepts any iterable as input, and chains the items to
How can we write a `__getitem__` method which accepts any iterable as input, and chains the items to

Time:08-13

Can we write class named CookieJar having a __getitem__ method defined so no longer matters if:

  • many indicies are shoved inside of a single call to __getitem__ e.g. myArray[x, y, z]
  • if __getitem__ is called many different times for each index? myArray[x][y][z]

How can we write __getitem__ to accept both iterables (such as (5, 2)) and non-iterables (5 and 2 separately).

In the following table, I want the code in the column at left to exhibit the same outward behavior as the code in the column at right.

How can we accomplish this?

 ---------------------------- ------------------------------- 
|  cookie_jar[1][2][3]       |  cookie_jar[(1, 2, 3)]        |
 ---------------------------- ------------------------------- 
|  cookie_jar[x][y]          |  cookie_jar[(x, y)]           |
 ---------------------------- ------------------------------- 
|  cookie_jar[99]            |  cookie_jar[(99,)]            |
 ---------------------------- ------------------------------- 
|  cookie_jar[99]            |  cookie_jar[[[[99]]]          |
 ---------------------------- ------------------------------- 
|  cookie_jar[1][2][3]       |  cookie_jar[1, 2][3]          |
 ---------------------------- ------------------------------- 
|  cookie_jar[1][2][3]       |  cookie_jar[[1, [2]], [3]]    |
 ---------------------------- ------------------------------- 
|  cookie_jar[1][2][3]       |  cookie_jar[1, 2, 3]          |
 ---------------------------- ------------------------------- 
|  cookie_jar[3][11][19]     |  cookie_jar[3:20:8]           |
 ---------------------------- ------------------------------- 
|  cookie_jar[3][11][19]     |  cookie_jar[range(3, 20, 8)]  |
 ---------------------------- ------------------------------- 

In an answer to this question you will write a __getitem__ method which accepts three types of input:

Name of Input Type Example(s) of Input Type
slice objects 10:40:3 or slice(10, 40, 3)
leaf nodes 912, 1.11, "boa", "anaconda", "water moccasin", "viper"
shallow iterables [1, 2, 3, "apple", "orange", "cherry", "kiwi"]
deep iterables [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]

Maybe we have something like this:

import functools

class CookieJar:

    def __init__(self, dct:dict):
        self._dct = dct
        # self._dct

    class CookieKey:
        def __init__(self, x):
            self._x = x
           
        def unwrap(self):
            return self._x

    class ShallowKey(CookieKey):
        # `self._x` 
        #     is something like 
        # `[1, 2, 3, 4]` 

    class DeepKey(CookieKey): 
        # `self._x` 
        #     is something like 
        # `[1, [3, [5],['hi']], [6, [[[7, 'hello']]]]]` 

    class LeafKey(CookieKey):
        """
            `unwrap()` returns something like `12`
        """
        # `self._x` is something like `12`

    @classmethod
    def key_factory(key:object):
        # factory?  
        # should return a `CookieKey`
        # returns one of: 
        #     `LeafKey`
        #     `ShallowKey`
        #     `DeepKey`
        raise NotImplementedError()
    
    @functools.singledispatchmethod
    def _getitem(self, key:CookieKey):
        raise NotImplementedError()

    @_getitem.register
    def _getitem_for_slices(self, slyce:type(0:1)):
        """ Convert slice into a shallow key """  
        raynge  = range(slyce.start, slyce.stop, slyce.step)
        new_key = type(self).ShallowKey(raynge)
        # call `self._getitem` for shallow keys 
        return self._getitem(new_key)

    @_getitem.register
    def _getitem_for_shallow_keys(self, key:ShallowKey):
        """
            let `leaf` be the leftmost leaf out of shallow key
            let `leftovers` be the shallow key after leaf is removed
            return `self[leaf][leftovers]`
        """ 
        raise NotImplementedError()

    @_getitem.register
    def _getitem_for_deep_keys(self, key:DeepKey):
        """
            (1) flatten the deep key into a shallow key

            (2) dispatch the shallow key to an
                implementation of `_getitem` which 
                processes shallow keys

            deep_key    = [1, [3, [5],['hi']], [6, [[[7, 'hello']]]]] 
            shallow_key = [1,  3,  5,  'hi'  ,  6,    7, 'hello'    ] 
        """ 
        raise NotImplementedError()

    @_getitem.register
    def _getitem_for_leaves(self, key:LeafKey)
        return self._dct[key.unwrap()]

    def __getitem__(self, *args):
        key = type(self).key_factory(args)
        return self._getitem(key)
         

Desired Behavior of __getitem__ for slices

In an answer, to this question I think that we can convert slices into shallow iterables. We can then dispatch the shallow iterable to the implementation of __getitem__ which accepts shallow iterables.

Feel free to ignore my recommendation as long as your answer gets the job done. I am just sharing my partial solution to the problem in order to make it easier to answer.

cookie_jar[3:20:8] becomes cookie_jar[range(3, 20, 8)]

3:20:8 is not iterable, but range(3, 20, 8) is an iterable.

What is a leaf?

A leaf is any object knode such that knode is not an instance of the slice class and the following function returns True

def is_leaf(cls, knode):
    if hasattr(knode, "__iter__"):
        return str(knode) == "".join(str(elem) for elem in knode)
    else: # not iterable
        return True

I really wish slices had an iter method, but slices are not iterable.

# I sometimes wish that the following were all equivalent:   
(3, 11, 19)
tuple(3:20:8)
tuple(iter(slice(3, 20, 8)))
tuple([3, 11, 19])
tuple(range(3, 20, 8))
tuple(iter(3:20:8)) 

The following are some examples of leaves and non-leaves:

IS A LEAF IS NOT A LEAF
float 3.1459 slice 3:20:8
string "hello" list ["hello", "world"]
integer 8731 iterator range(0, 10)

What is the difference between a shallow iterable and a deep iterable?

Suppose that it is an iterator to a container like the list [0, 1, 2, 3, 4]

Specifically it represent a container which:

  • contains at least one element.

  • the container is a shallow container (I provide a definition of the word "*shallow" means later in this question)

  • An example of shallow container is the list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

  • Example of a deep container might be the list [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]. deep containers are NOT shallow

"shallow" containers are also sometimes called "flat" containers.
"deep" containers are sometimes referred to as "nested" containers.

More formally, a container bag is shallow if the following function returns False for bag but returns True for all the element in bag:

def is_leaf(knode):
    if hasattr(knode, "__iter__"):
        return str(knode) == "".join(str(elem) for elem in knode)
    else: # not iterable
        return True

When bag is a shallow container we should have the following assert statements not raise any exceptions:

# bag is shallow
assert(not is_leaf(bag)) 
assert(all(is_leaf(elem) for elem in iter(bag)))

What is the desired behavior for shallow iterables?

The following two pieces of code (LEFT CODE and RIGHT CODE) will do the same thing when calling __getitem__

 ---------------------- -------------------------- 
|      LEFT CODE       |        RIGHT CODE        |
 ---------------------- -------------------------- 
| cjar   = CookieJar() | cjar   = CookieJar()     |
| result = cjar[it]    | leaf = next(it)          |
|                      | result = cjar.[leaf][it] |
 ---------------------- -------------------------- 

Desired Behavior for Deep Iterables

If the input to CookieJar.__getitem__() is a deep container, then our __getitem__ method should convert the deep container into a shallow container.

deep_key    = [1, [3, [5],[991]], [6, [[[7, 'hello']]]]] 
shallow_key = [1,  3,  5,  991  ,  6,    7, 'hello'    ] 

After converting a deep container into a shallow container, we should have the following two things should be the same:

element1 = cjar[deep_container]
element2 = cjar[leaf 1][leaf 2][leaf 3]...[leaf n-1][leaf n]
id(element1) == id(element2)  

I have kind of sort-of partially solved the problem.

I know how to convert deep containers into shallow containers.

One of the tools provided in the following code block might help us flatten the nested-containers:

import itertools as itools

class Base:
    @classmethod
    def is_leaf(cls, knode):
        if hasattr(knode, "__iter__"):
            return str(knode) == "".join(str(elem) for elem in knode)
        else:  # not iterable
            return True

class RecursiveFlatten(Base):
    @classmethod
    def get_flat_iter(cls, *parent):
        for kid in parent:
            if not cls.is_leaf(kid):
                for grand_kid in cls.get_flat_iter(kid):
                    yield grand_kid
        else: # kid is a leaf
            yield kid

class IterGator(Base):
    def __init__(self, it):
        self._it = iter(it)

    def __iter__(self):
        return self

    @classmethod
    def nexter(cls, old_it):
        next_it = cls.get_almost_flat_iter(old_it)
        leaf = next(next_it)
        return leaf, next_it

    def __next__(self):
        value, self.it = type(self).nexter(self._it)
        return value

    @classmethod
    def get_almost_flat_iter(cls, it):
        # * input an old iterator
        #
        # * output a new iterator
        #
        # * the first call next(new_it) is
        #   guaranteed to be a leaf
        #
        # * successive calls to `next`
        #   are not guaranteed to return a leaf
        while True:
            try:
                knode = next(it)
            except StopIteration:
                return iter([])
            if cls.is_leaf(knode):
                return itools.chain(iter((knode,)), it)
            it = itools.chain(iter(knode), it)

    @classmethod
    def get_flat_iter(cls, it):
        it = iter(it)
        while True:
            it = cls.get_almost_flat_iter(it)
            try:
                leaf = next(it)
            except StopIteration:
                break
            yield leaf

    def __next__(self):
        leaf, self._it = type(self).nexter(self._it)
        return leaf


class IterShmator(Base):
    def __init__(self, *args):
        self._it = type(self).flat_iter(iter(args))

    def __iter__(self):
        return self

    @classmethod
    def flat_iter(cls, it):
        while True:
            try:
                knode = next(it)
            except StopIteration:
                return
            if cls.is_leaf(knode):
                yield knode
            else:
                it = itools.chain(iter(knode), it)

    def __next__(self):
        return next(self._it)


lyst = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]

it = IterGator(lyst)
# it = IterGator.get_flat_iter(lyst)
# it = IterShmator(lyst)
# it = RecursiveFlatten.get_flat_iter(lyst)

print(", ".join(repr(ch) for ch in it))
# prints: 1, 2, 3, 4, 5, 'hi', 6, 7, 'hello'

Output of __getitem__

  • We know how to convert deep keys into shallow keys
  • We know how to convert shallow keys into leaf keys

However, What should __getitem__ return after all of the keys have been converted into leaves?

If cookjar = CookieJar() then we must have:

id(cookjar[87]) != id(cookjar)  

The number 87 in unimportant.
obj.__getitem__() must not return obj

The following is unacceptable:

class CookieJar:
    # BAD
    def __getitem__(*ignore_all_keys):
        return self
    # VERY BAD  

cj = CookieJar()
# all of the following are the same: 
cj[1][2][3]
cj[1, 2, 3]
cj[1, 2][3]
cj[1, 2][3]
cj.__getitem__(1).__getitem__(2).__getitem__(3)
cj.__getitem__(1, 2).__getitem__(3)
cj.__getitem__(1)

We could have instances of the CookieJar class act like nodes in an "information retrieval tree" data-structure (also known as atrie).

__getitem__ should return a different tree node than the node we started with.

We could use a passage of text from a United States constitution as test data:

s = """Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.
"""
s = s.split()

... we can view the test data as a tree...

# tree 
"Congress shall make no law".split()
├── "respecting an establishment of religion.".split()
├── "prohibiting the free exercise of religion.".split()
├── "abridging the".split()
│   ├── "freedom of".split() 
│   │   ├── "speech.".split()
│   │   └── "the press.".split()
│   └── "right of the people to".split()
│       ├── "assemble peaceably.".split()
│       └── "petition the government for a redress of grievances.".split()

... or we can view out test data as a set of sentences ...

sents = """
Congress shall make no law respecting an establishment of religion.
Congress shall make no law prohibiting the free exercise of religion.
Congress shall make no law abridging the freedom of speech.
Congress shall make no law abridging the freedom of the press.
Congress shall make no law abridging the right of the people to assemble peaceably.    
Congress shall make no law abridging the right of the people to petition the government for a redress of grievances.
"""
congressional_sentences = sents.split("\n")

The question is:

** How can we define __getitem__ so that all of the following are the same: **

knode1 = root["Congress"]["shall"]["make"]["no"]["law"]
knode2 = root["Congress", "shall", "make"]["no"]["law"]
knode3 = root["Congress"][("shall", ("make", "no", "law"))]
knode4 = root["Congress"]["shall"]["make", "no", "law"]
knode4 = root[("Congress", "shall", "make", "no", "law")]
knode4 = root[(((("Congress", "shall"), "make"), "no"), "law")]
knode4 = root[((("Congress", "shall", "make", "no", "law"),),)]

I am hoping that id(knode1) == id(knode2) I am also hoping that id(knod2) == id(knode3)

For mathematical applications... if you had a 3-dimentional table of numbers...

element = table[2][7][3]
element = table[2, 7, 3]
element = table[(2, 7, 3)]

CodePudding user response:

Basic idea

Instead of making a separate container type, make a view for containers. The semantics are:

  • A view instance tracks some iterable (which might be an element of some other iterable). For simplicity, we won't bother checking whether it's a proper container type or lazily evaluated.

  • When the view is indexed with a value of a non-iterable type, it indexes into the container with that value.

  • When the view is indexed with a value of an iterable type, it repeats the indexing for each element in that value.

  • If the result of the indexing is iterable, the result is a view around that iterable. Otherwise, the result is the value itself.

It can be implemented quite simply:

class View:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, indices):
        result = self._data
        # We can't easily distinguish a `TypeError` due to `indices`
        # being a non-iterable, from a `TypeError` due to reaching a 
        # leaf in the data prematurely. So we explicitly check first.
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        # Now decide whether to wrap the result
        try:
            iter(result)
        except TypeError:
            return result
        else:
            return View(result)

As a refactoring, we could use __new__ rather than __init__ so that the argument is returned unchanged if it isn't iterable. That prevents explicitly creating bad Views, and can also simplify the __getitem__ logic:

class View:
    def __new__(cls, data):
        try:
            iter(data)
            result = object.__new__(cls)
            result._data = data
        except TypeError:
            result = data
        return result

    def __getitem__(self, indices):
        result = self._data
        try:
            iter(indices)
        except TypeError:
            result = result[indices]
        else:
            for i in indices:
                result = result[i]
        return View(result)

Special cases

There are two problems with this result compared to the specification:

  1. slice objects are not actually iterable. We want to interpret myview[3:20:8] as if it were actually being indexed with the values described by that range, in sequence. Fortunately, it is trivial to convert a slice into the corresponding range object with the same start, stop and step.

    However, we need to complain if the start or stop are unspecified, since otherwise the semantics don't make any sense; and we have to keep in mind that ranges don't accept None as a step value (slices treat it as equivalent to 1). Finally, we have to accept that negative values will not index from the end, since again it will be far too difficult to interpret what should happen for all the corner cases.

  2. Strings (and possibly other types) are iterable, and the elements are themselves non-empty strings - thus they can be indexed into arbitrarily many times. We need to special-case these in order for them to work as leaf nodes.

We need helper logic to treat strings as if they were not iterable. It should apply to construction, too (since otherwise we could make a totally useless View instance from a string). We don't want that logic to handle slices, because View(slice(0)) should give us the original slice back, not a range.

With some refactoring, we get:

def _make_range(a_slice):
    start, stop, step = a_slice.start, a_slice.stop, a_slice.step
    if start is None or stop is None:
        raise ValueError('start and stop must be defined to convert to range')
    return range(start, stop, 1 if step is None else step)

def _non_string_iterable(obj):
    try:
        iter(data)
        return not isinstance(obj, str)
    except TypeError:
        return False

class View:
    def __new__(cls, data):
        if _non_string_iterable(data):
            result = object.__new__(cls)
            result._data = data
            return result
        return data

    def __getitem__(self, indices):
        result = self._data
        if isinstance(indices, slice):
            indices = _make_range(indices)
        if _non_string_iterable(indices):
            for i in indices:
                result = result[i]
        else:
            result = result[indices]
        return View(result)

CodePudding user response:

Combining collapse() and a Python version of dig(), with special slice handling, reproduces your input table of examples:

from more_itertools import collapse  # or implement this yourself
from unittest.mock import MagicMock


def dig(collection, *keys):
    """Dig into nested subscriptable objects, e.g. dict and list, i.e JSON."""
    curr = collection
    for k in keys:
        if curr is None:
            break

        if not hasattr(curr, '__getitem__') or isinstance(curr, str):
            raise TypeError(f'cannot dig into {type(curr)}')

        try:
            curr = curr[k]
        except (KeyError, IndexError):
            curr = None

    return curr


def what_you_wanted(collection, *keys):  # If I understood you correctly
    slic = keys[0] if len(keys) == 1 and isinstance(keys[0], slice) else None
    dig_keys = range(slic.stop)[slic] if slic else collapse(keys)
    return dig(collection, *dig_keys)


def test_getitem_with(*keys):
    mock = MagicMock()
    mock.__getitem__.returns = mock
    what_you_wanted(mock, *keys)
    print(mock.mock_calls)


test_getitem_with((1, 2, 3))
test_getitem_with(('x', 'y'))
test_getitem_with((99,))
test_getitem_with([[[99]]])
test_getitem_with((1, 2), 3)
test_getitem_with(([1, [2]], [3]))
test_getitem_with(1, 2, 3)
test_getitem_with(slice(3, 20, 8))
test_getitem_with(range(3, 20, 8))

Prints:

[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__('x'), call.__getitem__().__getitem__('y')]
[call.__getitem__(99)]
[call.__getitem__(99)]
[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
 call.__getitem__().__getitem__(2),
 call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(3),
 call.__getitem__().__getitem__(11),
 call.__getitem__().__getitem__().__getitem__(19)]
[call.__getitem__(3),
 call.__getitem__().__getitem__(11),
 call.__getitem__().__getitem__().__getitem__(19)]

For completion, could define a collection object (or View) that implements __getitem__() using what_you_wanted().

  • Related