Can we write class named CookieJar
having a __getitem__
method defined so no longer matters if:
- many indicies are shoved inside of a single call to
__getitem__
e.g.myArray[x, y, z]
- if
__getitem__
is called many different times for each index?myArray[x][y][z]
How can we write __getitem__
to accept both iterables (such as (5, 2)
) and non-iterables (5
and 2
separately).
In the following table, I want the code in the column at left to exhibit the same outward behavior as the code in the column at right.
How can we accomplish this?
---------------------------- -------------------------------
| cookie_jar[1][2][3] | cookie_jar[(1, 2, 3)] |
---------------------------- -------------------------------
| cookie_jar[x][y] | cookie_jar[(x, y)] |
---------------------------- -------------------------------
| cookie_jar[99] | cookie_jar[(99,)] |
---------------------------- -------------------------------
| cookie_jar[99] | cookie_jar[[[[99]]] |
---------------------------- -------------------------------
| cookie_jar[1][2][3] | cookie_jar[1, 2][3] |
---------------------------- -------------------------------
| cookie_jar[1][2][3] | cookie_jar[[1, [2]], [3]] |
---------------------------- -------------------------------
| cookie_jar[1][2][3] | cookie_jar[1, 2, 3] |
---------------------------- -------------------------------
| cookie_jar[3][11][19] | cookie_jar[3:20:8] |
---------------------------- -------------------------------
| cookie_jar[3][11][19] | cookie_jar[range(3, 20, 8)] |
---------------------------- -------------------------------
In an answer to this question you will write a __getitem__
method which accepts three types of input:
Name of Input Type | Example(s) of Input Type |
---|---|
slice objects | 10:40:3 or slice(10, 40, 3) |
leaf nodes | 912 , 1.11 , "boa" , "anaconda" , "water moccasin" , "viper" |
shallow iterables | [1, 2, 3, "apple", "orange", "cherry", "kiwi"] |
deep iterables | [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]] |
Maybe we have something like this:
import functools
class CookieJar:
def __init__(self, dct:dict):
self._dct = dct
# self._dct
class CookieKey:
def __init__(self, x):
self._x = x
def unwrap(self):
return self._x
class ShallowKey(CookieKey):
# `self._x`
# is something like
# `[1, 2, 3, 4]`
class DeepKey(CookieKey):
# `self._x`
# is something like
# `[1, [3, [5],['hi']], [6, [[[7, 'hello']]]]]`
class LeafKey(CookieKey):
"""
`unwrap()` returns something like `12`
"""
# `self._x` is something like `12`
@classmethod
def key_factory(key:object):
# factory?
# should return a `CookieKey`
# returns one of:
# `LeafKey`
# `ShallowKey`
# `DeepKey`
raise NotImplementedError()
@functools.singledispatchmethod
def _getitem(self, key:CookieKey):
raise NotImplementedError()
@_getitem.register
def _getitem_for_slices(self, slyce:type(0:1)):
""" Convert slice into a shallow key """
raynge = range(slyce.start, slyce.stop, slyce.step)
new_key = type(self).ShallowKey(raynge)
# call `self._getitem` for shallow keys
return self._getitem(new_key)
@_getitem.register
def _getitem_for_shallow_keys(self, key:ShallowKey):
"""
let `leaf` be the leftmost leaf out of shallow key
let `leftovers` be the shallow key after leaf is removed
return `self[leaf][leftovers]`
"""
raise NotImplementedError()
@_getitem.register
def _getitem_for_deep_keys(self, key:DeepKey):
"""
(1) flatten the deep key into a shallow key
(2) dispatch the shallow key to an
implementation of `_getitem` which
processes shallow keys
deep_key = [1, [3, [5],['hi']], [6, [[[7, 'hello']]]]]
shallow_key = [1, 3, 5, 'hi' , 6, 7, 'hello' ]
"""
raise NotImplementedError()
@_getitem.register
def _getitem_for_leaves(self, key:LeafKey)
return self._dct[key.unwrap()]
def __getitem__(self, *args):
key = type(self).key_factory(args)
return self._getitem(key)
Desired Behavior of __getitem__
for slices
In an answer, to this question I think that we can convert slices into shallow iterables. We can then dispatch the shallow iterable to the implementation of __getitem__
which accepts shallow iterables.
Feel free to ignore my recommendation as long as your answer gets the job done. I am just sharing my partial solution to the problem in order to make it easier to answer.
cookie_jar[3:20:8]
becomes cookie_jar[range(3, 20, 8)]
3:20:8
is not iterable, but range(3, 20, 8)
is an iterable.
What is a leaf?
A leaf is any object knode
such that knode
is not an instance of the slice
class and the following function returns True
def is_leaf(cls, knode):
if hasattr(knode, "__iter__"):
return str(knode) == "".join(str(elem) for elem in knode)
else: # not iterable
return True
I really wish slices had an iter
method, but slices are not iterable.
# I sometimes wish that the following were all equivalent:
(3, 11, 19)
tuple(3:20:8)
tuple(iter(slice(3, 20, 8)))
tuple([3, 11, 19])
tuple(range(3, 20, 8))
tuple(iter(3:20:8))
The following are some examples of leaves and non-leaves:
IS A LEAF | IS NOT A LEAF |
---|---|
float 3.1459 | slice 3:20:8 |
string "hello" |
list ["hello", "world"] |
integer 8731 |
iterator range(0, 10) |
What is the difference between a shallow iterable and a deep iterable?
Suppose that it
is an iterator to a container like the list [0, 1, 2, 3, 4]
Specifically it
represent a container which:
contains at least one element.
the container is a shallow container (I provide a definition of the word "*shallow" means later in this question)
An example of shallow container is the list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Example of a deep container might be the list
[1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]
. deep containers are NOT shallow
"shallow" containers are also sometimes called "flat" containers.
"deep" containers are sometimes referred to as "nested" containers.
More formally, a container bag
is shallow if the following function returns False
for bag
but returns True
for all the element in bag
:
def is_leaf(knode):
if hasattr(knode, "__iter__"):
return str(knode) == "".join(str(elem) for elem in knode)
else: # not iterable
return True
When bag
is a shallow container we should have the following assert statements not raise any exceptions:
# bag is shallow
assert(not is_leaf(bag))
assert(all(is_leaf(elem) for elem in iter(bag)))
What is the desired behavior for shallow iterables?
The following two pieces of code (LEFT CODE
and RIGHT CODE
) will do the same thing when calling __getitem__
---------------------- --------------------------
| LEFT CODE | RIGHT CODE |
---------------------- --------------------------
| cjar = CookieJar() | cjar = CookieJar() |
| result = cjar[it] | leaf = next(it) |
| | result = cjar.[leaf][it] |
---------------------- --------------------------
Desired Behavior for Deep Iterables
If the input to CookieJar.__getitem__()
is a deep container, then our __getitem__
method should convert the deep container into a shallow container.
deep_key = [1, [3, [5],[991]], [6, [[[7, 'hello']]]]]
shallow_key = [1, 3, 5, 991 , 6, 7, 'hello' ]
After converting a deep container into a shallow container, we should have the following two things should be the same:
element1 = cjar[deep_container]
element2 = cjar[leaf 1][leaf 2][leaf 3]...[leaf n-1][leaf n]
id(element1) == id(element2)
I have kind of sort-of partially solved the problem.
I know how to convert deep containers into shallow containers.
One of the tools provided in the following code block might help us flatten the nested-containers:
import itertools as itools
class Base:
@classmethod
def is_leaf(cls, knode):
if hasattr(knode, "__iter__"):
return str(knode) == "".join(str(elem) for elem in knode)
else: # not iterable
return True
class RecursiveFlatten(Base):
@classmethod
def get_flat_iter(cls, *parent):
for kid in parent:
if not cls.is_leaf(kid):
for grand_kid in cls.get_flat_iter(kid):
yield grand_kid
else: # kid is a leaf
yield kid
class IterGator(Base):
def __init__(self, it):
self._it = iter(it)
def __iter__(self):
return self
@classmethod
def nexter(cls, old_it):
next_it = cls.get_almost_flat_iter(old_it)
leaf = next(next_it)
return leaf, next_it
def __next__(self):
value, self.it = type(self).nexter(self._it)
return value
@classmethod
def get_almost_flat_iter(cls, it):
# * input an old iterator
#
# * output a new iterator
#
# * the first call next(new_it) is
# guaranteed to be a leaf
#
# * successive calls to `next`
# are not guaranteed to return a leaf
while True:
try:
knode = next(it)
except StopIteration:
return iter([])
if cls.is_leaf(knode):
return itools.chain(iter((knode,)), it)
it = itools.chain(iter(knode), it)
@classmethod
def get_flat_iter(cls, it):
it = iter(it)
while True:
it = cls.get_almost_flat_iter(it)
try:
leaf = next(it)
except StopIteration:
break
yield leaf
def __next__(self):
leaf, self._it = type(self).nexter(self._it)
return leaf
class IterShmator(Base):
def __init__(self, *args):
self._it = type(self).flat_iter(iter(args))
def __iter__(self):
return self
@classmethod
def flat_iter(cls, it):
while True:
try:
knode = next(it)
except StopIteration:
return
if cls.is_leaf(knode):
yield knode
else:
it = itools.chain(iter(knode), it)
def __next__(self):
return next(self._it)
lyst = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]
it = IterGator(lyst)
# it = IterGator.get_flat_iter(lyst)
# it = IterShmator(lyst)
# it = RecursiveFlatten.get_flat_iter(lyst)
print(", ".join(repr(ch) for ch in it))
# prints: 1, 2, 3, 4, 5, 'hi', 6, 7, 'hello'
Output of __getitem__
- We know how to convert deep keys into shallow keys
- We know how to convert shallow keys into leaf keys
However, What should __getitem__
return after all of the keys have been converted into leaves?
If cookjar = CookieJar()
then we must have:
id(cookjar[87]) != id(cookjar)
The number 87
in unimportant.
obj.__getitem__()
must not return obj
The following is unacceptable:
class CookieJar:
# BAD
def __getitem__(*ignore_all_keys):
return self
# VERY BAD
cj = CookieJar()
# all of the following are the same:
cj[1][2][3]
cj[1, 2, 3]
cj[1, 2][3]
cj[1, 2][3]
cj.__getitem__(1).__getitem__(2).__getitem__(3)
cj.__getitem__(1, 2).__getitem__(3)
cj.__getitem__(1)
We could have instances of the CookieJar
class act like nodes in an "information retrieval tree" data-structure (also known as atrie).
__getitem__
should return a different tree node than the node we started with.
We could use a passage of text from a United States constitution as test data:
s = """Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the government for a redress of grievances.
"""
s = s.split()
... we can view the test data as a tree...
# tree
"Congress shall make no law".split()
├── "respecting an establishment of religion.".split()
├── "prohibiting the free exercise of religion.".split()
├── "abridging the".split()
│ ├── "freedom of".split()
│ │ ├── "speech.".split()
│ │ └── "the press.".split()
│ └── "right of the people to".split()
│ ├── "assemble peaceably.".split()
│ └── "petition the government for a redress of grievances.".split()
... or we can view out test data as a set of sentences ...
sents = """
Congress shall make no law respecting an establishment of religion.
Congress shall make no law prohibiting the free exercise of religion.
Congress shall make no law abridging the freedom of speech.
Congress shall make no law abridging the freedom of the press.
Congress shall make no law abridging the right of the people to assemble peaceably.
Congress shall make no law abridging the right of the people to petition the government for a redress of grievances.
"""
congressional_sentences = sents.split("\n")
The question is:
** How can we define
__getitem__
so that all of the following are the same: **knode1 = root["Congress"]["shall"]["make"]["no"]["law"] knode2 = root["Congress", "shall", "make"]["no"]["law"] knode3 = root["Congress"][("shall", ("make", "no", "law"))] knode4 = root["Congress"]["shall"]["make", "no", "law"] knode4 = root[("Congress", "shall", "make", "no", "law")] knode4 = root[(((("Congress", "shall"), "make"), "no"), "law")] knode4 = root[((("Congress", "shall", "make", "no", "law"),),)]
I am hoping that
id(knode1) == id(knode2)
I am also hoping thatid(knod2) == id(knode3)
For mathematical applications... if you had a 3-dimentional table of numbers...
element = table[2][7][3]
element = table[2, 7, 3]
element = table[(2, 7, 3)]
CodePudding user response:
Basic idea
Instead of making a separate container type, make a view for containers. The semantics are:
A view instance tracks some iterable (which might be an element of some other iterable). For simplicity, we won't bother checking whether it's a proper container type or lazily evaluated.
When the view is indexed with a value of a non-iterable type, it indexes into the container with that value.
When the view is indexed with a value of an iterable type, it repeats the indexing for each element in that value.
If the result of the indexing is iterable, the result is a view around that iterable. Otherwise, the result is the value itself.
It can be implemented quite simply:
class View:
def __init__(self, data):
self._data = data
def __getitem__(self, indices):
result = self._data
# We can't easily distinguish a `TypeError` due to `indices`
# being a non-iterable, from a `TypeError` due to reaching a
# leaf in the data prematurely. So we explicitly check first.
try:
iter(indices)
except TypeError:
result = result[indices]
else:
for i in indices:
result = result[i]
# Now decide whether to wrap the result
try:
iter(result)
except TypeError:
return result
else:
return View(result)
As a refactoring, we could use __new__
rather than __init__
so that the argument is returned unchanged if it isn't iterable. That prevents explicitly creating bad Views, and can also simplify the __getitem__
logic:
class View:
def __new__(cls, data):
try:
iter(data)
result = object.__new__(cls)
result._data = data
except TypeError:
result = data
return result
def __getitem__(self, indices):
result = self._data
try:
iter(indices)
except TypeError:
result = result[indices]
else:
for i in indices:
result = result[i]
return View(result)
Special cases
There are two problems with this result compared to the specification:
slice
objects are not actually iterable. We want to interpretmyview[3:20:8]
as if it were actually being indexed with the values described by that range, in sequence. Fortunately, it is trivial to convert aslice
into the correspondingrange
object with the samestart
,stop
andstep
.However, we need to complain if the
start
orstop
are unspecified, since otherwise the semantics don't make any sense; and we have to keep in mind that ranges don't acceptNone
as a step value (slice
s treat it as equivalent to1
). Finally, we have to accept that negative values will not index from the end, since again it will be far too difficult to interpret what should happen for all the corner cases.Strings (and possibly other types) are iterable, and the elements are themselves non-empty strings - thus they can be indexed into arbitrarily many times. We need to special-case these in order for them to work as leaf nodes.
We need helper logic to treat strings as if they were not iterable. It should apply to construction, too (since otherwise we could make a totally useless View
instance from a string). We don't want that logic to handle slices, because View(slice(0))
should give us the original slice
back, not a range
.
With some refactoring, we get:
def _make_range(a_slice):
start, stop, step = a_slice.start, a_slice.stop, a_slice.step
if start is None or stop is None:
raise ValueError('start and stop must be defined to convert to range')
return range(start, stop, 1 if step is None else step)
def _non_string_iterable(obj):
try:
iter(data)
return not isinstance(obj, str)
except TypeError:
return False
class View:
def __new__(cls, data):
if _non_string_iterable(data):
result = object.__new__(cls)
result._data = data
return result
return data
def __getitem__(self, indices):
result = self._data
if isinstance(indices, slice):
indices = _make_range(indices)
if _non_string_iterable(indices):
for i in indices:
result = result[i]
else:
result = result[indices]
return View(result)
CodePudding user response:
Combining collapse()
and a Python version of dig()
, with special slice
handling, reproduces your input table of examples:
from more_itertools import collapse # or implement this yourself
from unittest.mock import MagicMock
def dig(collection, *keys):
"""Dig into nested subscriptable objects, e.g. dict and list, i.e JSON."""
curr = collection
for k in keys:
if curr is None:
break
if not hasattr(curr, '__getitem__') or isinstance(curr, str):
raise TypeError(f'cannot dig into {type(curr)}')
try:
curr = curr[k]
except (KeyError, IndexError):
curr = None
return curr
def what_you_wanted(collection, *keys): # If I understood you correctly
slic = keys[0] if len(keys) == 1 and isinstance(keys[0], slice) else None
dig_keys = range(slic.stop)[slic] if slic else collapse(keys)
return dig(collection, *dig_keys)
def test_getitem_with(*keys):
mock = MagicMock()
mock.__getitem__.returns = mock
what_you_wanted(mock, *keys)
print(mock.mock_calls)
test_getitem_with((1, 2, 3))
test_getitem_with(('x', 'y'))
test_getitem_with((99,))
test_getitem_with([[[99]]])
test_getitem_with((1, 2), 3)
test_getitem_with(([1, [2]], [3]))
test_getitem_with(1, 2, 3)
test_getitem_with(slice(3, 20, 8))
test_getitem_with(range(3, 20, 8))
Prints:
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__('x'), call.__getitem__().__getitem__('y')]
[call.__getitem__(99)]
[call.__getitem__(99)]
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(1),
call.__getitem__().__getitem__(2),
call.__getitem__().__getitem__().__getitem__(3)]
[call.__getitem__(3),
call.__getitem__().__getitem__(11),
call.__getitem__().__getitem__().__getitem__(19)]
[call.__getitem__(3),
call.__getitem__().__getitem__(11),
call.__getitem__().__getitem__().__getitem__(19)]
For completion, could define a collection object (or View) that implements __getitem__()
using what_you_wanted()
.