TypeError using pprint on Counter() objects that have been updated (bit of an edge case)

Under some circumstances, Python pretty print (pprint.pprint) produces a TypeError, and it caught me a bit by surprise.

We can create a Counter object from (e.g.) a list of integers and pretty print it:

from collections import Counter
from pprint import pprint

intlist = [1,2,3,4,5,6,5,2,5,9,4,7,2,1,4,6,8,54,6,2,45,6,8,4,21,23,6,7,3,35561,1,6,8,]
intcounter = Counter(intlist)
pprint(intcounter)

Counter({6: 6, 2: 4, 4: 4, 1: 3, 5: 3, 8: 3, 3: 2, 7: 2, 9: 1, 54: 1, 45: 1, 21: 1, 23: 1, 35561: 1})

We can also add a key to it without converting it to a "native" dictionary, because Counter is a subclass of dict:

from collections import Counter
from pprint import pprint

intlist = [1,2,3,4,5,6,5,2,5,9,4,7,2,1,4,6,8,54,6,2,45,6,8,4,21,23,6,7,3,35561,1,6,8,]
intcounter = Counter(intlist)
intcounter["Hello"] = "World"
# and you can print that too
print(intcounter)

Counter({1: 3, 2: 4, 3: 2, 4: 4, 5: 3, 6: 6, 9: 1, 7: 2, 8: 3, 54: 1, 45: 1, 21: 1, 23: 1, 35561: 1, 'Hello': 'World'})

But can we then pretty print the updated object?

try:
    pprint(intcounter)
except Exception as t:
    print(t)

Nope.

Counter({'<' not supported between instances of 'int' and 'str'

Ok how about we turn pprint's default sorting behaviour off?

try:
    pprint(intcounter, sort_dicts=False)
except TypeError as t:
    print(t)

also nope:

Counter({'<' not supported between instances of 'int' and 'str'

Note also that we can't use update on a Counter() object if a value in the updating dict is of type str (even though, as above, we can add the key:value pair "directly"):

try:
    intcounter.update({"Hello": "World"})
except TypeError as t:
    print(t)

can only concatenate str (not "int") to str

I think (but I'm just a ham-fisted amateur coder, so I'm not sure) that the Python docs for Counter() might cover why we can't use the update method:

Note Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions. The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.

The most_common() method requires only that the values be orderable.

For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.

The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.

The elements() method requires integer counts. It ignores zero and negative counts.
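To make the quoted restrictions concrete, here is a minimal sketch (the names `weights`, "a" and "b" are made up for illustration) of a Counter holding non-integer numeric values. It only relies on the values supporting addition and being orderable against each other, as the docs describe:

```python
from collections import Counter
from fractions import Fraction

# Non-integer numeric values: update() only needs "+" to work.
weights = Counter({"a": Fraction(1, 2)})
weights.update({"a": Fraction(1, 4), "b": 1.5})

print(weights["a"])           # Fraction(1, 2) + Fraction(1, 4)
print(weights.most_common())  # works: Fraction and float can be compared
```

A str value breaks both assumptions at once: it cannot be added to an int and cannot be ordered against one.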

Obviously if we force the Counter object to a "native" dictionary (dict(intcounter)) everything will work as expected, but I wondered if pprint should handle this a bit more elegantly, even though I realise this is quite edge-casey and very few people will trip over this in the same way I did.
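For reference, a small sketch of that workaround, plus one possible alternative that keeps the Counter purely numeric and merges the extra pairs into a plain dict only when needed. The names `counts`, `extras` and `payload` are hypothetical, not anything bokeh requires:

```python
from collections import Counter
from pprint import pformat

counts = Counter([1, 2, 2, 3, 3, 3])

# Option 1: store the extra pair in the Counter, convert before pretty printing.
mixed = Counter(counts)
mixed["Hello"] = "World"
text = pformat(dict(mixed))
print(text)

# Option 2 (assumption: the consumer accepts any mapping): keep the Counter
# numeric and build the combined dict separately.
extras = {"Hello": "World"}
payload = {**counts, **extras}
print(pformat(payload))
```

With the second option the Counter itself stays safe to sort, update and pretty print.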

(I was passing a Counter() to a bokeh charting function and it seemed convenient to pass some extra k:v pairs that the function used by simply updating the Counter() object; pprint was just used to visually check my work.)

Python 3.8 btw.

User response:

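pprint's dict sorting is not to blame here, which is why sort_dicts=False makes no difference. When you perform your call:

pprint(intcounter)

the comparison that fails happens inside Counter.most_common(), which sorts the items by value and so cannot order the int counts against the str 'World'. As far as I can tell, pprint builds its output for a Counter from most_common() without any fallback. Counter.__repr__ calls most_common() as well, but it catches the TypeError and falls back to plain dict order, which is why print(intcounter) still works: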

def __repr__(self):
    if not self:
        return f'{self.__class__.__name__}()'
    try:
        # dict() preserves the ordering returned by most_common()
        d = dict(self.most_common())
    except TypeError:
        # handle case where values are not orderable
        d = dict(self)
    return f'{self.__class__.__name__}({d!r})'
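A short sketch to check this on a small Counter (the variable names are illustrative only): calling most_common() directly raises the TypeError, while repr() goes through the fallback branch shown above.

```python
from collections import Counter

counts = Counter([1, 2, 2, 3, 3, 3])
counts["Hello"] = "World"   # non-numeric value, as in the question

try:
    counts.most_common()    # sorts items by value: int vs str cannot be ordered
    sort_failed = False
except TypeError as exc:
    sort_failed = True
    print("most_common() failed:", exc)

text = repr(counts)         # falls back to plain dict order
print(text)
```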

Note that when you add your key/value pair, either by assignment ([key] = value) or using update, it is not validated.

The Counter class assumes that the values you pass are of type int, but performs no validation to enforce it.

When you use update, the code won't validate the value either, but it will crash at this line:

self[elem] = count + self_get(elem, 0)

Here count is the str value you passed, and it cannot be added to the default 0.

As opposed to using assignment, where the line is basically:

self[key] = value

The update method adds the new value to the previous one. So if the value was 5 and you update with 1, the result is 6. If you pass a str value for a key whose current count is an int (or which is missing), the addition raises a TypeError that is not handled.
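The difference between the two paths can be sketched as follows (keys and values here are made up for illustration):

```python
from collections import Counter

counts = Counter({"a": 5})
counts.update({"a": 1})            # adds to the existing count: 5 + 1
print(counts["a"])

fresh = Counter({"a": 5})
fresh["Hello"] = "World"           # plain assignment: stored as-is, nothing is added
try:
    fresh.update({"Bye": "World"}) # missing key, so roughly "World" + 0
    update_failed = False
except TypeError as exc:
    update_failed = True
    print("update() failed:", exc)
```

Because the addition fails before anything is stored, the "Bye" key never makes it into the Counter.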

Again, this will pass when using assignment, but as soon as any method has to do a computation on the values it will eventually crash.

Always ensure the values in your Counter are numeric, ideally of type int.