functools.reduce in Python not working as expected

I would like to sum the values of each key across dictionaries nested within a list, using the functools.reduce function.

I can accomplish this WITHOUT the functools.reduce function with the following simple program:

dict1 = {'a': '1', 'b': '2'}
dict2 = {'a': '5', 'b': '0'}
dict3 = {'a': '7', 'b': '3'}

data_list = [dict1, dict2, dict3]

total_a = 0
total_b = 0
for record in data_list:
    total_a += int(record['a'])
    total_b += int(record['b'])

print(total_a)
print(total_b)

As I said, however, I would like to produce the same results using functools.reduce instead.

Here is my attempt at using functools.reduce with a lambda expression:

from functools import reduce

dict1 = {'a': '1', 'b': '2'}
dict2 = {'a': '5', 'b': '0'}
dict3 = {'a': '7', 'b': '3'}

data_list = [dict1, dict2, dict3]

total_a = reduce(lambda x, y: int(x['a']) + int(y['a']), data_list)
total_b = reduce(lambda x, y: int(x['b']) + int(y['b']), data_list)

print(total_a)
print(total_b)

Unfortunately, I get the following error and do not know why:

TypeError: 'int' object is not subscriptable

Does someone know why I am getting this error?

Answer:

TypeError: 'int' object is not subscriptable

Does someone know why I am getting this error?

First, let's reduce (pun intended) the sample to a minimum:

>>> from functools import reduce
>>> data = [{"a": 1}, {"a": 2}, {"a": 3}]
>>> reduce(lambda x, y: x["a"] + y["a"], data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
TypeError: 'int' object is not subscriptable

Same error. But observe this:

>>> reduce(lambda x, y: x["a"] + y["a"], data[:2])
3

That works. So what's going on? To simplify, let's assign the lambda expression to a variable:

f = lambda x, y: x['a'] + y['a']

Reduce combines the input like this:

# Example: reduce(lambda x, y: x["a"] + y["a"], data[:2])
>>> f(data[0], data[1])

# which evaluates in steps like this:

>>> data[0]["a"] + data[1]["a"]
>>> 1 + 2
>>> 3

But what happens when reducing the full list? That call evaluates to:

# Example: reduce(lambda x, y: x["a"] + y["a"], data)
>>> f(f(data[0], data[1]), data[2])

# which evaluates in steps like this:

>>> f(data[0]["a"] + data[1]["a"], data[2])
>>> f(1 + 2, data[2])
>>> f(3, data[2])
>>> 3["a"] + data[2]["a"]

So this fails because it tries to look up the key "a" on the integer 3.
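One way to watch this happen is to wrap the lambda in a small function that records the types of its arguments on every call before indexing into them. The names traced_f and calls are hypothetical, introduced only for this trace:

```python
from functools import reduce

data = [{"a": 1}, {"a": 2}, {"a": 3}]
calls = []  # records (type of x, type of y) for every call reduce makes

def traced_f(x, y):
    calls.append((type(x).__name__, type(y).__name__))
    return x["a"] + y["a"]

try:
    reduce(traced_f, data)
except TypeError as exc:
    print(exc)  # 'int' object is not subscriptable

print(calls)  # [('dict', 'dict'), ('int', 'dict')]
```

The second recorded call shows the integer 3 arriving as x, which is exactly where the lookup fails.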

Basically, the output of the function passed to reduce must be acceptable as its first parameter, because reduce feeds each result back in as the first argument of the next call. In your example, the lambda expects a dictionary as its first parameter but returns an integer.
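The feedback loop becomes explicit in a simplified pure-Python sketch of reduce. It is modeled on the rough equivalent shown in the Python documentation, but it is not the real implementation (for example, it does not reproduce reduce's error for an empty iterable without an initializer), and the name my_reduce is hypothetical:

```python
_missing = object()  # sentinel meaning "no initializer was given"

def my_reduce(function, iterable, initializer=_missing):
    it = iter(iterable)
    if initializer is _missing:
        value = next(it)  # the first item becomes the starting accumulator
    else:
        value = initializer
    for element in it:
        # Whatever function returns is passed back in as its first argument.
        value = function(value, element)
    return value

print(my_reduce(lambda x, y: x + y, [1, 2, 3]))  # 6
print(my_reduce(lambda x, y: x + y["a"], [{"a": 1}, {"a": 2}], 0))  # 3
```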

Answer:

The reduction function receives the current reduced value plus the next item from the iterable. The trick is in choosing what that reduced value looks like. In your case, if you choose a two-item list holding the running totals of 'a' and 'b', the reduction function just adds the next 'a' and 'b' to those totals. The reduction is most easily written as a couple of statements, so it should move from an anonymous lambda to a regular function. Start with an initializer of [0, 0] to hold the reduced 'a' and 'b', and you get:

from functools import reduce

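def reducer(accum, next_dict):
    print(accum, next_dict)  # debug trace
    # accum holds the running [total_a, total_b]; add this record's values.
    accum[0] += int(next_dict['a'])
    accum[1] += int(next_dict['b'])
    return accum  # passed back in as accum on the next call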

dict1 = {'a': '1', 'b': '2'}
dict2 = {'a': '5', 'b': '0'}
dict3 = {'a': '7', 'b': '3'}

data_list = [dict1, dict2, dict3]

total_a, total_b = reduce(reducer, data_list, [0, 0])
print(total_a, total_b)  # 13 5
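A variant of the same idea, offered as a sketch rather than a correction, returns a new tuple on each step instead of mutating the list, so the accumulator is never modified in place. It drops the debug trace and redefines data_list so the snippet runs on its own:

```python
from functools import reduce

data_list = [{'a': '1', 'b': '2'}, {'a': '5', 'b': '0'}, {'a': '7', 'b': '3'}]

# The accumulator is a (sum_a, sum_b) tuple; each step builds a new one.
total_a, total_b = reduce(
    lambda acc, d: (acc[0] + int(d['a']), acc[1] + int(d['b'])),
    data_list,
    (0, 0),
)

print(total_a, total_b)  # 13 5
```

Because the step fits in a single expression, a lambda is enough here; the list-mutating version above is arguably easier to read once more keys are involved.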

Answer:

You misunderstand how reduce() works. The first argument of the function passed to it is the partial result so far, and has nothing directly to do with the iterable passed to reduce(). The iterable's elements are passed one at a time as the function's second argument. Since you want a sum, the initial value of the "partial result" needs to be 0, and that value also needs to be passed to reduce().

So, in all, these lines will print what you want:

print(reduce(lambda x, y: x + int(y['a']), data_list, 0))
print(reduce(lambda x, y: x + int(y['b']), data_list, 0))
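To see why the initial 0 matters, here is the same lambda with and without it. Without an initializer, reduce uses the first dictionary as the starting partial result, so the very first call tries to add a dict and an int:

```python
from functools import reduce

data_list = [{'a': '1', 'b': '2'}, {'a': '5', 'b': '0'}, {'a': '7', 'b': '3'}]

# Without an initializer, x starts out as data_list[0] (a dict).
try:
    reduce(lambda x, y: x + int(y['a']), data_list)
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'dict' and 'int'

# With 0 as the initializer, x is always an int.
print(reduce(lambda x, y: x + int(y['a']), data_list, 0))  # 13
```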

EDIT: replaced eval() with int() above, so it matches the edited question. It's irrelevant to the answer, though.

EDIT 2: you keep changing the question, but I'm not going to keep changing the answer to match ;-) The code just above fully answers an earlier version of the question, and nothing material has changed. Exactly the same things are still at work, and exactly the same kind of approach is needed.

Gloss on types

While Python doesn't require explicit type declarations, sometimes they can be helpful.

If you have an iterable delivering objects of type A, and the result of reduce() is of type B, then the signature of the first argument passed to reduce() must be

def reduction_function(x: B, y: A) -> B

In the example, A is dict and B is int. A function that treats both of its arguments as dicts can't possibly work, because after the first call its first argument is an int. That's essentially why we need to specify an initial value of type B (int) in this case.
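Applied to this question, the signature above could be written out with concrete types, taking B as int and A as dict[str, str]. The function name sum_a is hypothetical, and the dict[str, str] annotation assumes Python 3.9 or later:

```python
from functools import reduce

# x: B (the running int total), y: A (one record from the list) -> B
def sum_a(x: int, y: dict[str, str]) -> int:
    return x + int(y['a'])

data_list = [{'a': '1', 'b': '2'}, {'a': '5', 'b': '0'}, {'a': '7', 'b': '3'}]

# The initializer 0 supplies the first value of type B.
print(reduce(sum_a, data_list, 0))  # 13
```

A static type checker such as mypy can then flag a reduction function whose return type does not match its first parameter.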

In doc examples, A and B are typically both int or float. Then a simple + or * is already type-correct for reduce()'s first argument.
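When A and B coincide, the operator module's function forms of + and * can be passed to reduce() directly, with no initializer, since each result is already a valid first argument for the next call:

```python
import operator
from functools import reduce

numbers = [1, 2, 3, 4]

# A and B are both int, so each output feeds straight back in.
print(reduce(operator.add, numbers))  # 10, i.e. ((1 + 2) + 3) + 4
print(reduce(operator.mul, numbers))  # 24, i.e. ((1 * 2) * 3) * 4
```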