Tuple manipulation in PySpark with lambda, python


I have a parallelized list of tuples in the format:

data= [('Emily', (4, 2)),
        ('Alfred', (1, 12)),
        ('George', (10, 2))]
list = sc.parallelize(data)

What I want is to multiply the two integers within each tuple, which should give me this output:

[('Emily', 8),
 ('Alfred', 12),
 ('George', 20)]

I have tried:

list = list.map(lambda x: (x[0], x[1]*x[2]))

But with no effect.

CodePudding user response:

In your lambda, x[1] is the tuple ((4, 2), ...), so you need to index into it to access the two values you want to multiply (x[1][0] and x[1][1]).

Try this instead:

result = list.map(lambda x: (x[0], x[1][0] * x[1][1]))

print(result.collect())
#[('Emily', 8), ('Alfred', 12), ('George', 20)]
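
If you prefer to avoid positional indexing, here is a sketch of a slightly more readable variant that unpacks the value tuple in a small helper function. It assumes the same sc and data from the question, with the RDD rebound to rdd so it doesn't shadow the built-in list:

def multiply_pair(record):
    # Destructure ('Emily', (4, 2)) into the name and its two integers
    name, (a, b) = record
    return (name, a * b)

rdd = sc.parallelize(data)
print(rdd.map(multiply_pair).collect())
# [('Emily', 8), ('Alfred', 12), ('George', 20)]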

Another way is to pass the tuple to functools.reduce with the mul operator:

import operator
import functools

result = list.map(lambda x: (x[0], functools.reduce(operator.mul, x[1], 1)))

print(result.collect())
# [('Emily', 8), ('Alfred', 12), ('George', 20)]
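
The advantage of reduce is that it generalizes beyond pairs: it folds operator.mul across the whole tuple, whatever its length. A quick plain-Python check, no Spark needed (the three-element tuple is just an illustration, not from the question's data):

import functools
import operator

# reduce multiplies every element together, starting from the initial value 1
print(functools.reduce(operator.mul, (4, 2), 1))      # 8
print(functools.reduce(operator.mul, (10, 2, 3), 1))  # 60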