I have a class Dimensions(Int, Int, Int) and a class Shape(name: String), paired in a tuple (Shape, Dimensions).
My dataset is:
(Cube, Dimensions(5,5,5))
(Sphere, Dimensions(5,10,15))
(Cube, Dimensions(3,3,3))
I need to return this:
(Cube, Dimensions(8,8,8))
(Sphere, Dimensions(5,10,15))
where I group by the shape name and then sum the dimension values. Currently I can map into a (Name, Int, Int, Int), but I am unsure how to wrap the result back into a Dimensions object.
data.map(_._2.map(x => (x.length,x.width,x.height)))
Any help would be appreciated
CodePudding user response:
Assuming there are no special cases and your data is an RDD, aggregateByKey is all you need:
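import org.apache.spark.rdd.RDD

// Shape as described in the question; a case class so keys compare by name.
case class Shape(name: String)
case class Dimensions(i1: Int, i2: Int, i3: Int)

val initialRdd: RDD[(Shape, Dimensions)] = ???

// Field-wise sum of two Dimensions values.
def combineDimensions(dimensions1: Dimensions, dimensions2: Dimensions): Dimensions =
  Dimensions(
    dimensions1.i1 + dimensions2.i1,
    dimensions1.i2 + dimensions2.i2,
    dimensions1.i3 + dimensions2.i3
  )

val finalRdd: RDD[(Shape, Dimensions)] =
  initialRdd
    .aggregateByKey(Dimensions(0, 0, 0))(
      // Within a partition: fold each value into the running total for its key.
      { case (accDimensions, dimensions) =>
        combineDimensions(accDimensions, dimensions)
      },
      // Across partitions: merge the two partial totals.
      { case (partitionDimensions1, partitionDimensions2) =>
        combineDimensions(partitionDimensions1, partitionDimensions2)
      }
    )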
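
To check the grouping-and-summing logic without a Spark cluster, here is a minimal sketch on plain Scala collections using the dataset from the question. It assumes Scala 2.13 or later (for groupMapReduce); the object name SumDimensionsSketch and the helper sumByShape are made up for illustration and are not part of the original answer.

```scala
// Local stand-in for the RDD version: same case classes, same combine function.
final case class Shape(name: String)
final case class Dimensions(i1: Int, i2: Int, i3: Int)

object SumDimensionsSketch {
  // Field-wise sum of two Dimensions values.
  def combineDimensions(a: Dimensions, b: Dimensions): Dimensions =
    Dimensions(a.i1 + b.i1, a.i2 + b.i2, a.i3 + b.i3)

  // Hypothetical helper: group by shape, keep the Dimensions, reduce by summing.
  // groupMapReduce is available from Scala 2.13 onwards.
  def sumByShape(data: Seq[(Shape, Dimensions)]): Map[Shape, Dimensions] =
    data.groupMapReduce(_._1)(_._2)(combineDimensions)

  // Dataset from the question.
  val data: Seq[(Shape, Dimensions)] = Seq(
    (Shape("Cube"), Dimensions(5, 5, 5)),
    (Shape("Sphere"), Dimensions(5, 10, 15)),
    (Shape("Cube"), Dimensions(3, 3, 3))
  )

  val result: Map[Shape, Dimensions] = sumByShape(data)
}
```

Because the accumulator and the values have the same type here, initialRdd.reduceByKey(combineDimensions) should give the same result on an RDD as the aggregateByKey call, with less ceremony.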