I have a dataframe which looks like this
ID col
1 [item1 -> 0.2, Item2 -> 0.3, item3 -> 0.4]
2 [item2 -> 0.1, Item2 -> 0.7, item3 -> 0.2]
I want to sum all the decimal values in each row and store the result in a new column
ID col total
1 [item1 -> 0.2, Item2 -> 0.3, item3 -> 0.4] 0.9
2 [item2 -> 0.1, Item2 -> 0.7, item3 -> 0.2] 1.0
My approach
df = df.withColumn('total', F.expr('aggregate(map_values(col), 0, (acc, x) -> acc + x)'))
This does not work; the error says it can only be applied to int
CodePudding user response:
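from pyspark.sql import functions as func

# extract the map's values as an array, then fold them starting from a double 0
data_sdf. \
    withColumn('map_vals', func.map_values('col')). \
    withColumn('sum_of_vals', func.expr('aggregate(map_vals, cast(0 as double), (x, y) -> x + y)'))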
Since your values are of float type, the initial value passed to aggregate should match the type of the values in the array. So casting the initial 0 to double, instead of using a plain 0, should work fine.
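If it helps to picture what aggregate is doing, here is a rough plain-Python sketch of the same fold. It is not Spark code: the `rows` dict and the `sum_map_values` helper are made-up names that just mirror the data in the question. Python itself does not care whether the start value is an int or a float; Spark does, which is why the cast is needed there.

```python
from functools import reduce

# Hypothetical rows mirroring the question's data: ID -> map of item -> value.
rows = {
    1: {"item1": 0.2, "Item2": 0.3, "item3": 0.4},
    2: {"item2": 0.1, "Item2": 0.7, "item3": 0.2},
}


def sum_map_values(m):
    # Same shape as aggregate(map_values(col), cast(0 as double), (acc, x) -> acc + x):
    # start from a float zero and add each map value to the accumulator.
    return reduce(lambda acc, x: acc + x, m.values(), 0.0)


# One total per row, like the new 'total' column.
totals = {row_id: sum_map_values(m) for row_id, m in rows.items()}

for row_id, total in totals.items():
    # Rounded only for display; float sums can be off in the last digits.
    print(row_id, round(total, 2))
```

The start value 0.0 plays the role of cast(0 as double): the accumulator has the same type as the values from the first step onward.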