How to make an aggregation like count in Scala

Time:04-20

Hello,

I am stuck with collections in Scala. I have a list like this:

val a2 = List(("Comedy",2.0,288,Some(1978)),
  ("Comedy",2.5,307,Some(1978)),
  ("Comedy",3.0,312,Some(1978)),
  ("Comedy",2.5,377,Some(1978)),
  ("Comedy",1.0,571,Some(1978)),
  ("Horror"," ",288,Some(1978)),
    ("Horror",2.5,307,Some(1978)),
  ("Adventure",4.0,187,Some(2003)),
  ("Adventure",3.0,260,Some(2003)))

My aim is to group by the 1st and 4th values (genre and year) and to count the 2nd and 3rd values. Then a new column should be added that holds count_2nd/count_3rd. The final result should look something like this:

Comedy, Some(1978) -> 5, 5, 1.0
Horror, Some(1978) -> 1, 2, 0.5
Adventure, Some(2003) -> 2, 2, 1.0

I have tried this:

a2.flatMap(row => row._2.map( nm => (row._1,nm.rating,nm.userId,row._3))) 

but the IDE didn't allow me to add something like .count(i => i._2) to it. I guess I am going about this the wrong way.

The data can also contain null values; the 6th line of the data above has one. Because null values should not be counted, I expect count_2nd/count_3rd to differ from 1.0 for some groups.

Is this possible in Scala? If so, any help would be appreciated :)

CodePudding user response:

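If you need to group by, use groupBy. Here the missing rating is represented as Double.NaN so that the list keeps a single element type: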

val a2: List[(String, Double, Int, Option[Int])] = List(("Comedy", 2.0, 288, Some(1978)),
    ("Comedy", 2.5, 307, Some(1978)),
    ("Comedy", 3.0, 312, Some(1978)),
    ("Comedy", 2.5, 377, Some(1978)),
    ("Comedy", 1.0, 571, Some(1978)),
    ("Horror", Double.NaN, 288, Some(1978)),
    ("Horror", 2.5, 307, Some(1978)),
    ("Adventure", 4.0, 187, Some(2003)),
    ("Adventure", 3.0, 260, Some(2003)))

val result = a2.groupBy(t => (t._1, t._4)) // group rows by (genre, year)
    .view
    .mapValues(lst => (lst.count(!_._2.isNaN), lst.length)) // (non-NaN ratings, user ids); an Int is never NaN, so every user id counts
    .mapValues(t => (t._1, t._2, t._1.toFloat / t._2)) // "add" 3rd column: ratio of the two counts
    .toMap

println(result) // prints (entry order may vary) Map((Horror,Some(1978)) -> (1,2,0.5), (Comedy,Some(1978)) -> (5,5,1.0), (Adventure,Some(2003)) -> (2,2,1.0))
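
If the missing rating can be modelled as an Option instead of NaN, the same grouping could be sketched as below. This is only an illustration under that assumption: the names GenreCounts, Row, rows and summarize are made up for the example and do not come from the question.

```scala
object GenreCounts {
  // (genre, rating, userId, year). Encoding a missing rating as None is an
  // assumption of this sketch; the question used " " and the answer above used NaN.
  type Row = (String, Option[Double], Int, Option[Int])

  val rows: List[Row] = List(
    ("Comedy", Some(2.0), 288, Some(1978)),
    ("Comedy", Some(2.5), 307, Some(1978)),
    ("Comedy", Some(3.0), 312, Some(1978)),
    ("Comedy", Some(2.5), 377, Some(1978)),
    ("Comedy", Some(1.0), 571, Some(1978)),
    ("Horror", None, 288, Some(1978)),
    ("Horror", Some(2.5), 307, Some(1978)),
    ("Adventure", Some(4.0), 187, Some(2003)),
    ("Adventure", Some(3.0), 260, Some(2003))
  )

  // For each (genre, year): number of present ratings, number of rows,
  // and the ratio of the two counts.
  def summarize(rows: List[Row]): Map[(String, Option[Int]), (Int, Int, Double)] =
    rows.groupBy(r => (r._1, r._4)).map { case (key, group) =>
      val ratingCount = group.count(_._2.isDefined) // skip missing ratings
      val userCount = group.size                    // groupBy never yields empty groups
      (key, (ratingCount, userCount, ratingCount.toDouble / userCount))
    }

  def main(args: Array[String]): Unit =
    summarize(rows).foreach { case ((genre, year), (ratings, users, ratio)) =>
      println(s"$genre, $year -> $ratings, $users, $ratio")
    }
}
```

With Option the compiler forces every use of the rating to deal with the missing case, whereas a NaN can silently leak into later arithmetic. The main method prints one line per group in roughly the format asked for in the question; the order of the lines is not guaranteed.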