Hello,
I am stuck with collections in Scala. I have a list like this:
val a2 = List(("Comedy",2.0,288,Some(1978)),
("Comedy",2.5,307,Some(1978)),
("Comedy",3.0,312,Some(1978)),
("Comedy",2.5,377,Some(1978)),
("Comedy",1.0,571,Some(1978)),
("Horror"," ",288,Some(1978)),
("Horror",2.5,307,Some(1978)),
("Adventure",4.0,187,Some(2003)),
("Adventure",3.0,260,Some(2003)))
My aim is to group by the 1st and 4th values (genre and year) and to count the 2nd and 3rd values. Then a new column should be added that holds count_2nd / count_3rd. The final result should look something like this:
Comedy, Some(1978) -> 5, 5, 1.0
Horror, Some(1978) -> 1, 2, 0.5
Adventure, Some(2003) -> 2, 2, 1.0
I have tried this:
a2.flatMap(row => row._2.map( nm => (row._1,nm.rating,nm.userId,row._3)))
but the IDE didn't allow me to add something like .count(i => i._2). I guess I am going about it the wrong way.
The data can also contain null values; the 6th line of the data has one. Taking null values into account, I expect count_2nd/count_3rd to differ for some groups.
Is this possible in Scala? If so, I would appreciate any help :)
CodePudding user response:
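If you need to group, use groupBy. Note that the blank rating " " is replaced with Double.NaN below, so that the 2nd element stays a Double in every row: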
val a2: List[(String, Double, Int, Option[Int])] = List(("Comedy", 2.0, 288, Some(1978)),
("Comedy", 2.5, 307, Some(1978)),
("Comedy", 3.0, 312, Some(1978)),
("Comedy", 2.5, 377, Some(1978)),
("Comedy", 1.0, 571, Some(1978)),
("Horror", Double.NaN, 288, Some(1978)),
("Horror", 2.5, 307, Some(1978)),
("Adventure", 4.0, 187, Some(2003)),
("Adventure", 3.0, 260, Some(2003)))
val result = a2.groupBy(t => (t._1, t._4))
.view
.mapValues(lst => (lst.count(!_._2.isNaN), lst.size)) // count the non-NaN ratings and all user ids (an Int can never be NaN, so no filter is needed there)
.mapValues(t => (t._1, t._2, t._1.toFloat / t._2)) // "add" 3rd column
.toMap
println(result) // prints Map((Horror,Some(1978)) -> (1,2,0.5), (Comedy,Some(1978)) -> (5,5,1.0), (Adventure,Some(2003)) -> (2,2,1.0))
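As an alternative to the Double.NaN marker, a missing rating could be modelled as an Option. The following is only a sketch under that assumption: the names ratings and stats and the Option[Double] element type are made up for illustration and are not part of the original data. It is written REPL/worksheet style, like the snippet above.

```scala
// Hypothetical variant of the data: a missing rating is None instead of Double.NaN.
val ratings: List[(String, Option[Double], Int, Option[Int])] = List(
  ("Comedy", Some(2.0), 288, Some(1978)),
  ("Comedy", Some(2.5), 307, Some(1978)),
  ("Comedy", Some(3.0), 312, Some(1978)),
  ("Comedy", Some(2.5), 377, Some(1978)),
  ("Comedy", Some(1.0), 571, Some(1978)),
  ("Horror", None, 288, Some(1978)), // the row with the missing rating
  ("Horror", Some(2.5), 307, Some(1978)),
  ("Adventure", Some(4.0), 187, Some(2003)),
  ("Adventure", Some(3.0), 260, Some(2003))
)

// Group by (genre, year), then count present ratings and all user ids per group.
val stats: Map[(String, Option[Int]), (Int, Int, Double)] =
  ratings
    .groupBy { case (genre, _, _, year) => (genre, year) }
    .map { case (key, rows) =>
      val ratingCount = rows.count { case (_, rating, _, _) => rating.isDefined }
      val userCount = rows.size
      (key, (ratingCount, userCount, ratingCount.toDouble / userCount))
    }
```

With this shape, the Horror group should end up with 1 rating out of 2 rows, the same ratio as in the NaN-based version, and the element types make it explicit which column may be missing.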