I have an array of strings defined in a variable which contains name of the column. I would like to perform group by and get count.
I am trying below code but throws error.
val keys = Array("Col1", "Col2")
val grouppedByDf = myDf.groupBy(keys.mkString(",").count
Can you please guide me what I am doing wrong here ?
CodePudding user response:
import spark.implicits._
val df = Seq(("βήτα", "άλφα", 20), ("άλφα", "βήτα", 10), ("άλφα", "βήτα", 20), ("βήτα", "άλφα", 10)).toDF("α", "β", "ω")
val keys = Array("α", "β")
df
.groupBy(keys.map(col(_)): _*)
.count()
.show()
---- ---- -----
| α| β|count|
---- ---- -----
|βήτα|άλφα| 2|
|άλφα|βήτα| 2|
---- ---- -----