I am trying to cast a DataFrame to a Dataset of a particular case class, but it throws the error below. I am new to Scala and am trying to figure out how to do this.
case class RegionClass(name: String, count: Int)

implicit val encoder: Encoder[RegionClass] = Encoders.product[RegionClass]

df.groupBy("Region")
  .count()
  .as[RegionClass](encoder)
error:
> implicit val encoder: Encoder[RegionClass] = Encoders.product[RegionClass]
^
CodePudding user response:
There are three problems in your code:
1. No need to specify the encoder
Scala and Spark are smart enough to automatically create an encoder.
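For instance, importing the SparkSession implicits is enough for Spark to derive the encoder from the case class (a minimal, self-contained sketch; the local SparkSession here is just for illustration):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

case class RegionClass(name: String, count: Int)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // brings the implicit Encoder[RegionClass] into scope

// no explicit encoder argument needed anywhere
val ds: Dataset[RegionClass] = Seq(RegionClass("EMEA", 3)).toDS()
```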
2. Your case class attribute names don't match the dataframe column names
The first column of the dataframe is Region, while the first attribute of your case class is name. You need to rename the column in order to be able to cast a row to RegionClass.
3. The schema of your dataframe does not match the schema of your class
While the type of count is Int in your class, Spark's .count() method returns a bigint. You need to cast it to an int in order to be able to cast a row to RegionClass.
Solution
df.groupBy("Region")
  .count()
  .select(
    'Region as "name", // rename the "Region" column to match the class attribute name
    'count cast "int"  // cast the "count" column to match the class attribute type
  )
  .as[RegionClass]
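If your Spark version warns that the single-quote Symbol syntax is deprecated (it is in Spark 3.2+), the equivalent with col(...) should work the same way (a sketch, assuming the same df and RegionClass as above):

```scala
import org.apache.spark.sql.functions.col

df.groupBy("Region")
  .count()
  .select(
    col("Region").as("name"), // rename to match the class attribute name
    col("count").cast("int")  // cast bigint down to int to match the class attribute type
  )
  .as[RegionClass]
```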