Convert a DataFrame to another with "case class" type


I am trying to convert a "DataFrame" to one typed with a particular case class, but it throws the error below. I am new to Scala and trying to figure this out.

case class RegionClass(name: String, count: Int)
implicit val encoder: Encoder[RegionClass] = Encoders.product[RegionClass]
df.groupBy("Region")
  .count().as[RegionClass](encoder)

error:

>  implicit val encoder: Encoder[RegionClass] = Encoders.product[RegionClass]
                                                             ^

CodePudding user response:

There are three problems in your code:

1. No need to specify the encoder

Scala and Spark are smart enough to automatically create an encoder.
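For example, once `spark.implicits._` is imported, Spark derives the encoder for a case class on its own. A minimal sketch, assuming an existing `SparkSession` named `spark`:

```scala
// Assumes a SparkSession named `spark` is already available.
import spark.implicits._ // brings implicit Encoders for case classes into scope

case class RegionClass(name: String, count: Int)

// No explicit Encoder needed: toDS() resolves it implicitly.
val ds = Seq(RegionClass("EMEA", 3)).toDS()
```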

2. Your class attribute names don't match the dataframe column names

The first column of the dataframe is Region, while the first attribute of your case class is name. You need to rename the column so a row can be cast to RegionClass.

3. The schema of your dataframe does not match the schema of your class

While the type of count is Int in your class, Spark's .count() method returns a bigint. You need to cast it to an Int in order to be able to cast a row to RegionClass.
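An equivalent way to fix both the name and the type, using `withColumnRenamed` and a column cast instead of `select` (a sketch, assuming `spark.implicits._` is in scope for the final `.as[RegionClass]`):

```scala
import org.apache.spark.sql.functions.col

val fixed = df.groupBy("Region")
  .count()
  .withColumnRenamed("Region", "name")            // match the class attribute name
  .withColumn("count", col("count").cast("int"))  // bigint -> int to match the class

val ds = fixed.as[RegionClass]
```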

Solution

df.groupBy("Region")
  .count()
  .select(
    'Region as "name",  // rename the "Region" column to match class attribute name
    'count cast "int"  // cast the "count" column to match class attribute type
  )
  .as[RegionClass]