How to select columns that exist in case classes from DataFrame


Given a Spark DataFrame with columns "id", "first", "last", "year":

import spark.implicits._  // needed for toDF on an RDD

val df = sc.parallelize(Seq(
  (1, "John", "Doe", 1986),
  (2, "Ive", "Fish", 1990),
  (4, "John", "Wayne", 1995)
)).toDF("id", "first", "last", "year")

and the case class

case class IdAndLastName(
  id: Int,
  last: String
)

I would like to select only the columns that appear in the case class, namely id and last. In other words, I want the same output as df.select("id", "last"), but driven by the case class so that the attribute names are not hardcoded. How can I achieve this in a compact way?

CodePudding user response:

You can explicitly create an encoder for the case class (usually this happens implicitly). Then you can read the field names off the encoder's schema and use them in the select statement:

import org.apache.spark.sql.Encoders

val fieldnames = Encoders.product[IdAndLastName].schema.fieldNames
df.select(fieldnames.head, fieldnames.tail: _*).show()

Output:

+---+-----+
| id| last|
+---+-----+
|  1|  Doe|
|  2| Fish|
|  4|Wayne|
+---+-----+
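If you would rather not go through Spark's `Encoders` at all, plain Scala runtime reflection can recover the same field names from the case class. A minimal sketch (the `fieldNames` helper is a hypothetical name, and this assumes Scala 2.x with `scala-reflect` on the classpath):

```scala
// Sketch: recover case-class field names without Spark's Encoders,
// using Scala 2.x runtime reflection.
import scala.reflect.runtime.universe._

case class IdAndLastName(id: Int, last: String)

// Collect the case accessors in declaration order.
def fieldNames[T: TypeTag]: Seq[String] =
  typeOf[T].members.sorted.collect {
    case m: MethodSymbol if m.isCaseAccessor => m.name.toString
  }

val names = fieldNames[IdAndLastName] // Seq("id", "last")
```

The resulting sequence can then be fed to the same `df.select(names.head, names.tail: _*)` call as in the answer above.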