Is there any way to specify a type in Scala dynamically?

Time:11-24

I'm new to Spark and Scala, so sorry if this is a stupid question. I have a number of tables:

table_a, table_b, ...

and a number of corresponding types for these tables:

case class classA(...), case class classB(...), ...

Then I need to write methods that read data from these tables and create a Dataset:

def getDataFromSource: Dataset[classA] = {
       val df: DataFrame = spark.sql("SELECT * FROM table_a")
       df.as[classA]
}

The same goes for the other tables and types. Is there any way to avoid this boilerplate (an individual function for each table) and get by with a single function? For example:

def getDataFromSource[T: Encoder](table_name: String): Dataset[T] = {
       val df: DataFrame = spark.sql(s"SELECT * FROM $table_name")
       df.as[T]
}

Then create a list of pairs (table_name, type_name):

val tableTypePairs = List(("table_a", classA), ("table_b", classB), ...)

Then call it using foreach:

tableTypePairs.foreach(tupl => getDataFromSource[what should I put here?](tupl._1))

Thanks in advance!

CodePudding user response:

Something like this should work:

def getDataFromSource[T](table_name: String, encoder: Encoder[T]): Dataset[T] =
  spark.sql(s"SELECT * FROM $table_name").as(encoder)

val tableTypePairs = List(
  "table_a" -> implicitly[Encoder[classA]],
  "table_b" -> implicitly[Encoder[classB]]
)

tableTypePairs.foreach {
  case (table, enc) =>
    getDataFromSource(table, enc)
}

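Note that the foreach discards the Datasets it creates, which is a bit of a code smell. Since Encoder is invariant, tableTypePairs isn't going to have a very useful type (its elements are typed as encoders of some unknown type, so the compiler no longer knows which case class belongs to which table), and neither would the result of something like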

tableTypePairs.map {
  case (table, enc) =>
    getDataFromSource(table, enc)
}
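One way to keep the table name and its row type together is a small wrapper around the encoder. The following is only a sketch under assumptions: ClassA, ClassB and TableSource are hypothetical names (not from the question), the temp views stand in for the real tables, and a local SparkSession is used. It also shows the limitation described above: once the sources sit in one list, each loaded Dataset is only known as Dataset[_].

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical row types standing in for classA / classB from the question.
final case class ClassA(id: Int, name: String)
final case class ClassB(id: Int, score: Double)

// Hypothetical wrapper: pairs a table name with the encoder for its row type.
final case class TableSource[T](tableName: String, encoder: Encoder[T]) {
  def load(spark: SparkSession): Dataset[T] =
    spark.sql(s"SELECT * FROM $tableName").as(encoder)
}

object TableSourceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("table-source-demo")
      .getOrCreate()
    import spark.implicits._

    // Temp views standing in for the real tables (assumption for the demo).
    Seq(ClassA(1, "x")).toDS().createOrReplaceTempView("table_a")
    Seq(ClassB(1, 2.5)).toDS().createOrReplaceTempView("table_b")

    // In a single list the element type is erased to TableSource[_].
    val sources: List[TableSource[_]] = List(
      TableSource("table_a", Encoders.product[ClassA]),
      TableSource("table_b", Encoders.product[ClassB])
    )

    sources.foreach { source =>
      // ds is a Dataset[_] here: only type-agnostic operations are available.
      val ds = source.load(spark)
      println(s"${source.tableName}: ${ds.count()} row(s)")
    }

    spark.stop()
  }
}
```

This is fine when the loop only does type-agnostic work (counting, writing out, caching). Anything that needs the fields of a specific case class still has to go through a TableSource[ClassA] held in its own typed value.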

CodePudding user response:

One option is to pass the Class to the method; that way the generic type T will be inferred:

def getDataFromSource[T: Encoder](table_name: String, clazz: Class[T]): Dataset[T] = {
       val df: DataFrame = spark.sql(s"SELECT * FROM $table_name")
       df.as[T]
}

tableTypePairs.foreach { case (tableName, clazz) => getDataFromSource(tableName, clazz) }

But then I'm not sure how you'll be able to make use of this list of Datasets without .asInstanceOf.
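If the Datasets need to keep their element types, an alternative to the list is to name the type at each call site. Type arguments are resolved at compile time, so they cannot be picked from a runtime value. The sketch below assumes hypothetical case classes ClassA and ClassB, temp views in place of the real tables, and a SparkSession passed in explicitly; it still needs one line per table, but only one generic function.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

// Hypothetical row types standing in for classA / classB from the question.
final case class ClassA(id: Int, name: String)
final case class ClassB(id: Int, score: Double)

object ExplicitTypeDemo {
  // One generic reader; the Encoder[T] is resolved implicitly at each call site.
  def getDataFromSource[T: Encoder](spark: SparkSession, tableName: String): Dataset[T] =
    spark.sql(s"SELECT * FROM $tableName").as[T]

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("explicit-type-demo")
      .getOrCreate()
    import spark.implicits._ // provides encoders for case classes and primitives

    // Temp views standing in for the real tables (assumption for the demo).
    Seq(ClassA(1, "x")).toDS().createOrReplaceTempView("table_a")
    Seq(ClassB(1, 2.5)).toDS().createOrReplaceTempView("table_b")

    // The type argument is written out, so each result keeps its precise type.
    val dsA: Dataset[ClassA] = getDataFromSource[ClassA](spark, "table_a")
    val dsB: Dataset[ClassB] = getDataFromSource[ClassB](spark, "table_b")

    // Typed field access works without any casts.
    println(dsA.map(_.name).collect().toList)
    println(dsB.map(_.score).collect().toList)

    spark.stop()
  }
}
```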
