I have a function like this in Scala (2.13) for use with Spark:
def getDataset[T <: Product: TypeTag](name: String): Dataset[T] = {
  import spark.implicits._
  // Read the Parquet data stored under BASE_PATH/name and type it as T
  val ds = spark.read.parquet(BASE_PATH + "/" + name).as[T]
  ds.createOrReplaceTempView(name)
  ds
}
Now I want to take a Seq of case classes and, for each class, call this function:
case class CLASS1(...)
case class CLASS2(...)
case class CLASS3(...)
Seq(CLASS1, CLASS2, CLASS3, ....).foreach {
  c => getDataset[c??](name=c???)
}
I'm having a hard time figuring out the exact syntax; the symbol for the name of the case class, represented by the variable c inside the foreach, seems to represent the type of the apply method (() => Product). What I really want is the type of the case class to use as the type parameter, and the name of the case class.
It feels like I should be able to do this - what am I missing here?
CodePudding user response:
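The problem is that you want to substitute T (known at compile time) at the type level and name (known at runtime) at the value level. Normally T and name do not exist at the same time: type arguments are resolved by the compiler, whereas the elements of a Seq are only inspected at runtime.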
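To illustrate the distinction, here is a minimal sketch of one way to make the two coexist: capture the type while it is still statically known and store a plain value (a name plus a closure) that can be iterated over later. Everything in it is an assumption made for illustration: the Person and Order case classes, the Table wrapper, and loadStub, which only stands in for the Spark-backed getDataset so that the sketch runs without Spark. It also uses ClassTag rather than TypeTag to stay within the standard library.

```scala
import scala.reflect.{ClassTag, classTag}

object TableSketch {
  // Hypothetical case classes, not taken from the question.
  final case class Person(name: String, age: Int)
  final case class Order(id: Long, total: Double)

  // Derives a short name from the runtime class, dropping package and
  // enclosing-object prefixes such as "TableSketch$".
  def simpleName[T: ClassTag]: String =
    classTag[T].runtimeClass.getName.split("[.$]").last

  // Hypothetical stand-in for getDataset[T](name): it only reports what
  // would be loaded instead of reading Parquet through Spark.
  def loadStub[T <: Product: ClassTag](name: String): String =
    s"would load '$name' as ${simpleName[T]}"

  // A runtime value that remembers a type chosen at compile time:
  // the closure was built while T was still known to the compiler.
  final class Table private (val name: String, val load: () => String)

  object Table {
    def apply[T <: Product: ClassTag]: Table = {
      val n = simpleName[T]
      new Table(n, () => loadStub[T](n))
    }
  }

  // The types are fixed here, at compile time...
  val tables: Seq[Table] = Seq(Table[Person], Table[Order])

  // ...and used here, at runtime.
  val results: Seq[String] = tables.map(_.load())
}
```

With the real getDataset, the closure would return a Dataset of the captured type, so the Seq would have to be typed with an existential or a common supertype.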
One option is to replace Seq(Class1, Class2, Class3) at the value level with Class1 :: Class2 :: Class3 :: HNil at the type level and use Shapeless:
import shapeless.{::, HNil, Poly0, Poly1, Typeable}
import shapeless.ops.hlist.FillWith
import scala.reflect.runtime.universe.{TypeTag, typeOf}
object datasetPoly extends Poly1 {
  implicit def cse[T <: Product : TypeTag /*: Typeable*/]: Case.Aux[T, Dataset[T]] =
    at(_ => getDataset[T](/*Typeable[T].describe*/typeOf[T].toString))
}
object nullPoly extends Poly0 {
  implicit def cse[T >: Null]: Case0[T] = at(null)
}
FillWith[nullPoly.type, Class1 :: Class2 :: Class3 :: HNil].apply().map(datasetPoly)
Alternatively, you can use macros or runtime reflection.
In Seq(Class1, Class2, Class3), Class1, Class2, and Class3 are the companion objects of the case classes.
For example, with a reflective toolbox:
import scala.reflect.runtime.universe.Quasiquote
import scala.reflect.runtime.{currentMirror => cm}
import scala.tools.reflect.ToolBox
val tb = cm.mkToolBox()
Seq(Class1, Class2, Class3).foreach(c => {
  val classSymbol = cm.reflect(c).symbol.companion
  tb.eval(q"App.getDataset[$classSymbol](${classSymbol.name.toString})")
})
You should add the following to build.sbt:
libraryDependencies += scalaOrganization.value % "scala-reflect" % scalaVersion.value
libraryDependencies += scalaOrganization.value % "scala-compiler" % scalaVersion.value
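If only the name of the case class is needed at runtime (not the type parameter), it can probably be recovered from the companion object with plain Java reflection, without a toolbox. The sketch below relies on the fact that the JVM class of a Scala object has a name ending in "$". The case classes and the caseClassName helper are hypothetical names chosen for illustration.

```scala
object CompanionNameSketch {
  // Hypothetical case classes, not taken from the question.
  final case class Class1(id: Int)
  final case class Class2(label: String)

  // A companion object compiles to a JVM class named like "...Class1$".
  // Strip the trailing '$', then drop package and enclosing-object prefixes.
  def caseClassName(companion: AnyRef): String =
    companion.getClass.getName.stripSuffix("$").split("[.$]").last

  // Class1 and Class2 here are the companion objects, i.e. ordinary values.
  val names: Seq[String] = Seq[AnyRef](Class1, Class2).map(caseClassName)
}
```

This yields only a String, so it covers the name argument of getDataset but not the type argument T, which still has to be supplied by one of the approaches above.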
CodePudding user response:
You could define your own companion objects for the case classes and include a method in each which calls getDataset. For example, this should work (passed by my mental compiler):
abstract class DatasetProvider[T <: Product : TypeTag] {
  val name: String
  def dataset: Dataset[T] =
    getDataset[T](name)
}
case class Class1(...)
object Class1 extends DatasetProvider[Class1] {
  override val name: String = "class1"
}
// and so forth for Class2, Class3
Seq(Class1, Class2, Class3).foreach { c =>
  val ds = c.dataset
  ???
}
Note that if you define your own companion object, it no longer extends the corresponding FunctionN trait automatically. You will have to declare that explicitly if you want to pass the companion around as a function; this may or may not be desirable.
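A minimal sketch of that last point, assuming Scala 2.13 and using hypothetical single-field case classes (Auto, Manual, Declared) that are not part of the question:

```scala
object CompanionSketch {
  // No explicit companion: the synthetic one can be used as Int => Auto.
  final case class Auto(id: Int)

  // Explicit companion: it still gets apply, but is not itself a function.
  final case class Manual(id: Int)
  object Manual { val label: String = "manual" }

  // Explicit companion that declares the function type itself; the
  // generated apply(id: Int) implements Function1's abstract apply.
  final case class Declared(id: Int)
  object Declared extends (Int => Declared) { val label: String = "declared" }

  val autos: Seq[Auto] = Seq(1, 2).map(Auto)

  // Passing Manual directly, as in map(Manual), is rejected by Scala 2.13
  // because the object is not a function; call apply through a lambda.
  val manuals: Seq[Manual] = Seq(1, 2).map(Manual(_))

  val declared: Seq[Declared] = Seq(1, 2).map(Declared)
}
```

In the DatasetProvider approach the companion already extends an abstract class, so the function type would have to be mixed in as an additional parent.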