I have a method that takes classes as parameters, like below.
val hBaseRDD = spark.sparkContext.newAPIHadoopFile(path,
  classOf[org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat[ImmutableBytesWritable, Result]],
  classOf[ImmutableBytesWritable],
  classOf[Result], conf)
I want to write a method that takes the class types as parameters and then calls this line inside it, like below.
case class SequenceInput(conf: Configuration,
                         path: String,
                         storageClass: String,
                         keyClass: String,
                         valueClass: String) {
  override def read(sparkSession: SparkSession): DataFrame = {
    val rdd = sparkSession.sparkContext.newAPIHadoopFile(path,
      classOf[storageClass], // does not compile: classOf needs a type, not a String variable
      classOf[keyClass],
      classOf[valueClass], conf)
    rdd
  }
}
But this asks me to create types named storageClass, keyClass, and valueClass, whereas these are variables that hold the class names. How can I do this?
CodePudding user response:
You're writing a constructor, not a method, but change

storageClass: String,
keyClass: String,
valueClass: String

to be Class objects, not Strings.
Then your function can

return sparkSession.sparkContext.newAPIHadoopFile(path,
  storageClass,
  keyClass,
  valueClass, conf)
Then
val storageClass = Class.forName(config.get("storage_class"))
...
// remove path from the constructor since you should be able to use multiple paths
val df = SequenceInput(storageClass,...).read(spark, path)
Keep in mind that Class.forName expects the fully qualified name, e.g. org.apache.hadoop.hbase.io.ImmutableBytesWritable, not just ImmutableBytesWritable.
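The fully-qualified-name requirement can be illustrated without any Hadoop dependency; here is a minimal sketch using JDK classes as stand-ins:

```scala
// Class.forName resolves a fully qualified class name to a Class object.
val cls = Class.forName("java.lang.String")
println(cls.getSimpleName) // prints "String"

// The short name alone is not resolvable and throws ClassNotFoundException.
try {
  Class.forName("String")
} catch {
  case _: ClassNotFoundException => println("short name not found")
}
```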
CodePudding user response:
If I understand correctly, you need to convert a String into a Class. You can do this with Class.forName(String):
case class SequenceInput(conf: Configuration,
                         path: String,
                         storageClass: String,
                         keyClass: String,
                         valueClass: String) {
  override def read(sparkSession: SparkSession): DataFrame = {
    // Class.forName returns Class[_]; depending on your Spark version you may
    // need asInstanceOf casts to satisfy newAPIHadoopFile's typed signature.
    val rdd = sparkSession.sparkContext.newAPIHadoopFile(path,
      Class.forName(storageClass),
      Class.forName(keyClass),
      Class.forName(valueClass), conf)
    rdd
  }
}
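One caveat: Class.forName returns an untyped Class[_], while newAPIHadoopFile expects typed Class parameters, so unchecked asInstanceOf casts may be needed. A minimal sketch of the cast pattern, using JDK classes as stand-ins so it runs without Spark on the classpath:

```scala
// Class.forName yields Class[_]; cast it to the statically expected type.
// The class names here are JDK stand-ins for the Hadoop key/value classes.
val keyClass   = Class.forName("java.lang.String").asInstanceOf[Class[String]]
val valueClass = Class.forName("java.lang.Long").asInstanceOf[Class[java.lang.Long]]
println(keyClass.getName)   // prints "java.lang.String"
println(valueClass.getName) // prints "java.lang.Long"
```

The casts are unchecked, so a wrong class name still fails only at runtime, when Class.forName throws ClassNotFoundException.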