I'm new to Scala, so I'm having difficulty visualizing what's happening in the code below.
I have a Dataset, and it has a map function.
Here is the map function:
def map[U](func: MapFunction[T, U], encoder: Encoder[U]): Dataset[U]
As we can see, it takes two parameters: func, a MapFunction trait (an interface in Java), and an Encoder trait.
In my code, there is a function like the one below where I use my Dataset:
def myDef(spark: SparkSession, dataset: Dataset[Row]): Dataset[MyType] = {
  import spark.implicits._
  dataset.map { x => // here x is one Row from the Dataset
    // then extract values from the Row
    // then assign those extracted values to my object
    MyType(val1, val2 /* etc. */)
  }
}
My understanding is that when I write map { x => ... } directly, it is an implementation of the MapFunction trait as an anonymous class (thinking from a Java point of view). But how is the second parameter (encoder) of the map function supplied automatically just by adding import spark.implicits._? I find this very confusing to visualize and understand.
My understanding is that wherever we use spark.implicits._, it provides extra functionality in addition to what the class already defines.
But I still don't clearly understand how just adding spark.implicits._ tells map to pick MyType for the second parameter and use it. How is it even used?
In Java, even after implementing the interface parameter of the map method, I had to pass an Encoder explicitly, so the Scala version is confusing to understand and visualize.
Can anyone please help me understand this? It would give me a clear picture of what's happening here.
CodePudding user response:
The map method being picked up here is not the one you think. It's:
def map[U](func: (T) ⇒ U)(implicit arg0: Encoder[U]): Dataset[U]
See Scaladoc.
It is very similar, but the important difference is that there are two parameter lists: the first has a single parameter, func, and the second has a single parameter, arg0: Encoder[U], marked with a special modifier: implicit.
implicit on a parameter instructs the compiler to look for a value of the required type that is itself marked implicit and to pass it automatically, without you having to pass it explicitly.
In your case, import spark.implicits._ brings some values marked as implicit into scope, including something like an implicit val whatever: Encoder[U]. The compiler is then able to find this value and pass it to the method.
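To make the mechanism concrete without Spark, here is a minimal sketch in plain Scala. Encoder, Box, and intEncoder are hypothetical stand-ins that I made up for illustration, not Spark's real classes; only the shape of map (a normal parameter list followed by an implicit one) mirrors the signature above.

```scala
// Hypothetical stand-ins for illustration only, not Spark's real classes.
trait Encoder[A] { def describe: String }

final case class Box[T](value: T) {
  // Same shape as Dataset.map: a normal parameter list, then an implicit one.
  // It returns a String description so the effect of the encoder is visible.
  def map[U](func: T => U)(implicit enc: Encoder[U]): String =
    s"${func(value)} encoded with ${enc.describe}"
}

object ImplicitParamDemo {
  // Plays the role of spark.implicits: a container of implicit values.
  object implicits {
    implicit val intEncoder: Encoder[Int] =
      new Encoder[Int] { def describe: String = "IntEncoder" }
  }

  def run(): (String, String) = {
    import implicits._ // brings intEncoder into implicit scope

    // U is inferred as Int from the function, then the compiler
    // searches the scope for an implicit Encoder[Int] and passes it.
    val viaImplicit = Box("abc").map(_.length)

    // The same call with the second parameter list written out by hand.
    val viaExplicit = Box("abc").map(_.length)(intEncoder)

    (viaImplicit, viaExplicit)
  }
}
```

Both calls in run() end up being the same call: the first is simply the second with the last argument list filled in by the compiler. Without the import, the first call would not compile because no implicit Encoder[Int] could be found.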
Note: it's actually slightly more complex because the expected implicit parameter is generic. The import actually brings in something like an implicit def whatever[U]: Encoder[U], but the idea is the same.
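A generic implicit def can be read as a rule: "I can build an Encoder[A] for any A, provided some evidence about A exists." The sketch below shows that idea in plain Scala. Schema, derivedEncoder, and encoderNameFor are hypothetical names of mine. As far as I know, the corresponding rule in Spark is newProductEncoder, which requires the type to be a case class (a Product) with a TypeTag available, but treat that detail as something to check against the Spark source.

```scala
// Hypothetical stand-ins for illustration only, not Spark's real classes.
trait Encoder[A] { def name: String }
trait Schema[A] { def name: String } // the "evidence" the rule needs about A

object GenericImplicitDemo {
  final case class MyType(id: Int, label: String)

  // Evidence for one concrete type.
  implicit val myTypeSchema: Schema[MyType] =
    new Schema[MyType] { def name: String = "MyType" }

  // One generic rule instead of one implicit val per type:
  // whenever a Schema[A] is available, an Encoder[A] can be built.
  implicit def derivedEncoder[A](implicit schema: Schema[A]): Encoder[A] =
    new Encoder[A] { def name: String = s"Encoder[${schema.name}]" }

  // A method that, like Dataset.map, asks for an implicit Encoder[U].
  def encoderNameFor[U](value: U)(implicit enc: Encoder[U]): String = enc.name

  // The compiler infers U = MyType, then expands the call to
  // encoderNameFor(MyType(1, "a"))(derivedEncoder[MyType](myTypeSchema)).
  def run(): String = encoderNameFor(MyType(1, "a"))
}
```

The point is that the implicit search is recursive: to satisfy Encoder[MyType], the compiler calls derivedEncoder and then has to find its implicit parameter too.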
EDIT: the U that the compiler must know in order to find the correct Encoder[U] is resolved from the first parameter (the result type of the function you pass) as well as the return type you declared: MyType.
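Coming from Java, it may help to see the encoder passed by hand in Scala as well. The following is a sketch, assuming Spark SQL is on the classpath and that MyType is a case class with the two fields shown (the field names, the sample rows, and the object name are made up for the example). Encoders.product is the factory I would expect Java-style code to use for a case class; the implicit version should resolve to an equivalent encoder.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, Row, SparkSession}

// Assumed shape of MyType; defined at top level so an encoder can be derived.
final case class MyType(name: String, count: Int)

object ExplicitEncoderDemo {
  // What the question's code does: the encoder is found via the import.
  def withImplicit(spark: SparkSession, dataset: Dataset[Row]): Dataset[MyType] = {
    import spark.implicits._
    dataset.map((row: Row) => MyType(row.getString(0), row.getInt(1)))
  }

  // The same call with the second parameter list written out by hand.
  def withExplicit(dataset: Dataset[Row]): Dataset[MyType] = {
    val encoder: Encoder[MyType] = Encoders.product[MyType]
    dataset.map((row: Row) => MyType(row.getString(0), row.getInt(1)))(encoder)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("encoder-demo")
      .getOrCreate()

    // Sample rows, made up for the example.
    val rows = spark.createDataFrame(Seq(("a", 1), ("b", 2))).toDF("name", "count")

    println(withImplicit(spark, rows).collect().toList)
    println(withExplicit(rows).collect().toList)

    spark.stop()
  }
}
```

Note that withExplicit needs neither the SparkSession nor the import, which shows that the import contributes nothing to this call beyond making an Encoder[MyType] available for the compiler to pass.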