Doubts about "Task not serializable", please advise

Time:09-30

I have recently been using Spark to implement data-mining algorithms and needed to sample records from an RDD. Several RDD methods, first(), sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.random.nextLong), take(num: Int), and takeSample(withReplacement: Boolean, num: Int, seed: Long = Utils.random.nextLong), can all fetch specified records. Reading their source, I found that the key part of each method is a call to SparkContext.runJob(...), so I ran the following experiment.
Enter the spark-shell command-line environment:

lj@ubuntu-1:~$ $SPARK_HOME/bin/spark-shell --master local

Enter the following code:
 
val conf = new SparkConf().setAppName("test").setMaster("local")
val sc = new SparkContext(conf)
val rdd = sc.parallelize(1 to 100)
def func(sc: SparkContext, rdd: RDD[Int]) {
  var id1 = 0; var id2 = 0
  for (i <- 1 to 2) {
    id1 = 22; id2 = 79
    var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id2 - id1), Seq(0), true).apply(0)
    print(temp + "\n")
  }
}
func(sc, rdd)
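As a side note on the per-partition function itself: it fetches one element by index with ite.toSeq.apply(...), which materializes the whole partition first. A minimal plain-Scala sketch (no Spark needed; nthEager and nthLazy are names I made up for illustration) of that lookup, plus a lazier variant:

```scala
// Eager lookup, as in the runJob closure above: toSeq materializes the
// entire iterator before indexing into it.
def nthEager(ite: Iterator[Int], n: Int): Int = ite.toSeq.apply(n)

// Lazier alternative: skip n elements and take the next one, stopping as
// soon as the target element is reached.
def nthLazy(ite: Iterator[Int], n: Int): Int = ite.drop(n).next()

// With the single partition holding 1 to 100, index 57 is element 58.
println(nthEager((1 to 100).iterator, 57))  // prints 58
println(nthLazy((1 to 100).iterator, 57))   // prints 58
```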

The Spark master is set to "local", so the RDD has only one partition. The intent of the code is to print element 58 (the element at index 57 = id2 - id1) twice, but running it throws "org.apache.spark.SparkException: Task not serializable". After several attempts, the fragment
 
var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id2 - id1), Seq(0), true).apply(0)

was changed to
 
var id3 = id2 - id1
var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id3), Seq(0), true).apply(0)

After that, the code ran without any problem. I really can't figure out why; please advise, thanks!
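For what it's worth, this kind of failure can be reproduced without Spark. The sketch below is only an assumption about the mechanism, not a definitive diagnosis: Driver, serializes, badClosure, and goodClosure are hypothetical names, with Driver standing in for a non-serializable enclosing object (in spark-shell, the REPL line object that also holds the SparkContext). A closure that reads its fields captures the whole object, so Java serialization fails; copying the needed value into a local val first means the closure captures only an Int.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable enclosing object, playing the
// role the REPL's line object (which also holds the SparkContext) can play.
class Driver { // deliberately NOT java.io.Serializable
  var id1 = 22
  var id2 = 79

  // Reads the fields id1/id2, so the lambda captures `this` (the whole
  // Driver); serializing the closure then fails.
  def badClosure: Iterator[Int] => Int =
    ite => ite.toSeq.apply(id2 - id1)

  // Copies the difference into a local val first: the lambda now captures
  // only an Int and serializes fine.
  def goodClosure: Iterator[Int] => Int = {
    val id3 = id2 - id1
    ite => ite.toSeq.apply(id3)
  }
}

// Roughly the check Spark performs before shipping a task to executors:
// try to Java-serialize the closure.
def serializes(f: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(f)
    true
  } catch {
    case _: NotSerializableException => false
  }

val d = new Driver
println(serializes(d.badClosure))            // false: closure drags in Driver
println(serializes(d.goodClosure))           // true: closure captures only an Int
println(d.goodClosure((1 to 100).iterator))  // 58
```

If this is indeed what happens in the spark-shell, then copying outer values into a local val right before building the closure (exactly the id3 change above) is the commonly recommended workaround.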