Enter the spark-shell command-line environment:
lj@ubuntu-1:~$ $SPARK_HOME/bin/spark-shell --master local
Enter the following code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val conf = new SparkConf().setAppName("test").setMaster("local")
val sc = new SparkContext(conf)
val rdd = sc.parallelize(1 to 100)

def func(sc: SparkContext, rdd: RDD[Int]) {
  var id1 = 0; var id2 = 0
  for (i <- 1 to 2) {
    id1 = 22; id2 = 79
    // take the element at index id2 - id1 from partition 0
    var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id2 - id1), Seq(0), true).apply(0)
    print(temp + "\n")
  }
}

func(sc, rdd)
Because the Spark master is set to "local", the RDD has only one partition. The purpose of the code is to print 58 (the element whose index is 57 = id2 - id1) twice. However, running it fails with "org.apache.spark.SparkException: Task not serializable". After several attempts, I changed the fragment
var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id2 - id1), Seq(0), true).apply(0)
to
var id3 = id2 - id1
var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id3), Seq(0), true).apply(0)
and after that the code runs with no problem at all. I really cannot figure out why this makes a difference; please advise, thanks!
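For reference, below is the whole working variant gathered into one pasteable snippet. It is only a sketch of what I run: it assumes Spark 1.x (where SparkContext.runJob still accepts the allowLocal flag used above) and reuses the sc that spark-shell already provides instead of constructing a second SparkContext.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// With --master local the RDD ends up in a single partition (partition 0).
val rdd = sc.parallelize(1 to 100)

def func(sc: SparkContext, rdd: RDD[Int]) {
  var id1 = 0; var id2 = 0
  for (i <- 1 to 2) {
    id1 = 22; id2 = 79
    // Computing the index outside the closure is the change that avoids the
    // "Task not serializable" error I get with id2 - id1 written inside it.
    var id3 = id2 - id1   // 79 - 22 = 57
    var temp = sc.runJob(rdd, (ite: Iterator[Int]) => ite.toSeq.apply(id3), Seq(0), true).apply(0)
    print(temp + "\n")    // element at index 57 of 1 to 100, i.e. 58, printed once per iteration
  }
}

func(sc, rdd)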