import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  var STR: String = null

  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      System.err.println("Usage: <file>")
      System.exit(1)
    }
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val line = sc.textFile(args(0))
    val counts = line.flatMap(_.split(" ")).map(word => {
      STR = "welcome to spark"
      (word, 1)
    })
    println(STR)
    counts.reduceByKey((x, y) => {
      println(STR)
      x + y // the reduce function must return the Int sum, so this has to be the last expression
    }).collect().foreach(println)
    sc.stop()
  }
}
The WordCount code is simple: I declare a global variable STR and assign it inside the map. Why does the println between the map and the reduce print STR as null? Is it because I haven't performed an action yet, so the map has not actually run? I would like the value assigned to STR in the map to be available in the reduce. What is the right way to do that?
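The null in the question can be reproduced without Spark. The sketch below is only an analogy, assuming Scala's lazy `Iterator.map` stands in for an RDD transformation and `toList` stands in for an action; the object `LazyMapDemo` and its names are hypothetical. Before the pipeline is forced, the side effect inside the map has not happened, so `str` is still null. On a real cluster there is a second effect the analogy cannot show: even after an action, the map function runs inside executor JVMs, so the driver's copy of STR stays null unless driver and executors share one JVM, as they do in local mode.

```scala
// Analogy only (hypothetical names): Iterator.map is lazy, like an RDD transformation.
object LazyMapDemo {
  // Plays the role of the question's global STR.
  var str: String = null

  /** Returns the value of str before and after the pipeline is forced, plus the pairs. */
  def demo(words: Seq[String]): (String, String, List[(String, Int)]) = {
    str = null
    // Building the pipeline only records the function; nothing runs yet,
    // just like flatMap/map on an RDD before an action.
    val pairs = words.iterator.map { word =>
      str = "welcome to spark" // side effect inside the "transformation"
      (word, 1)
    }
    val before = str          // still null: the map body has not executed
    val result = pairs.toList // forcing the iterator plays the role of an action
    val after = str           // the side effect has happened now
    (before, after, result)
  }

  def main(args: Array[String]): Unit = {
    val (before, after, result) = demo(Seq("hello", "spark"))
    println(s"before action: $before") // before action: null
    println(s"after action:  $after")  // after action:  welcome to spark
    println(result)                    // List((hello,1), (spark,1))
  }
}
```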
Reply:
STR only exists locally in each JVM, so it cannot be shared with the other nodes in the cluster. You can send it out as a broadcast variable so that every worker node shares it.

Reply:
As far as I know, a broadcast variable cannot be defined inside the map, but I need the value to come from the map, i.e. it is assigned to the variable there, and doing that causes problems at serialization time.

Reply:
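Your analysis is right: because no action has been called, Spark has only recorded the RDD transformations and has not executed them yet. Below is a test program in which the value is also carried inside each record, so the reduce side receives it together with the count:

import org.apache.spark.{SparkConf, SparkContext}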
/**
 * Created by mahuichao on 16/8/12.
 */
object Test04 {
  var STR: String = ""

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test04").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("FATAL")
    val path = "/Users/mahuichao/Downloads/test.txt"
    val file = sc.textFile(path)
    file.flatMap(_.split(" ")).map { word =>
      STR = "hello the crude world"
      // Carry STR inside the record so the reduce side receives it with the count.
      (word, (1, STR))
      // (word, 1)
    }.reduceByKey {
      case (x: (Int, String), y: (Int, String)) =>
        // Reading the global STR works here only because local mode runs map and
        // reduce in one JVM; on a cluster, the carried value x._2 is the reliable one.
        println("I am the value of STR: " + STR)
        (x._1 + y._1, STR)
    }.map { case (x, (y1, y2)) =>
      (x, y1)
    }.collect().foreach(println)
    sc.stop()
  }
}
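If the value really has to be produced during the map stage and then read in a reduce, one way to combine the broadcast suggestion with the objection to it is to run two jobs: an accumulator carries the value from the executors back to the driver, and a broadcast variable ships it back out. This is a sketch under assumptions: it needs Spark 2.x or later for `SparkContext.collectionAccumulator`, and the object `WordCountWithBroadcast`, the `run` helper, and the use of `parallelize` instead of `textFile` are hypothetical choices made to keep it self-contained. Accumulator values are only filled in after an action has run, and updates made inside a transformation may be applied more than once if a task is retried, so the driver simply takes the first element.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: hand a value produced in the map stage to a later reduce,
// via an accumulator (executors -> driver) and a broadcast (driver -> executors).
object WordCountWithBroadcast {

  /** Returns the value recorded during the map stage and the word counts. */
  def run(sc: SparkContext, lines: Seq[String]): (String, Map[String, Int]) = {
    // In the real job this would be sc.textFile(args(0)).
    val words = sc.parallelize(lines).flatMap(_.split(" "))

    // Job 1: tasks add to the accumulator; only the driver can read its value.
    val strAcc = sc.collectionAccumulator[String]("strFromMap")
    val pairs = words.mapPartitions { iter =>
      strAcc.add("welcome to spark") // once per partition, not once per word
      iter.map(word => (word, 1))
    }.cache()
    pairs.count() // the action that actually runs the map stage

    // Driver side: the accumulator is populated now that the action has finished.
    val seen = strAcc.value
    val strFromMap = if (seen.isEmpty) "" else seen.get(0)

    // Job 2: send the driver-side value to every executor as a read-only broadcast.
    val strBc = sc.broadcast(strFromMap)
    val counts = pairs.reduceByKey { (x, y) =>
      println("STR in reduce: " + strBc.value) // lands in executor logs on a cluster
      x + y
    }.collect().toMap

    strBc.destroy()
    pairs.unpersist()
    (strFromMap, counts)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is only for trying this out; on a cluster the master comes from spark-submit.
    val conf = new SparkConf().setAppName("WordCountWithBroadcast").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (str, counts) = run(sc, Seq("hello spark", "hello world"))
      println("STR read on the driver: " + str)
      counts.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Carrying the value inside each record, as Test04 does, needs only one job; the two-job route is mainly worth it when the value should be read once from the driver rather than copied into every record.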