System.setProperty("user.name", "webuser");
JavaSparkContext ct = new JavaSparkContext("spark://192.168.90.74:7077", "test-1",
        "/home/webuser/spark/spark-1.5.2-bin-hadoop2.4",
        "C://newWorkSpace/java.spark.test/target/java.spark.test-0.0.1-SNAPSHOT.jar");
List<Integer> list = new ArrayList<>();
list.add(1);
list.add(6);
list.add(9);
JavaRDD<Integer> rdd = ct.parallelize(list);
System.out.println(rdd.collect());
rdd.saveAsTextFile("/home/webuser/temp");
ct.close();
1. The jar is added at runtime via the JavaSparkContext constructor. Does the jar have to be uploaded to the master in advance, so that the path I pass is the jar's path on the master?
2. When I set the jar path to its path on the master, the program runs until it fails with an error that drive C cannot be found. After I fixed that, the program completes and collect() returns the correct result, but saveAsTextFile does not behave as expected: it creates the /home/webuser/temp folder under the C drive of my development machine, not on the server running Spark. What is the principle here? My understanding is that RDD actions run on the worker nodes, while the machine where I launch the program is the driver. Why does it create the file on the driver rather than on the workers?
CodePudding user response:
I can't tell whether these paths are Linux paths or Windows paths.CodePudding user response:
Tech support
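The Linux-vs-Windows confusion comes down to URI schemes. A bare path like /home/webuser/temp has no scheme, so whichever JVM performs the write resolves it against its own local/default filesystem; a URI with an explicit scheme (e.g. hdfs://) names one shared filesystem, so the output lands in the same place no matter which machine runs the task. A minimal sketch using plain java.net.URI (the HDFS host and port below are hypothetical placeholders, not values from the post):

```java
import java.net.URI;

public class PathSchemeDemo {
    public static void main(String[] args) {
        // A bare path carries no scheme: each machine that executes the
        // write resolves it against its own default filesystem.
        URI bare = URI.create("/home/webuser/temp");
        System.out.println(bare.getScheme()); // prints "null"

        // An explicit scheme names one filesystem unambiguously, so every
        // task writes to the same place regardless of where it runs.
        URI shared = URI.create("hdfs://192.168.90.74:9000/home/webuser/temp");
        System.out.println(shared.getScheme()); // prints "hdfs"
    }
}
```

So for question 2, saving to a shared-filesystem URI such as hdfs://... should keep the output on the cluster instead of on whichever machine's local disk happens to receive the write.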