Spark version: spark-2.2.1-bin-hadoop2.7.tgz
Hadoop version: hadoop-2.7.3.tar.gz
package spark
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Connect to the standalone master and ship the application jar to the executors
    val conf = new SparkConf()
      .setMaster("spark://192.168.1.110:7077")
      .setAppName("WordCount")
      .setJars(Seq("out/artifacts/WordCountExample_jar/WordCountExample.jar"))
    val sc = new SparkContext(conf)
    val inputRdd = sc.textFile("hdfs://192.168.1.110:8020/data/words.log")
    // Split each line on tabs, count each word, then sort by count in descending order
    val wordCountRDD = inputRdd.flatMap(_.split("\t")).map((_, 1)).reduceByKey(_ + _).sortBy(_._2, false)
    val result = wordCountRDD.collect()
    for (pair <- result) {
      println(pair)
    }
    sc.stop()
  }
}
The error is the same whether I build the project with Maven or as an ordinary Scala project.
Here is the error log:
18/01/20 01:33:06 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.113, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2288)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
...
18/01/20 01:33:08 INFO DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:14) failed in 47.488 s due to Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 4, 192.168.1.113, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2288)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)