Help! Errors when running a Python + Spark program

Time:09-23

I found a post online that analyzed some data, and when I ran the code myself it threw a lot of errors. I'm a beginner, so could someone please take a look and help me figure out how to fix this?

Here is the source code:
# -*- coding: utf-8 -*-

from pyspark import SparkConf, SparkContext
import re

conf = SparkConf().setMaster("local").setAppName("My First App")
sc = SparkContext(conf=conf)
csdnRDD = sc.textFile("data/test.txt")
# take the third tab-separated field of each line
tmprdd1 = csdnRDD.map(lambda x: x.split("\t")[2])
tmprdd2 = tmprdd1.map(lambda x: x.split("___csdn_1quot")[0])
# keep only values that look like e-mail addresses
tmprdd4 = tmprdd2.filter(lambda x: re.match("\\w+([-+.]\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*", str(x)))
# extract the domain part after "@" and normalize it to lower case
tmprdd5 = tmprdd4.map(lambda x: str(x).split("@")[1])
tmprdd6 = tmprdd5.map(lambda x: x.lower())
tmprdd7 = tmprdd6.map(lambda x: (x, 1)).cache()
num = tmprdd7.count()
# count occurrences per domain and sort by count, descending
tmprdd8 = tmprdd7.reduceByKey(lambda x, y: x + y).cache()
tmprdd9 = tmprdd8.map(lambda x: [x[1], x[0]])
tmprdd10 = tmprdd9.sortBy(ascending=False, numPartitions=None, keyfunc=lambda x: x)
res = tmprdd10.take(10)
for each in res:
    print(each, str((each[0] / num) * 100) + "%")
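For reference, the traceback below points at the x.split("\t")[2] step, which raises IndexError as soon as a line contains fewer than three tab-separated fields. A minimal defensive sketch (assuming it is acceptable to simply skip such malformed lines, which the original post does not confirm) would filter before indexing:

# Sketch only: drop lines with fewer than three tab-separated fields
# so that taking index 2 can no longer raise IndexError.
fields = csdnRDD.map(lambda line: line.split("\t"))
tmprdd1 = fields.filter(lambda parts: len(parts) >= 3).map(lambda parts: parts[2])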

Here is the error output:
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\eclipse\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 172, in main
  File "C:\eclipse\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 167, in process
  File "C:\eclipse\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "C:\workspace\Python_Spark\Spark_Study\test.py", line 9, in <lambda>
    tmprdd1 = csdnRDD.map(lambda x: x.split("\t")[2])
IndexError: list index out of range

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:951)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
17/03/15 22:13:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\eclipse\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 172, in main
  File "C:\eclipse\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 167, in process
  File "C:\eclipse\spark-2.0.2-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "C:\workspace\Python_Spark\Spark_Study\test.py", line 9, in <lambda>
    tmprdd1 = csdnRDD.map(lambda x: x.split("\t")[2])
IndexError: list index out of range

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
    at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:951)
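The failing step is the IndexError inside the lambda on line 9 of test.py, and it can be reproduced in plain Python whenever a line has fewer than three tab-separated fields. The input line below is a made-up example, not taken from the actual data/test.txt:

# Hypothetical line with only two tab-separated fields
line = "user\tpassword"
parts = line.split("\t")   # ["user", "password"], length 2
print(parts[2])            # raises IndexError: list index out of range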