Error when using PySpark SparkSQL

Time:09-17

> from pyspark.sql import SparkSession
> from pyspark.sql import Row
>
> spark = SparkSession.builder.appName("example").getOrCreate()
> sc = spark.sparkContext
> rdd = sc.textFile("G://ml-100k/u.user")
> rdd = rdd.map(lambda line: line.split("|"))
> rdd.first()
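
For context, assuming the standard MovieLens 100k u.user layout (user id|age|gender|occupation|zip code), a successful run of the last line would return the first parsed record, something like:

    >>> rdd.first()
    [u'1', u'24', u'M', u'technician', u'85711']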

Running the code above with PySpark on Windows produces the following error:
[Stage 1:>                    (0 + 1) / 1]Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 25, in <module>
ImportError: No module named resource
2018-12-04 08:20:25 ERROR Executor:91 - Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkException: Python worker failed to connect back.
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
    at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
    at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:164)
    ... 14 more

Has anyone run into a similar situation? Thank you.

CodePudding user response:

Install the resource module with pip. Although your code never imports it explicitly, PySpark apparently depends on it internally. This resolved the same error for me; the original poster can give it a try!
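
A minimal sketch of the suggested fix, assuming the answer refers to the third-party resource package on PyPI (the standard-library resource module is Unix-only, which is why the import inside pyspark\worker.py fails on Windows):

    # In the Python environment that Spark's workers use, install the package:
    #   pip install resource
    # then confirm that the import performed by pyspark/worker.py succeeds:
    import resource  # should no longer raise ImportError

If the error persists, check that the PYSPARK_PYTHON environment variable points at the interpreter where the package was installed.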