java.io.IOException in local spark mode


I am new to Spark.

Thanks for your attention and your help. Here is my problem: I started spark-shell (version 3.2.1) in local mode on macOS Catalina (10.15.7), then ran the code below, and no exceptions occurred:

import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.DataFrame
import org.apache.spark.rdd.RDD

val seq: Seq[(String, Int)] = Seq(("Bob", 14), ("Alice", 18))
val rdd: RDD[(String, Int)] = sc.parallelize(seq)
val schema: StructType = StructType(Array(StructField("name", StringType), StructField("age", IntegerType)))
val rowRDD: RDD[Row] = rdd.map(fields => Row(fields._1, fields._2))
val dataFrame: DataFrame = spark.createDataFrame(rowRDD, schema)

Finally, I ran

dataFrame.show

and I got this error:

22/06/22 11:18:19 ERROR util.Utils: Aborting task
java.io.IOException: Failed to connect to /192.168.1.3:50561
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
    at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
    at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
    at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
    at org.apache.spark.repl.ExecutorClassLoader.getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:135)
    at org.apache.spark.repl.ExecutorClassLoader.$anonfun$fetchFn$1(ExecutorClassLoader.scala:66)
    at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:176)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:113)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:405)

CodePudding user response:

Let me explain why the first part of your code didn't throw an error, while the show() call did.

Spark operations fall into two categories: transformations and actions.

The first part of your code consists only of transformations: you are telling the Spark driver how to build the result, and it can check that the plan is valid, but nothing is actually executed until you call an action.

What is an action? It is anything that produces output, such as show(), which prints the data to the screen, or write(), which writes output files.
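As a minimal sketch of this distinction (assuming a running spark-shell, where sc is already defined; the variable names here are just for illustration):

// Transformation only: Spark records the plan, nothing runs yet,
// so the println side effect does not happen at this point.
val doubled = sc.parallelize(1 to 4).map { x =>
  println(s"processing $x")
  x * 2
}

// Action: forces the computation; the map (and its println) runs now.
val total = doubled.count()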

Your problem is this: the transformations run in your local driver, so nothing needs to be sent anywhere. Once you call an action, Spark ships the work using the network address the driver advertised (192.168.1.3 in your log), and in your environment that connection fails. Since you are running locally and have no cluster to reach, you can tell Spark to stay on the loopback address by setting this environment variable before starting spark-shell:

$ export SPARK_LOCAL_IP="127.0.0.1"
$ spark-shell

This tells your application that you are running locally, so Spark uses the loopback address (127.0.0.1) instead of trying to reach your LAN address.
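If you prefer to keep the setting with the shell invocation itself, the same idea can be expressed through Spark's standard driver-address properties (spark.driver.bindAddress and spark.driver.host); this is an equivalent alternative, not something the fix above requires:

$ spark-shell \
    --conf spark.driver.bindAddress=127.0.0.1 \
    --conf spark.driver.host=127.0.0.1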
