I am new to Spark.
Thanks for your attention and your help.
Here is my problem.
I started spark-shell (version 3.2.1) in local mode on my Mac (macOS Catalina 10.15.7), then ran the code below, and no exceptions happened:
import org.apache.spark.sql.types.{StringType, IntegerType, StructField, StructType}
import org.apache.spark.sql.Row
import org.apache.spark.sql.DataFrame
import org.apache.spark.rdd.RDD
val seq: Seq[(String, Int)] = Seq(("Bob", 14), ("Alice", 18))
val rdd: RDD[(String, Int)] = sc.parallelize(seq)
val schema: StructType = StructType(Array(StructField("name", StringType), StructField("age", IntegerType)))
val rowRDD: RDD[Row] = rdd.map(fields => Row(fields._1, fields._2))
val dataFrame: DataFrame = spark.createDataFrame(rowRDD,schema)
Finally I ran
dataFrame.show
and got this:
22/06/22 11:18:19 ERROR util.Utils: Aborting task
java.io.IOException: Failed to connect to /192.168.1.3:50561
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.downloadClient(NettyRpcEnv.scala:399)
at org.apache.spark.rpc.netty.NettyRpcEnv.$anonfun$openChannel$4(NettyRpcEnv.scala:367)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1496)
at org.apache.spark.rpc.netty.NettyRpcEnv.openChannel(NettyRpcEnv.scala:366)
at org.apache.spark.repl.ExecutorClassLoader.getClassFileInputStreamFromSparkRPC(ExecutorClassLoader.scala:135)
at org.apache.spark.repl.ExecutorClassLoader.$anonfun$fetchFn$1(ExecutorClassLoader.scala:66)
at org.apache.spark.repl.ExecutorClassLoader.findClassLocally(ExecutorClassLoader.scala:176)
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:113)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
CodePudding user response:
Let me explain why the first steps didn't throw an error and why the error only appeared after the show.
Spark works with two kinds of operations: transformations and actions (execution steps).
The first part of your code is all transformations: you are telling your Spark driver to analyse them and make sure they will work, but nothing is executed until you call an action.
What is an action? Anything that produces output, like show(), which prints the data to the screen, or write(), which writes output files.
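To make the distinction concrete, here is a minimal sketch built on the dataFrame from your question (run inside the same spark-shell session): the first line is only a transformation, and nothing runs until the actions after it.

// Transformation: lazily records "filter rows where age >= 18"; no job runs here.
val adults = dataFrame.filter("age >= 18")

// Actions: only now does Spark actually submit a job and execute the plan --
// this is the point where a connectivity error like yours surfaces.
adults.show()
println(adults.count())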
Your problem was: the transformations only run in your local driver, so nothing needs to be sent to any cluster. Once you run an action, Spark has to send the commands over the network, and because you are running locally there is no cluster at that address to receive them. To avoid this problem, set this environment variable before starting the shell:
$ export SPARK_LOCAL_IP="127.0.0.1"
$ spark-shell
This tells your application that you are running locally, binding to 127.0.0.1 instead of your LAN address, since there is no cluster to send the work to.
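After restarting spark-shell this way, you can check which address the driver bound to; a quick sanity check from inside the shell (spark.driver.host is a standard Spark property that is filled in when the context starts):

// Should now print 127.0.0.1 instead of your LAN address (e.g. 192.168.1.3)
println(sc.getConf.get("spark.driver.host"))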