Home > Back-end >  Data Source "kafka" fails with PySpark
Data Source "kafka" fails with PySpark

Time:09-10

When I try integrate Kafka with spark it fails.

Here is my code:

from pyspark.sql import SparkSession
from pyspark.streaming import StreamingContext
import findspark
import kafka

import json

if __name__ == "__main__":

    findspark.init("D:\\spark-3.0.1-bin-hadoop2.7\\")

    spark = (
        SparkSession.builder.appName("Kafka Pyspark Streaming Learning").config(
            "spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0").master("local[*]").getOrCreate()
    )

    ssc = StreamingContext(spark, 10)

    df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option(
        "subscribe", "registered_user").option("startingOffsets", "earliest").load()

    print(df.show())

    ssc.start()
    ssc.awaitTermination()

Here is the error traceback:

Ivy Default Cache set to: C:\Users\BERNARD JOSHUA\.ivy2\cache
The jars for the packages stored in: C:\Users\BERNARD JOSHUA\.ivy2\jars
:: loading settings :: url = jar:file:/D:/spark-3.0.1-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-37ff0b88-bb9a-4566-ac9e-d00b45eb6881;1.0
        confs: [default]
        found org.apache.spark#spark-sql-kafka-0-10_2.12;3.3.0 in central
        found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.0 in central
        found org.apache.kafka#kafka-clients;2.8.1 in central
        found org.lz4#lz4-java;1.8.0 in central
        found org.xerial.snappy#snappy-java;1.1.8.4 in central
        found org.slf4j#slf4j-api;1.7.32 in central
        found org.apache.hadoop#hadoop-client-runtime;3.3.2 in central
        found org.spark-project.spark#unused;1.0.0 in central
        found org.apache.hadoop#hadoop-client-api;3.3.2 in central
        found commons-logging#commons-logging;1.1.3 in central
        found com.google.code.findbugs#jsr305;3.0.0 in central
        found org.apache.commons#commons-pool2;2.11.1 in central
:: resolution report :: resolve 653ms :: artifacts dl 22ms
        :: modules in use:
        com.google.code.findbugs#jsr305;3.0.0 from central in [default]
        commons-logging#commons-logging;1.1.3 from central in [default]
        org.apache.commons#commons-pool2;2.11.1 from central in [default]
        org.apache.hadoop#hadoop-client-api;3.3.2 from central in [default]
        org.apache.hadoop#hadoop-client-runtime;3.3.2 from central in [default]
        org.apache.kafka#kafka-clients;2.8.1 from central in [default]
        org.apache.spark#spark-sql-kafka-0-10_2.12;3.3.0 from central in [default]
        org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.3.0 from central in [default]
        org.lz4#lz4-java;1.8.0 from central in [default]
        org.slf4j#slf4j-api;1.7.32 from central in [default]
        org.spark-project.spark#unused;1.0.0 from central in [default]
        org.xerial.snappy#snappy-java;1.1.8.4 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   12  |   0   |   0   |   0   ||   12  |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-37ff0b88-bb9a-4566-ac9e-d00b45eb6881
        confs: [default]
        0 artifacts copied, 12 already retrieved (0kB/19ms)
22/09/10 00:17:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.12-3.3.0.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.apache.spark_spark-sql-kafka-0-10_2.12-3.3.0.jar not found     
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.3.0.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.apache.spark_spark-token-provider-kafka-0-10_2.12-3.3.0.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.kafka_kafka-clients-2.8.1.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.apache.kafka_kafka-clients-2.8.1.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/com.google.code.findbugs_jsr305-3.0.0.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\com.google.code.findbugs_jsr305-3.0.0.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.commons_commons-pool2-2.11.1.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.apache.commons_commons-pool2-2.11.1.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.spark-project.spark_unused-1.0.0.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.hadoop_hadoop-client-runtime-3.3.2.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.apache.hadoop_hadoop-client-runtime-3.3.2.jar not found        
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.lz4_lz4-java-1.8.0.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.lz4_lz4-java-1.8.0.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.8.4.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.xerial.snappy_snappy-java-1.1.8.4.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.slf4j_slf4j-api-1.7.32.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.slf4j_slf4j-api-1.7.32.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.hadoop_hadoop-client-api-3.3.2.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\org.apache.hadoop_hadoop-client-api-3.3.2.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Failed to add file:///C:/Users/BERNARD JOSHUA/.ivy2/jars/commons-logging_commons-logging-1.1.3.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BERNARD JOSHUA\.ivy2\jars\commons-logging_commons-logging-1.1.3.jar not found
        at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1833)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1887)
        at org.apache.spark.SparkContext.$anonfun$new$11(SparkContext.scala:490)
        at org.apache.spark.SparkContext.$anonfun$new$11$adapted(SparkContext.scala:490)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:490)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 ERROR SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File file:/C:/Users/BERNARD JOSHUA/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.12-3.3.0.jar does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1534)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1498)
        at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:494)
        at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:494)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:494)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
22/09/10 00:17:02 WARN MetricsSystem: Stopping a MetricsSystem that is not running

I also have all the dependencies installed in my .ivy2 file: enter image description here

CodePudding user response:

Firstly, Spark might have issues with you having spaces in your Windows user name... The error includes saying path C:\Users\BERNARD JOSHUA does not exist, which is "true" because of the space being URI encoded.

But, you're running spark-3.0.1, so you should be using 3.0.1 of spark-sql-kafka-0-10 rather than 3.3.0.

Alternatively, upgrade Spark.

See example I wrote here - https://github.com/OneCricketeer/docker-stacks/blob/master/hadoop-spark/spark-notebooks/kafka-sql.ipynb

  • Related