I am trying to read a CSV file stored in GCS using Spark. I have a simple Spark Java project that does nothing but read a CSV; the following code is used in it.
SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("Hello world");
SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();
Dataset<Row> dataset = sparkSession.read()
        .option("header", true)
        .option("sep", ",")
        .csv("gs://abc/WDC_age.csv");
but it throws an error which says:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: gs
Can anyone help me with this? I just want to read a CSV from GCS using Spark.
Thanks In Advance :)
CodePudding user response:
No FileSystem for scheme: gs
indicates that Spark couldn't find the GCS connector on its classpath. I guess you are not running on a Dataproc cluster (where the connector is pre-installed), so you need to install the connector yourself:
https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
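Once the connector jar is on the classpath, one way to wire it up in a standalone (non-Dataproc) run is to register it in the Hadoop configuration before reading. This is a sketch, assuming you authenticate with a service-account JSON key; the key path and bucket path are placeholders:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class GcsCsvExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("Hello world")
                .getOrCreate();

        // Register the GCS connector's FileSystem implementations for the "gs" scheme.
        spark.sparkContext().hadoopConfiguration().set(
                "fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        spark.sparkContext().hadoopConfiguration().set(
                "fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");

        // Authenticate with a service-account key; the path here is a placeholder.
        spark.sparkContext().hadoopConfiguration().set(
                "google.cloud.auth.service.account.enable", "true");
        spark.sparkContext().hadoopConfiguration().set(
                "google.cloud.auth.service.account.json.keyfile", "/path/to/key.json");

        Dataset<Row> dataset = spark.read()
                .option("header", true)
                .csv("gs://abc/WDC_age.csv");
        dataset.show();
    }
}
```

These settings can equivalently be passed on the command line via `--conf spark.hadoop.fs.gs.impl=...`, which keeps credentials out of the code.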
CodePudding user response:
In my case, I just added the following dependency to my pom.xml file:
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>hadoop3-2.2.4</version>
</dependency>
and it worked for me.
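One caveat worth noting: the plain artifact pulls in its own Guava and other Google libraries, which can conflict with versions already in your project. The connector also publishes a shaded variant that bundles and relocates its dependencies; whether you need it depends on your build, but it can be selected with a classifier:

```xml
<dependency>
  <groupId>com.google.cloud.bigdataoss</groupId>
  <artifactId>gcs-connector</artifactId>
  <version>hadoop3-2.2.4</version>
  <classifier>shaded</classifier>
</dependency>
```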