How to read a CSV file from GCS using spark-java?

Time:12-27

I am trying to read a CSV file stored in GCS using Spark. I have a simple Spark Java project that does nothing but read a CSV. It uses the following code:

    SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("Hello world");
    SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();

    // "sep" sets the field separator and "quote" sets the quote character.
    Dataset<Row> dataset = sparkSession.read().option("header", true).option("sep", ",").option("quote", "\"").csv("gs://abc/WDC_age.csv");

But it throws the following error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: gs

Can anyone help me with this? I just want to read a CSV from GCS using Spark.

Thanks In Advance :)

CodePudding user response:

The error "No FileSystem for scheme: gs" indicates that Spark couldn't find the GCS connector. I guess you are not running on a Dataproc cluster, where the connector is preinstalled, so you might need to install the connector yourself: https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
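To confirm that a missing connector is really the cause, one option is to check whether the connector's FileSystem class can be loaded at all. The following is only a minimal diagnostic sketch: the class name GcsConnectorCheck and the method isOnClasspath are hypothetical helpers made up for illustration, and it assumes the connector ships com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem as the implementation for the gs scheme.

```java
// Diagnostic sketch (hypothetical helper, not part of Spark or the connector):
// checks whether the GCS connector's FileSystem class can be loaded.
final class GcsConnectorCheck {

    // Class assumed to be provided by the gcs-connector artifact for the "gs" scheme.
    static final String GCS_FILESYSTEM_CLASS =
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem";

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, GcsConnectorCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(GCS_FILESYSTEM_CLASS)) {
            System.out.println("GCS connector found on the classpath.");
        } else {
            System.out.println("GCS connector missing: expect 'No FileSystem for scheme: gs'.");
        }
    }
}
```

Running this with the same classpath as the Spark job should show whether the connector jar is visible to the application; if it is not, installing the connector or adding it as a dependency is the likely fix.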

CodePudding user response:

In my case, I just added the following dependency to my pom.xml file:

    <dependency>
        <groupId>com.google.cloud.bigdataoss</groupId>
        <artifactId>gcs-connector</artifactId>
        <version>hadoop3-2.2.4</version>
    </dependency>

and it worked for me.
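If the dependency is present but the gs scheme is still not recognized (this can happen, for example, when a fat jar build drops the connector's service registration), it may help to map the scheme to the connector classes explicitly. This is a sketch under assumptions, not a confirmed part of the setup above: GcsSparkSettings and gcsFileSystemSettings are hypothetical names, and the property keys assume Spark's "spark.hadoop." prefix for forwarding Hadoop settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects Spark settings that map the "gs" scheme
// to the GCS connector's Hadoop FileSystem classes.
final class GcsSparkSettings {

    /** Returns the settings in insertion order; keys use the "spark.hadoop." prefix. */
    static Map<String, String> gcsFileSystemSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        // FileSystem implementation used for gs:// paths.
        settings.put("spark.hadoop.fs.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
        // AbstractFileSystem implementation used for gs:// paths.
        settings.put("spark.hadoop.fs.AbstractFileSystem.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS");
        return settings;
    }

    public static void main(String[] args) {
        // Print the settings in key=value form.
        gcsFileSystemSettings().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Each entry can then be passed to SparkConf.set(key, value) or SparkSession.builder().config(key, value) before the session is created. Credentials for the bucket still have to be available to the application separately.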
