Issues with SparkSQL (Spark and Hive connectivity)


I am trying to read data from a database created in Hive from Spark. Even though the database contains data (I verified it in Hive), querying it from Spark returns no rows (it does return the column information, though).

I have copied the hive-site.xml file into the Spark configuration folder, as required.

Imports:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql
import org.apache.spark.storage.StorageLevel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.HiveContext

Creating a Spark session:

val spark = SparkSession.builder()
  .appName("Reto")
  .config("spark.sql.warehouse.dir", "hive_warehouse_hdfs_path")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()

Getting data:

spark.sql("USE retoiabd")
val churn = spark.sql("SELECT count(*) FROM churn").show()

Output:

count(1) = 0
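
To narrow down where the rows go missing, one thing worth checking (a sketch, using the same retoiabd.churn table as above) is what kind of table Spark sees and where it thinks the data lives:

// Shows the table type (MANAGED vs EXTERNAL), its Location and SerDe.
// If the Location is empty or points somewhere other than where the
// data files actually sit, count(*) comes back as 0.
spark.sql("DESCRIBE FORMATTED retoiabd.churn").show(100, truncate = false)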

CodePudding user response:

After checking with our teacher, it turned out the issue was with how the tables themselves had been created in Hive.

We created the table like this:

CREATE TABLE name (columns)

Instead of like this:

CREATE EXTERNAL TABLE name (columns)
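
For reference, a managed (non-EXTERNAL) CREATE TABLE places the table under the warehouse directory and starts out empty, so data files that already exist elsewhere are not picked up. An external table instead points at data that already lives at a given path. A minimal sketch of the fix (the column names and HDFS path here are made up, not the real schema) could look like this, run either in Hive or through the same Spark session:

// Hypothetical schema and location -- adjust to the real churn data.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS retoiabd.churn_ext (
    customer_id STRING,
    churned     BOOLEAN
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/hive/warehouse/retoiabd/churn'
""")

// The count should now reflect the files under LOCATION.
spark.sql("SELECT count(*) FROM retoiabd.churn_ext").show()

Dropping an external table only removes the metastore entry, not the files at LOCATION, which is another reason it is the usual choice when the data is produced outside Hive.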
