I am trying to read data from a Hive database into Spark. Even though the tables contain data (I checked in Hive), queries run from Spark return no rows; only the column information comes back.
I copied the hive-site.xml file into the Spark configuration folder, as required.
Imports:
```scala
// Only SparkSession is used below. HiveContext has been deprecated since
// Spark 2.0; SparkSession.builder().enableHiveSupport() replaces it.
import org.apache.spark.sql.SparkSession
```
Creating a Spark session:
```scala
val spark = SparkSession.builder()
  .appName("Reto")
  .config("spark.sql.warehouse.dir", "hive_warehouse_hdfs_path")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("show databases").show()
```
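Before looking at the data, it can help to confirm that the session really uses the Hive metastore and not Spark's in-memory catalog, and that `retoiabd` and its tables are visible. If Hive support is on but hive-site.xml is not picked up, Spark falls back to a local Derby metastore, where `retoiabd` would not exist. This is a minimal sketch, assuming it is run with spark-submit on the same cluster; `CheckHiveCatalog` and `catalogImplementation` are hypothetical names.

```scala
import org.apache.spark.sql.SparkSession

object CheckHiveCatalog {
  // "hive" means the session is backed by a Hive metastore;
  // "in-memory" means Hive support is not active for this session.
  def catalogImplementation(spark: SparkSession): String =
    spark.conf.get("spark.sql.catalogImplementation", "in-memory")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Reto")
      // Assumption: hive-site.xml is on Spark's classpath (its conf folder),
      // so the metastore address is read from there.
      .enableHiveSupport()
      .getOrCreate()

    println(s"Catalog implementation: ${catalogImplementation(spark)}")

    // Tables Spark can see in the database from the question.
    spark.catalog.listTables("retoiabd").show(truncate = false)

    spark.stop()
  }
}
```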
Getting data:
```scala
spark.sql("USE retoiabd")
// show() returns Unit, so keep the DataFrame and call show() separately.
val churn = spark.sql("SELECT count(*) FROM churn")
churn.show()
```
Output:
count(1) = 0
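Seeing the schema but zero rows suggests Spark reads the table definition from the metastore but not the data files. One way to see how `churn` was defined is `DESCRIBE FORMATTED`. This sketch assumes the same session setup as above; `InspectChurnTable` and `tableMetadata` are hypothetical names, and the row labels mentioned in the comments are what Spark usually prints for Hive tables, which may vary by version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InspectChurnTable {
  // Full metastore entry for a table: its columns, then a detailed section.
  def tableMetadata(spark: SparkSession, table: String): DataFrame =
    spark.sql(s"DESCRIBE FORMATTED $table")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Reto")
      .enableHiveSupport()
      .getOrCreate()

    // Rows worth checking in the output:
    //   Type             -> MANAGED or EXTERNAL
    //   Location         -> the directory holding the data files
    //   Table Properties -> transactional=true marks a Hive ACID table
    // truncate = false prints long values (paths, properties) in full.
    tableMetadata(spark, "retoiabd.churn").show(100, truncate = false)

    spark.stop()
  }
}
```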
Answer:
After checking with our teacher, we found that the problem was in how the tables themselves had been created in Hive.
We had created the table as a managed table:
```sql
CREATE TABLE name (columns)
```
when it should have been created as an external table:
```sql
CREATE EXTERNAL TABLE name (columns)
```
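The answer does not say why `EXTERNAL` helped. One likely explanation, offered here as an assumption: in some Hive 3 setups (HDP 3.x, for example) a plain `CREATE TABLE` creates a transactional (ACID) managed table by default, and Spark SQL cannot read Hive ACID tables directly, so it sees the schema but no rows. External tables are not transactional, so Spark reads their files as they are. The sketch below creates such a table from Spark; the table name `churn_ext`, its two columns, the comma-delimited text format and the HDFS path are hypothetical placeholders for the real ones. The same DDL can also be run in Hive itself.

```scala
import org.apache.spark.sql.SparkSession

object CreateExternalChurn {
  // Hypothetical HDFS directory holding the churn data files.
  val churnLocation = "/user/hive/external/churn"

  // Hypothetical schema: the question does not list the real columns.
  // Spark requires a LOCATION clause for CREATE EXTERNAL TABLE.
  val ddl: String =
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS retoiabd.churn_ext (
       |  customer_id STRING,
       |  churned     STRING
       |)
       |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
       |STORED AS TEXTFILE
       |LOCATION '$churnLocation'""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Reto")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql(ddl)
    // Counts the rows in the files under churnLocation.
    spark.sql("SELECT count(*) FROM retoiabd.churn_ext").show()

    spark.stop()
  }
}
```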