Home > Software design >  Hive date column shows null while querying the data
Hive date column shows null while querying the data

Time:12-17

Loaded data of spark data frame in hive table. Before loading df.show(10) shows date column in proper format and data, but while querying hive table date column shows null.

//Spark Code
df.show(10)
 ---------- 
|    bus_dt|
 ---------- 
|2021-12-01|
 ---------- 

//Hive shell
select distinct bus_dt from stg.dateTable;
NULL

//Table
create table stg.dateTable (
`bus_dt` date comment 'from deserializer')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '/u0003'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
'serialization.null.format'=''
);

CodePudding user response:

Can you try changing the DDL to use serde rather than TxtInputFormat.

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'='/u0003')
TBLPROPERTIES('serialization.null.format'='');

The issue seems to be with storage and retrival at hive side rather than at spark processing.

  • Related