Loaded data of spark data frame in hive table. Before loading df.show(10) shows date column in proper format and data, but while querying hive table date column shows null.
//Spark Code
df.show(10)
----------
| bus_dt|
----------
|2021-12-01|
----------
//Hive shell
select distinct bus_dt from stg.dateTable;
NULL
//Table
create table stg.dateTable (
`bus_dt` date comment 'from deserializer')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '/u0003'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
TBLPROPERTIES (
'serialization.null.format'=''
);
CodePudding user response:
Can you try changing the DDL to use serde rather than TxtInputFormat.
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ('field.delim'='/u0003')
TBLPROPERTIES('serialization.null.format'='');
The issue seems to be with storage and retrival at hive side rather than at spark processing.