I have a parquet file whose date column is filled with a data type I am having trouble with.
I understand that Hive and Impala tend to rebase their timestamps; however, I cannot seem to convert the values or find any pointers on how to solve this.
I have tried setting both int96RebaseModeInRead and datetimeRebaseModeInRead to LEGACY.
I also tried applying an explicit date schema to the read operation, but to no avail; the result with the schema applied was no better. A rough sketch of what I tried is shown below.
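For reference, this is roughly what the read looked like (the path and column name here are placeholders, not my real ones):

from pyspark.sql.types import StructType, StructField, DateType

# rebase configs set to LEGACY so Spark rebases dates/timestamps written by old Hive/Impala writers
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")

# explicit date schema applied to the read
schema = StructType([StructField("order_date", DateType(), True)])
df = spark.read.schema(schema).parquet("/path/to/source.parquet")
df.show()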
This is the documentation I've reviewed so far. Maybe there's a simple solution I am not seeing. Let's also assume that there's no way for me to ask the person who created the source file what the heck they did.
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option
https://kontext.tech/article/1062/spark-2x-to-3x-date-timestamp-and-int96-rebase-modes
Also, this thread is the only one I was able to find that shows how the timestamp is created, but not how to reverse it: parquet int96 timestamp conversion to datetime/date via python. Please give me some pointers.
CodePudding user response:
As I understand it, you are trying to cast the order_date column to DateType. If that's the case, the following code could help. You can read the order_date column as StringType from the source file, and you should use your own timezone for the from_utc_timestamp method.
from pyspark.sql.functions import from_unixtime, from_utc_timestamp
from pyspark.sql.types import StringType

# example epoch value in milliseconds, read as a plain string
d = ['1374710400000']
df = spark.createDataFrame(d, StringType())
df.show()

# milliseconds -> seconds -> timestamp string, then shift the UTC value to GMT+1
df = df.withColumn('new_date', from_utc_timestamp(from_unixtime(df.value / 1000, "yyyy-MM-dd HH:mm:ss"), 'GMT+1'))
df.show()
Output:
+-------------+
|        value|
+-------------+
|1374710400000|
+-------------+

+-------------+-------------------+
|        value|           new_date|
+-------------+-------------------+
|1374710400000|2013-07-25 13:00:00|
+-------------+-------------------+
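If what you ultimately need is a DateType column rather than a timestamp, one option (a small sketch reusing the new_date column from above) is to truncate it with to_date:

from pyspark.sql.functions import to_date

# drop the time-of-day part so the column becomes DateType
df = df.withColumn('order_date', to_date(df.new_date))
df.show()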