I have code:
String timestampFormat = "yyy-MM-dd'T'HH:mm:ssXXX";
or
String timestampFormat = "yyy-MM-dd'T'HH:mm:ssZZZZZ";
Dataset<Row> inputDataFrame = spark.read()
    .format("CSV")
    .option("timestampFormat", timestampFormat)
    .load("path/file");
The value 2022-04-05T08:19:00+00:00 is loaded into the Hive table as 05.04.2022 10:19:00. That is a 2-hour difference; it should be 05.04.2022 08:19:00. Can someone tell me which format I should use?
CodePudding user response:
You can set the Spark SQL session time zone as shown below and rerun the job. Replace UTC with your own time zone if needed.
--conf "spark.sql.session.timeZone=UTC"
or, from Java code:
spark.conf().set("spark.sql.session.timeZone", "UTC");
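To see why the session time zone matters, here is a minimal sketch in plain java.time (not Spark itself). It assumes the shifted result came from a session zone of UTC+2, which the question does not state; the class name SessionTimeZoneDemo and the helper render are made up for illustration. The parsed value is one fixed instant, and only the zone used to display it changes the wall-clock text.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class SessionTimeZoneDemo {
    // Hypothetical helper: parses an ISO-8601 value with an offset and
    // renders the same instant as wall-clock time at the given display offset.
    static String render(String isoValue, ZoneOffset displayOffset) {
        OffsetDateTime parsed =
            OffsetDateTime.parse(isoValue, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
        DateTimeFormatter out = DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss");
        return parsed.withOffsetSameInstant(displayOffset).format(out);
    }

    public static void main(String[] args) {
        String value = "2022-04-05T08:19:00+00:00";
        // Assumed UTC+2 session zone: reproduces the shifted value from the question.
        System.out.println(render(value, ZoneOffset.ofHours(2))); // 05.04.2022 10:19:00
        // UTC session zone: the value the question expects.
        System.out.println(render(value, ZoneOffset.UTC));        // 05.04.2022 08:19:00
    }
}
```

Read this way, the timestampFormat pattern is probably not the problem: both results come from the same correctly parsed instant, so changing the pattern would not remove the shift, while setting the session time zone to UTC would.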