Spark timestamp format with timezone issue

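I have the following code (I tried both patterns):

// First attempt
String timestampFormat = "yyyy-MM-dd'T'HH:mm:ssXXX";
// Second attempt
// String timestampFormat = "yyyy-MM-dd'T'HH:mm:ssZZZZZ";

Dataset<Row> inputDataFrame = spark.read()
            .format("CSV")
            .option("timestampFormat", timestampFormat)
            .load("path/file"); // placeholder path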

The value 2022-04-05T08:19:00+00:00 is loaded into the Hive table as 05.04.2022 10:19:00, which is a 2-hour difference. It should be 05.04.2022 08:19:00. Can someone tell me which format I should use?

CodePudding user response：

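The 2-hour shift suggests the pattern is parsing the value correctly and the difference comes from the time zone used to render it. Spark stores a timestamp as an instant and displays it in the session time zone, which defaults to the JVM's local time zone. You can set the Spark SQL session time zone as shown below and rerun the job.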

// Option 1: pass it to spark-submit. Change UTC to your time zone.
--conf "spark.sql.session.timeZone=UTC"

// Option 2: set it in code (Java). Change UTC to your time zone.
spark.conf().set("spark.sql.session.timeZone", "UTC");
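
To see why the same value can show up two hours later, here is a minimal plain-Java sketch using only java.time (no Spark involved). It parses the value from the question and renders the same instant in two zones. The class name SessionTimeZoneDemo, the helper render, and the zone Europe/Berlin are assumptions for illustration; the question does not say which zone the cluster runs in, only that the result is 2 hours ahead.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class SessionTimeZoneDemo {
    // Same shape as the question's first pattern, with a four-letter year.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssXXX");
    // Display format matching the values quoted in the question.
    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss");

    // Hypothetical helper: parse the raw value, then show that instant in the given zone.
    static String render(String raw, String zoneId) {
        OffsetDateTime parsed = OffsetDateTime.parse(raw, INPUT);
        return parsed.atZoneSameInstant(ZoneId.of(zoneId)).format(OUTPUT);
    }

    public static void main(String[] args) {
        String raw = "2022-04-05T08:19:00+00:00";
        // Rendered in UTC: the wall-clock time matches the input.
        System.out.println(render(raw, "UTC"));
        // Rendered in an assumed UTC+2 zone: the wall-clock time shifts by two hours.
        System.out.println(render(raw, "Europe/Berlin"));
    }
}
```

The instant is identical in both cases; only the zone used for display changes. That is roughly what the session time zone setting controls in Spark, so aligning it with the zone you expect to see should remove the apparent offset.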