I managed to combine year, month, and day columns into one, and then combined that with a time column. When I try to convert the result to a Unix timestamp, I get the wrong value. Here is the code I tried:
import org.apache.spark.sql.functions._
import spark.implicits._

val df2 = df.withColumn("full_date", concat_ws("/", $"Month", $"Day", $"Year"))
// df2.show()
val df3 = df2.withColumn("date_time", concat_ws(" ", $"full_date", $"TimeCST"))
// df3.show()
val stamp = df3.withColumn("timestamp", unix_timestamp($"date_time", "M/d/yyyy h:mm a"))
stamp.show()
I'm getting 94668798, but it should be 946752780. An example of a date I'm trying to convert: 1/1/2000 12:53 AM
CodePudding user response:
From the documentation of unix_timestamp:
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds), using the default timezone and the default locale.
In other words, since your timestamp format does not include a timezone, Spark treats the time as being in the timezone configured on the machine running Spark. If you want the value computed in a different timezone, set spark.sql.session.timeZone to the correct zone name first:
spark.conf.set("spark.sql.session.timeZone", "Europe/Berlin")
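For completeness, here is a minimal sketch of the whole pipeline with the timezone set before parsing. "America/Chicago" is an assumption based on your TimeCST column name; substitute whichever zone your data was actually recorded in.

import org.apache.spark.sql.functions._
import spark.implicits._

// Assumption: the data is in US Central time (the column is named TimeCST).
// unix_timestamp interprets patterns without a zone in the session timezone,
// so set it before parsing.
spark.conf.set("spark.sql.session.timeZone", "America/Chicago")

val stamp = df
  .withColumn("full_date", concat_ws("/", $"Month", $"Day", $"Year"))      // e.g. 1/1/2000
  .withColumn("date_time", concat_ws(" ", $"full_date", $"TimeCST"))       // e.g. 1/1/2000 12:53 AM
  .withColumn("timestamp", unix_timestamp($"date_time", "M/d/yyyy h:mm a"))

stamp.show()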