I have a variable, for example:
val loadingDate: LocalDateTime = LocalDateTime.of(2020, 1, 2, 0, 0, 0)
I need to add an extra column by using the value of this variable.
When I try to do this:
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
DF.withColumn("dttm", expr(s"${loadingDate.format(formatter)}").cast("timestamp"))
I get an error like this:
Exception in thread "main" java.lang.reflect.InvocationTargetException
Caused by: org.apache.spark.sql.catalyst.parser.ParseException
mismatched input '00' expecting <EOF> (line 1, pos 11)
== SQL ==
2020-01-02 00:00:00
-----------^^^
Can I use variables of type LocalDateTime for adding extra columns in Spark? Or do I have to use other types?
I need to get a date from an external system and use it in Spark. What is the best way to do this, and which types should I use?
CodePudding user response:
You can format your date into a string with val dateString = s"${loadingDate.format(formatter)}" and convert it into Spark's DateType using the to_date() function. First you have to turn the String into a literal (in other words, represent the string as a Column); to do so, use lit(dateString).
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.{lit, to_date}

val date: LocalDateTime = LocalDateTime.of(2020, 1, 2, 0, 0, 0)
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val formattedDate = date.format(formatter)
val dfWithYourDate = df.withColumn("your_date", to_date(lit(formattedDate), "yyyy-MM-dd HH:mm:ss"))
If you need TimestampType instead of DateType, use the to_timestamp() function instead of to_date().
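As for which types to use: you can also skip the string formatting/parsing round trip entirely, because Spark's lit() accepts a java.sql.Timestamp directly. A minimal sketch of the conversion, assuming your external system hands you a java.time.LocalDateTime (the df name and "dttm" column are placeholders):

```scala
import java.sql.Timestamp
import java.time.LocalDateTime

object TimestampLiteral {
  def main(args: Array[String]): Unit = {
    val loadingDate = LocalDateTime.of(2020, 1, 2, 0, 0, 0)

    // java.sql.Timestamp is one of the JVM types Spark can turn into a
    // TimestampType literal, so no format pattern is needed:
    val ts = Timestamp.valueOf(loadingDate)
    println(ts) // 2020-01-02 00:00:00.0

    // In a Spark job this could then be used as (untested sketch):
    //   df.withColumn("dttm", lit(ts))  // TimestampType column, no parsing
  }
}
```

This avoids the failure mode in the question: the value never passes through expr(), so the SQL parser never sees the raw "2020-01-02 00:00:00" string.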