How to add extra date column in DataFrame by using Spark?


I have a variable, for example:

val loadingDate: LocalDateTime = LocalDateTime.of(2020, 1, 2, 0, 0, 0)

I need to add an extra column using the value of this variable.

When I try to do this:

val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

DF.withColumn("dttm", expr(s"${loadingDate.format(formatter)}").cast("timestamp"))

I get an error like this:

Exception in thread "main" java.lang.reflect.InvocationTargetException

Caused by: org.apache.spark.sql.catalyst.parser.ParseException

mismatched input '00' expecting <EOF> (line 1, pos 11)

== SQL ==

2020-01-02 00:00:00

-------------^^^

Can I use variables of type LocalDateTime for adding extra columns in Spark? Or do I have to use other types?

I need to get a date from an external system and use it in Spark. What is the best way to do this, and which types should I use?

CodePudding user response:

You can format your date as a string with val dateString = s"${loadingDate.format(formatter)}" and convert it into Spark's DateType using the to_date() function. First you have to turn the String into a literal (in other words, represent it as a column); to do so, use lit(dateString).

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.{lit, to_date}

val date: LocalDateTime = LocalDateTime.of(2020, 1, 2, 0, 0, 0)
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val formattedDate = date.format(formatter)

val dfWithYourDate = df.withColumn("your_date", to_date(lit(formattedDate), "yyyy-MM-dd HH:mm:ss"))

If you need TimestampType instead, use the to_timestamp() function in place of to_date().
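As a sketch of that variant (assuming an existing SparkSession and a DataFrame named df, which are not shown in the original answer), you can either go through a formatted string with to_timestamp(), or skip the string formatting entirely by converting the LocalDateTime to a java.sql.Timestamp, which lit() accepts directly:

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.{lit, to_timestamp}

val loadingDate: LocalDateTime = LocalDateTime.of(2020, 1, 2, 0, 0, 0)
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Option 1: format to a string, then parse it back as a timestamp column
val viaString = df.withColumn(
  "dttm",
  to_timestamp(lit(loadingDate.format(formatter)), "yyyy-MM-dd HH:mm:ss")
)

// Option 2: pass a java.sql.Timestamp literal, no string round-trip needed
val viaTimestamp = df.withColumn("dttm", lit(Timestamp.valueOf(loadingDate)))
```

Both produce a TimestampType column; the second option avoids any dependency on the format pattern matching the string.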
