Spark SQL change date format


I'm trying to change the date format of my data from (11 20, 2014) to 2014-11-20. I tried this:

df.withColumn("newDate", to_date(col("reviewTime"), "mm dd, yyyy"))

Because single-digit days appear as 1, 2, 8 instead of 01, 02, 08, I got this message:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '09 1, 2014' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string. Caused by: DateTimeParseException: Text '09 1, 2014' could not be parsed at index 3

Is there a way to fix this?

Thanks!

CodePudding user response:

This is a Spark 3 behavior change: the new datetime parser is stricter than the old one. You should add the Spark configuration that the error message points to:

spark.sql.legacy.timeParserPolicy = "LEGACY"

or

spark.sql.legacy.timeParserPolicy = "CORRECTED"

according to your requirements; the error message explains the difference between those two options.
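
For example, a minimal sketch of setting the policy at runtime (assuming an existing SparkSession named spark, as in spark-shell; LEGACY restores the pre-3.0 parsing behavior):

// assumes a SparkSession named spark is already in scope
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

// or equivalently via SQL
spark.sql("SET spark.sql.legacy.timeParserPolicy = LEGACY")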

CodePudding user response:

You can use the format (M d, yyyy) to deal with it: unlike MM and dd, the single-letter patterns M and d match both single- and double-digit months and days.

Example (Scala Spark):

import org.apache.spark.sql.functions.to_date
import spark.implicits._  // toDF and $-notation; assumes a SparkSession named spark

Seq(
    "(11 20, 2014)",
    "(1 3, 2013)",
    "(2 20, 2012)",
    "(4 22, 2014)"
).toDF("ugly_date")
    .withColumn("date", to_date($"ugly_date", "(M d, yyyy)"))  // parentheses are matched literally
    .show(false)

Output:
 ------------- ---------- 
|ugly_date    |date      |
 ------------- ---------- 
|(11 20, 2014)|2014-11-20|
|(1 3, 2013)  |2013-01-03|
|(2 20, 2012) |2012-02-20|
|(4 22, 2014) |2014-04-22|
 ------------- ---------- 

For more information about Datetime Patterns, see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
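
Applied to the original question, a minimal sketch (assuming a DataFrame named df whose reviewTime strings look like 11 20, 2014, i.e. without the surrounding parentheses):

import org.apache.spark.sql.functions.{col, to_date}

// M and d also accept single-digit values, so "9 1, 2014" parses fine
df.withColumn("newDate", to_date(col("reviewTime"), "M d, yyyy"))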
