If you parse two-digit years in Spark SQL, the results come back as future dates. Basically, anything that you would logically expect to be in the past gets converted into a future date.
For instance, if you do a simple statement like the following:
select to_date('1/1/94', 'm/d/yy')
You will get: 2094-01-01
Is there an easy way, aside from post-processing (subtracting 100 years from anything in the future), to handle this logically?
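For context, the kind of post-processing I would rather avoid looks roughly like this (the column name parsed_date and table name my_table are just placeholders):

select case
         when parsed_date > current_date() then add_months(parsed_date, -1200) -- shift back 100 years
         else parsed_date
       end as fixed_date
from my_table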
CodePudding user response:
First of all, m means minutes, not month; you need to use M for months. Second, 94 is interpreted as 2094 because Spark 3.x uses a DateTimeFormatter, and that is the default behavior of this class for two-digit years. Spark 2.x, however, used a SimpleDateFormat, which interpreted 94 as 1994. If you want the legacy time formatting, you can set spark.sql.legacy.timeParserPolicy to LEGACY:
// Fall back to the Spark 2.x (SimpleDateFormat) parsing behavior
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
// With the legacy parser, a two-digit year like 94 resolves to 1994
spark.sql("select to_date('2/10/94', 'M/d/yy') as date").show
+----------+
|      date|
+----------+
|1994-02-10|
+----------+
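As a side note (an addition, not part of the original answer): the same flag can also be set per session with a SQL SET statement, or passed at submit time, instead of calling spark.conf.set:

spark.sql("SET spark.sql.legacy.timeParserPolicy=LEGACY")
// or when launching the application:
// spark-submit --conf spark.sql.legacy.timeParserPolicy=LEGACY ...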