If you parse two-digit years in Spark SQL, the results come back as future dates. Basically, anything that you would logically expect to be in the past gets converted into a future date.
For instance, if you do a simple statement like the following:
select to_date('1/1/94', 'm/d/yy')
You will get: 2094-01-01
Is there an easy way, aside from post-processing (subtracting 100 years from anything in the future), to handle this logically?
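For context, the kind of post-processing I would rather avoid looks roughly like this (the column name parsed_date and table name my_table are just placeholders):

select case
         when parsed_date > current_date() then add_months(parsed_date, -1200) -- shift back 100 years
         else parsed_date
       end as fixed_date
from my_table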
CodePudding user response:
First of all, m means minutes, not month; you need to use M for months. Second, 94 is interpreted as 2094 because Spark 3.x uses a DateTimeFormatter, and that is the default behavior of this class for two-digit years. Spark 2.x, however, used a SimpleDateFormat, which interpreted 94 as 1994. If you want the legacy time formatting, you can set spark.sql.legacy.timeParserPolicy to LEGACY:
// Fall back to the Spark 2.x (SimpleDateFormat) parsing behavior
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
// With the legacy parser, a two-digit year like 94 resolves to 1994
spark.sql("select to_date('2/10/94', 'M/d/yy') as date").show
+----------+
|      date|
+----------+
|1994-02-10|
+----------+
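As a side note (an addition, not part of the original answer): the same flag can also be set per session with a SQL SET statement, or passed at submit time, instead of calling spark.conf.set:

spark.sql("SET spark.sql.legacy.timeParserPolicy=LEGACY")
// or when launching the application:
// spark-submit --conf spark.sql.legacy.timeParserPolicy=LEGACY ...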