I am trying to convert a string column to a date using to_date. Everything works, but my requirement is to fail the Spark job if there is any bad data, that is, any malformed date input. Currently, to_date returns null instead of failing. How can I make sure the job fails in such a scenario?
CodePudding user response:
The behavior of the to_date function depends on the spark.sql.ansi.enabled Spark option.
When it is disabled (the default in Spark 3.x), Spark uses a Hive-compliant dialect and returns null instead of failing.
Conversely, when it is enabled, Spark is ANSI compliant and to_date fails on malformed input, as described in the ANSI compliance section of the Spark SQL documentation.
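As a rough sketch of the ANSI route in PySpark: the helper name parse_dates_ansi, its parameters, and the default pattern below are made up for illustration, and the exact exception you get depends on your Spark version.

```python
def parse_dates_ansi(spark, df, column, fmt="yyyy-MM-dd"):
    """Hypothetical helper: enable ANSI mode, then parse `column` with to_date."""
    # Imported inside the function so the module loads even without pyspark.
    from pyspark.sql import functions as F

    # This is a session-wide setting: it affects every later query on this
    # session, not only the to_date call below.
    spark.conf.set("spark.sql.ansi.enabled", "true")

    # With ANSI mode on, a value that does not match `fmt` is expected to
    # raise an error at execution time instead of becoming null.
    return df.withColumn(column, F.to_date(F.col(column), fmt))
```

Because Spark evaluates lazily, the error only shows up once an action actually evaluates the parsed column, for example when the result is written out.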
That said, you may not want to enable spark.sql.ansi.enabled, because it changes many other behaviors as well; see the Spark SQL ANSI compliance documentation for the full list.
An alternative is to use a UDF instead of the to_date function to parse the date, and to throw an exception when parsing fails.
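A minimal sketch of the UDF approach in PySpark could look like this. The names parse_date_strict and strict_to_date_udf are hypothetical, the pattern uses Python's strptime syntax (not Spark's yyyy-MM-dd style), and passing nulls through unchanged is an assumption you may want to change.

```python
from datetime import datetime


def parse_date_strict(value, fmt="%Y-%m-%d"):
    """Parse `value` into a date; raise ValueError if it does not match `fmt`."""
    # Assumption: a null input is not "bad data", so it is passed through.
    if value is None:
        return None
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError as exc:
        # Re-raise with the offending value so it is visible in the Spark logs.
        raise ValueError(
            f"Malformed date {value!r}; expected format {fmt!r}"
        ) from exc


def strict_to_date_udf(fmt="%Y-%m-%d"):
    """Hypothetical factory: wrap parse_date_strict as a Spark UDF returning DateType."""
    # Imported here so the parser above can be used and tested without pyspark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DateType

    return udf(lambda value: parse_date_strict(value, fmt), DateType())
```

It would be used along the lines of df.withColumn("d", strict_to_date_udf()(df["raw"])). An exception raised inside the UDF fails the task, and once Spark has exhausted its task retries the job fails. Keep in mind that a Python UDF is usually slower than the built-in to_date.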