SimpleDateFormat succeeds or throws depending on year number

Time:01-10

I'm working with legacy code that tries to parse dates that include an optional time component that's usually padded with zeroes, using a format string, ddMMyy, that doesn't really match the input. In the spirit of true legacy code, nobody ever bothered to clean it up because it accidentally does what it's supposed to do. Except, in 2023, it no longer does.

Here's a (drastically simplified) version of the code:

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class WeirdDateFormat {
    public static void main(String...args) throws ParseException {
        var df = new SimpleDateFormat("ddMMyy");
        df.setLenient(false);

        System.out.println(df.parse("09012023000000000"));
        System.out.println(df.parse("09012022000000000"));
    }
}

It prints:

Mon Jan 09 00:00:00 CET 70403584
Exception in thread "main" java.text.ParseException: Unparseable date: "09012022000000000"
    at java.base/java.text.DateFormat.parse(DateFormat.java:399)
    at WeirdDateFormat.main(WeirdDateFormat.java:10)

In other words, the first date (9 January 2023) parses fine, but gives a date with the year 70403584. The second date (9 January 2022) fails to parse and throws an exception.

(If we set lenient to true, the second date doesn't throw but ends up in the year 239492593.)

WTF is happening here? Why does it sometimes fail to parse, and sometimes not? And where do these bizarre year numbers come from?

(Found the issue in production running Java 8, but the behaviour on Java 17 is the same.)

EDIT Yes, I know, I know, I must fix the legacy code. No need to keep telling me, you're preaching to the choir! Unfortunately, my drastically simplified version of the code doesn't show you all the other legacy defects of the code base that also have to be addressed. I just want to understand what's going on here so I'll be better informed when I actually do refactor this code.

CodePudding user response:

OBVIOUS ANSWER: Get rid of this obsolete crud that never worked properly and do it right.

Let's first explain this result

I'm not sure why, but you wanted to know why this happens. I dived into the source of SimpleDateFormat for you.

Given that the yy part is the last, it takes all remaining digits. Thus, the "2022000000000" part of the string is parsed into a Long of value 2022000000000. This is then immediately converted to an int, and that's quite problematic; 2022000000000 overflows and turns into int value -929596416. A standard java.text.CalendarBuilder instance is then told to set its YEAR field to that value (-929596416). Which is fine.

When parsing is done, that builder is asked to produce a GregorianCalendar value. The GregorianCalendar accepts -929596416 as its YEAR value just fine at that point, but SimpleDateFormat then asks this GregCal instance to calculate the time in millis since the epoch, and that fails: the calendar throws an IllegalArgumentException. That exception is caught by the SimpleDateFormat code and results in the Unparseable date exception that you are getting.

With 2023, you get the same effect: That is turned into an int without checking if it overflows; that overflows just the same, and results in int value 70403584. GregorianCalendar DOES accept this year. This then results in what you saw: Year 70403584 - which is explained as follows:

long y = 2023000000000L;
int i = (int) y;
System.out.println(i); // prints 70403584
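To see both cases side by side, here is a minimal sketch of that narrowing conversion. The class and method names are made up for illustration; only the cast and Math.toIntExact are real JDK behaviour. It also shows what a checked conversion would have done instead of silently wrapping.

```java
public class NarrowingSketch {
    // Mirrors the unchecked long-to-int cast described above (hypothetical helper).
    static int narrow(long digits) {
        return (int) digits;
    }

    // Math.toIntExact throws ArithmeticException instead of wrapping around.
    static boolean fitsInInt(long digits) {
        try {
            Math.toIntExact(digits);
            return true;
        } catch (ArithmeticException overflow) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(narrow(2022000000000L));    // prints -929596416
        System.out.println(narrow(2023000000000L));    // prints 70403584
        System.out.println(fitsInInt(2023000000000L)); // prints false
    }
}
```

Had the parser used a checked conversion like this, both inputs would presumably have been rejected the same way.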

A deeper dive then is: Why is 70403584 fine, and -929596416 isn't?

Mostly, 'because'. The GregCal methods getMinimum(field) and getMaximum(field), when passed the YEAR field (constant value 1), return 1 and 292278994 respectively. That means 70403584 is accepted, and -929596416 is not. You told it to be non-lenient. "Lenient" here (the old j.u.Calendar stuff) is mostly a silly concept: trying to define what is acceptable in non-lenient mode is virtually impossible, and various utterly ridiculous dates are nevertheless acceptable even in non-lenient mode.

We can verify this:

GregorianCalendar cal = new GregorianCalendar();
cal.setLenient(false);
cal.set(Calendar.YEAR, -5);
System.out.println(cal.getTime());

gives you:

Exception in thread "main" java.lang.IllegalArgumentException: YEAR
    at java.base/java.util.GregorianCalendar.computeTime(GregorianCalendar.java:2609)
    at java.base/java.util.Calendar.updateTime(Calendar.java:3411)
    at java.base/java.util.Calendar.getTimeInMillis(Calendar.java:1805)
    at java.base/java.util.Calendar.getTime(Calendar.java:1776)

THE EXECUTIVE CONCLUSION: If you were expecting non-lenient mode to reject these patterns, I have some nasty news for you: non-lenient mode does not work and never did and you should not be relying on it. Specifically here, overflows are not checked (you'd think that in non-lenient mode, any overflow of any value means the value is rejected, but, alas), and 2023000000000 so happens to overflow into a ridiculous but nevertheless acceptable (even in non-lenient mode) year, whereas 2022000000000 does not.

So how do you fix this?

You can't. SimpleDateFormat and GregorianCalendar are horrible API and broken implementations. The only fix is to ditch it. Use java.time. Make a new formatter using java.time.DateTimeFormatter, parse this value into a LocalDate, and go from there. You'll solve a whole host of timezone related craziness on the fly, too! (Because java.util.Date is lying and doesn't represent dates. It represents instants, hence why .getYear() and company are deprecated, because you can't ask an instant for a year without a timezone, and Date doesn't have one. Calendar is intricately interwoven with it all - hence, storing dates on one timezone and reading them on another causes wonkiness. LocalDate avoids all that).
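A rough sketch of what that could look like, under some assumptions of mine: the date is always the first 8 characters in ddMMuuuu order, and whatever follows (the zero-padded time part) can simply be dropped. The class and method names are hypothetical. The substring matters, because a year pattern like uuuu is not fixed-width and would otherwise try to swallow the trailing digits as well.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;

public class LocalDateSketch {
    // STRICT resolving rejects impossible dates such as 31 February.
    private static final DateTimeFormatter DDMMUUUU =
            DateTimeFormatter.ofPattern("ddMMuuuu").withResolverStyle(ResolverStyle.STRICT);

    // Assumption: only the leading 8 characters carry the date; the rest is ignored.
    static LocalDate parseLeadingDate(String raw) {
        if (raw.length() < 8) {
            throw new IllegalArgumentException("Expected at least 8 characters: " + raw);
        }
        return LocalDate.parse(raw.substring(0, 8), DDMMUUUU);
    }

    public static void main(String[] args) {
        System.out.println(parseLeadingDate("09012023000000000")); // prints 2023-01-09
        System.out.println(parseLeadingDate("09012022000000000")); // prints 2022-01-09
    }
}
```

If the time component actually matters in your data, you'd parse it separately rather than throw it away; this sketch only covers the date.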

EDIT: As a fellow Dutchie, note that the most recent JDKs break the Europe/Amsterdam timezone (grumble grumble OpenJDK team doesn't understand what damage they are causing) - which means any conversion between epoch-millis and base dates is extra problematic for software running in Dutch locales. For example, if you are storing birthdates and you dip through conversion like this, everybody born before 1940 will break and their birthday will shift by a day. LocalDate avoids this by never storing anything as epoch-millis in the first place.

CodePudding user response:

The reason is integer overflow. But seriously, fix the legacy code.

jshell> (int)9012023000000000L
$1 ==> 496985600

jshell> (int)9012022000000000L
$2 ==> -503014400

A negative year is out of range for the year component. A crazy large year is still considered valid.

Set a breakpoint on line 1543 of SimpleDateFormat.java. The exception itself is thrown in Line 2583 of GregorianCalendar.java:

for (int field = 0; field < FIELD_COUNT; field++) {
    int value = internalGet(field);
    if (isExternallySet(field)) {
        // Quick validation for any out of range values
        if (value < getMinimum(field) || value > getMaximum(field)) {
            throw new IllegalArgumentException(getFieldName(field));
        }
    }
    originalFields[field] = value;
}

The actual "year" value that ends up in this method is -929596416 ((int)2022000000000L). Year 2023 does not throw because the long value coerced into an integer ends up being 70403584 – which is allowed. The GregorianCalendar rejects values smaller than 1 (< 1) and greater than 292278994 (> 292278994).
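Putting the two steps together, a minimal sketch that narrows the digits the way the parser does and then hands the result to a non-lenient GregorianCalendar. This is not the SimpleDateFormat code path itself, just an approximation of it; the class name, the helper, and the choice of UTC and 9 January are assumptions for illustration.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class StrictYearSketch {
    // Hypothetical helper: true if a non-lenient calendar accepts the narrowed year.
    static boolean acceptedWhenStrict(long yearDigits) {
        int year = (int) yearDigits; // unchecked narrowing, may wrap around
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.setLenient(false);
        cal.set(year, Calendar.JANUARY, 9);
        try {
            cal.getTimeInMillis(); // forces the field range validation
            return true;
        } catch (IllegalArgumentException rejected) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptedWhenStrict(2023000000000L)); // wraps to 70403584
        System.out.println(acceptedWhenStrict(2022000000000L)); // wraps to -929596416
    }
}
```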
