Parsing ISO-Date from Wikidata in JS

Time:10-26

What options are there for parsing an ISO date string in JavaScript when the date may be B.C.?
I have dates from Wikidata that look like this, for example (other ISO formats are also possible): -0709-01-01T00:00:00Z

This is too far back for Date.parse, and even libraries like Luxon or Moment can't do anything with it here.
I would like to be able to query year, month, day, hour and minute in a desired time zone.

I could write a regular expression and parse the string with it, but with so many possible variations I see a danger of missing some.
I also haven't found a regex online that handles everything.

Does anyone know a library or another solution that I could use here?

CodePudding user response:

"-0709-01-01T00:00:00Z" is not consistent with the format defined in ECMA-262 so parsing is implementation dependent.

Per ECMA-262, where the year has a sign (i.e. is preceded by + or -) it must use the expanded year format and have 6 digits, e.g.

new Date('-000709-01-01T00:00:00Z')

works in Safari, Firefox and Chrome at least.

So you can parse the string yourself or expand the year to 6 digits. The latter requires parsing and reformatting, so you might as well just parse it and avoid the built-in parser by going directly to the constructor, e.g.

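function parseUTC(d) {
  // Pull out the numeric fields in order: year, month, day, hour, minute, second
  let [Y, M, D, H, m, s] = d.match(/\d+/g);
  // A leading "-" marks a negative year
  let sign = /^-/.test(d)? -1 : 1;
  // Date.UTC takes a zero-based month
  return new Date(Date.UTC(sign*Y, M-1, D, H, m, s));
}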

let d = '-0709-01-01T00:00:00Z';

// -000709-01-01T00:00:00.000Z
console.log(parseUTC(d).toISOString());

// Alternatively…
console.log(new Date(d.replace(/^-/,'-00')).toISOString());

While the second method "works", I think the first method is more robust as it's not dependent on the number of digits in the year, whereas the second assumes it's 4 digits and doesn't handle, say, "+2020-10-23T00:00:00Z".
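
If the input may vary more than that (a leading "+", a year of any width, missing seconds, fractional seconds), a stricter variant of the first approach might look like the sketch below. The name parseSignedISO and the exact pattern are assumptions for illustration, not part of any library. It only accepts UTC strings, and it sets the year with setUTCFullYear because Date.UTC maps years 0 to 99 onto 1900 to 1999.

```javascript
// Hypothetical helper: parse an ISO-like UTC timestamp with an optional sign
// and a year of any width, without going through Date.parse.
function parseSignedISO(str) {
  const pattern =
    /^([+-]?)(\d+)-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.(\d{1,3}))?)?)?Z?$/;
  const m = pattern.exec(str.trim());
  if (!m) return new Date(NaN); // unrecognised input yields an invalid Date

  // Groups that did not participate in the match are undefined, so defaults apply.
  const [, sign, year, month, day, hour = '0', minute = '0', second = '0', frac = '0'] = m;
  const y = (sign === '-' ? -1 : 1) * Number(year);
  const ms = Number(frac.padEnd(3, '0'));

  // Build the time of day on a placeholder date, then set year, month and day
  // together. Date.UTC would treat years 0..99 as 1900..1999; setUTCFullYear does not.
  const date = new Date(Date.UTC(2000, 0, 1, Number(hour), Number(minute), Number(second), ms));
  date.setUTCFullYear(y, Number(month) - 1, Number(day));
  return date;
}

console.log(parseSignedISO('-0709-01-01T00:00:00Z').toISOString()); // -000709-01-01T00:00:00.000Z
console.log(parseSignedISO('+2020-10-23T00:00:00Z').toISOString()); // 2020-10-23T00:00:00.000Z
console.log(parseSignedISO('0050-03-04T05:06:07Z').toISOString());  // 0050-03-04T05:06:07.000Z
```

Returning an invalid Date for unmatched input mirrors what the Date constructor does, so callers can check the result with isNaN(date.getTime()).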

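The question also asks for year, month, day, hour and minute in a chosen time zone. One simple way, assuming a fixed UTC offset is good enough for dates this old, is to shift the instant by the offset and read it back with the UTC getters. The helper name fieldsAtOffset is hypothetical, and this sketch deliberately ignores named zones and daylight saving rules.

```javascript
// Hypothetical helper: read calendar fields of a Date at a fixed UTC offset.
// offsetMinutes is positive east of UTC (e.g. 60 for UTC+01:00).
function fieldsAtOffset(date, offsetMinutes) {
  // Shift the instant by the offset, then read the shifted value with UTC getters.
  const shifted = new Date(date.getTime() + offsetMinutes * 60000);
  return {
    year: shifted.getUTCFullYear(),
    month: shifted.getUTCMonth() + 1, // getUTCMonth is zero-based
    day: shifted.getUTCDate(),
    hour: shifted.getUTCHours(),
    minute: shifted.getUTCMinutes(),
  };
}

// The instant -0709-01-01T00:00:00Z, built without the string parser.
const instant = new Date(Date.UTC(-709, 0, 1));

console.log(fieldsAtOffset(instant, 60));   // { year: -709, month: 1, day: 1, hour: 1, minute: 0 }
console.log(fieldsAtOffset(instant, -300)); // { year: -710, month: 12, day: 31, hour: 19, minute: 0 }
```

The year returned here follows JavaScript's numbering, which includes a year 0, so how a negative value maps to a B.C. year depends on the convention used by the data source.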