What options are there for parsing an ISO date string in JavaScript if the date can also be B.C.?
I have dates from Wikidata that look like this, for example (other ISO formats are possible too): -0709-01-01T00:00:00Z
This is too far back for Date.parse, and even libraries like Luxon or Moment can't handle it.
I would like to be able to query year, month, day, hour and minute in a desired time zone.
I could write a regular expression and parse the string with it, but with so many possible variations I'm worried about missing some.
I also haven't found a regex online that handles every case.
Does anyone know a library or another solution that I could use here?
Answer:
"-0709-01-01T00:00:00Z" is not consistent with the format defined in ECMA-262, so parsing it is implementation-dependent.
Per ECMA-262, where the year has a sign (i.e. is preceded by + or -), it must use the expanded year format with 6 digits, e.g.
```javascript
new Date('-000709-01-01T00:00:00Z')
```
works in Safari, Firefox and Chrome at least.
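If you do want the expansion route, it can be sketched as a small string rewrite. `toExpandedYear` is a hypothetical helper, not part of any library; it assumes the string starts with an optionally signed year followed by `-`, leaves anything else unchanged, and still relies on the built-in parser accepting the expanded format that ECMA-262 specifies.

```javascript
// Hypothetical helper: rewrite a signed or non-4-digit ISO year into the
// ECMA-262 expanded-year form (sign + 6 digits) so the built-in parser accepts it.
function toExpandedYear(iso) {
  const m = /^([+-]?)(\d+)(-.*)$/.exec(iso);
  if (!m) return iso;                                // not a year-first string
  const [, sign, year, rest] = m;
  if (sign === '' && year.length === 4) return iso;  // already standard
  if (year.length > 6) return iso;                   // cannot be expressed
  // ECMA-262 forbids "-000000"; year zero must be written "+000000"
  const s = sign === '-' && Number(year) === 0 ? '+' : (sign || '+');
  return s + year.padStart(6, '0') + rest;
}

// -000709-01-01T00:00:00.000Z
console.log(new Date(toExpandedYear('-0709-01-01T00:00:00Z')).toISOString());
// 2020-10-23T00:00:00.000Z
console.log(new Date(toExpandedYear('+2020-10-23T00:00:00Z')).toISOString());
```

Unlike a fixed `'-00'` prefix, padding to 6 digits also copes with `+` years and years that are not exactly 4 digits long.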
So you can either parse the string yourself or expand the year to 6 digits. The latter requires parsing and reformatting anyway, so you might as well just parse it and skip the built-in parser by passing the parts straight to Date.UTC and the Date constructor, e.g.
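```javascript
function parseUTC(d) {
  // Extract the digit runs: year, month, day, hour, minute, second
  let [Y, M, D, H, m, s] = d.match(/\d+/g);
  // A leading "-" marks a negative (B.C.) year
  let sign = /^-/.test(d) ? -1 : 1;
  // Date.UTC treats years 0-99 as 1900-1999, so start from a placeholder
  // year and set the real year, month and day with setUTCFullYear
  let date = new Date(Date.UTC(2000, 0, 1, H, m, s));
  date.setUTCFullYear(sign * Y, M - 1, D);
  return date;
}
let d = '-0709-01-01T00:00:00Z';
// -000709-01-01T00:00:00.000Z
console.log(parseUTC(d).toISOString());
// Alternatively…
console.log(new Date(d.replace(/^-/, '-00')).toISOString());
```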
While the second method "works", I think the first is more robust: it doesn't depend on the number of digits in the year, whereas the second assumes exactly 4 digits and doesn't handle, say, "+2020-10-23T00:00:00Z".
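The question also asks for year, month, day, hour and minute in a chosen time zone. Once you have a Date, one option is Intl.DateTimeFormat with the `timeZone` option and `formatToParts`. This is a minimal sketch, assuming a runtime with full ICU time zone data (as current Node.js and major browsers generally ship); `partsInZone` is a made-up helper name, and `parseUTC` is repeated here, setting the year via setUTCFullYear, so the block runs on its own. Intl reports era-based years (710 BC) rather than astronomical ones (-709), and for dates this early the time zone database typically falls back to local mean time, so offsets may not be whole hours. It is worth comparing the day field against toISOString() before relying on it for ancient dates.

```javascript
// Standalone copy of the parseUTC approach. Expects all six fields
// (year, month, day, hour, minute, second) to be present.
function parseUTC(iso) {
  const [Y, M, D, H, m, s] = iso.match(/\d+/g).map(Number);
  const sign = iso.startsWith('-') ? -1 : 1;
  // Placeholder year avoids Date.UTC mapping years 0-99 onto 1900-1999
  const date = new Date(Date.UTC(2000, 0, 1, H, m, s));
  date.setUTCFullYear(sign * Y, M - 1, D);
  return date;
}

// Hypothetical helper: calendar fields of `date` as seen in IANA zone `timeZone`.
function partsInZone(date, timeZone) {
  const fmt = new Intl.DateTimeFormat('en-US', {
    timeZone,
    era: 'short',
    year: 'numeric',
    month: 'numeric',
    day: 'numeric',
    hour: 'numeric',
    minute: 'numeric',
    hourCycle: 'h23',
  });
  const raw = {};
  for (const { type, value } of fmt.formatToParts(date)) {
    if (type !== 'literal') raw[type] = value;
  }
  return {
    era: raw.era,            // e.g. "BC" or "AD" for en-US
    year: Number(raw.year),  // era-based: astronomical -709 is reported as 710 BC
    month: Number(raw.month),
    day: Number(raw.day),
    hour: Number(raw.hour),
    minute: Number(raw.minute),
  };
}

const date = parseUTC('-0709-01-01T00:00:00Z');
console.log(partsInZone(date, 'UTC'));
console.log(partsInZone(date, 'Europe/Berlin'));
```

If you need astronomical years instead, convert B.C. years with `1 - year`.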