Home > Blockchain >  Modify Regex To Match String with Multiple Full Month Names And End on Hyphen
Modify Regex To Match String with Multiple Full Month Names And End on Hyphen

Time:01-20

I am using a Regex to pull dates out of a series of strings. The format varies slightly, but it always contains the full month. The strings usually contain two dates to represent a range like so:

February 1, 2020 - March 18, 2020

or

February 1st 2020 - March 18th 2020

And this is working great until I come across dates like:

June 1 - July 22, 2018

where a year is not presented in the "starting" part of the range because it is the same as the "ending" year.

Below is the Regex I crudely copied and applied to my code. It is Javascript but I really think this is more of a Regex question...

const regex = /((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\-\/]\s*)\D?)?\s*((19[0-9]\d|20\d{2})|\d{2})*/gm;

var myDateString1 = "January 8, 2020 - January 27, 2020"; // THIS WORKS GREAT!
var myDateString2 = "January 8 - January 27, 2020"; // THIS DOES NOT WORK GREAT!

var dates = myDateString1.match(regex);
// returns ["January 8, 2020","January 27, 2020"]

var dates2 = myDateString2.match(regex);
// returns ["January 8 - J"]

Is there a way I can modify this so if it is met with a hyphen it discontinues that given match? So myDateString2 would return ["January 8", "January 27, 2020"]?

The strings sometimes have words before or after, like

Presented from January 8, 2020 - January 27, 2020 at such and such place

so I don't think simply having a regex based on the hyphen before/after would work.

CodePudding user response:

You could use 2 capture groups and make the pattern more specific to match the format of the strings.

The /m flag can be omitted as there are no anchors in the pattern.

Note that the pattern matches a date like pattern, and does not validate the date itself.

\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s \d{4})?)\s [,./-]\s \b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s \d{4})\b

See a regex101 demo.

const regex = /\b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?(?:,\s \d{4})?)\s [,./-]\s \b((?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\s*\d\d?,\s \d{4})\b/g;
const str = `January 8, 2020 - January 27, 2020
January 8 - January 27, 2020
Presented from January 8, 2020 - January 27, 2020 at such and such place
June 1 - July 22, 2018`;

console.log(Array.from(str.matchAll(regex), m => [m[1], m[2]]))

CodePudding user response:

The capture groups were used for debug.
Simply taking out the hyphen in the class and making the year optional at the end with a single ? should get what you want.

/((\b\d{1,2}\D{0,3})?\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(Nov|Dec)(?:ember)?)\D?)(\d{1,2}(st|nd|rd|th)?)?((\s*[,.\/]\s*)\D?)?(\s*(19[0-9]\d|20\d{2})|\d{2})?/

https://regex101.com/r/bMMzcR/1

And replacing the capture groups with clusters (?: ) then giving it one more level of factoring will make it quicker.

(?:\b\d{1,2}\D{0,3})?\b(?:J(?:an(?:uary)?|u(?:ne?|ly?))|Feb(?:ruary)?|Ma(?:r(?:ch)?|y)|A(?:pr(?:il)?|ug(?:ust)?)|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}(?:st|[nr]d|th)?)?(?:\s*[,./]\s*\D?)?(?:\s*(?:19[0-9]\d|20\d{2})|\d{2})?

https://regex101.com/r/Fq0dy2/1

const regex = /(?:\b\d{1,2}\D{0,3})?\b(?:J(?:an(?:uary)?|u(?:ne?|ly?))|Feb(?:ruary)?|Ma(?:r(?:ch)?|y)|A(?:pr(?:il)?|ug(?:ust)?)|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)\D?(?:\d{1,2}(?:st|[nr]d|th)?)?(?:\s*[,./]\s*\D?)?(?:\s*(?:19[0-9]\d|20\d{2})|\d{2})?/g;
var myDateString1 = "January 8, 2020 - January 27, 2020";
var myDateString2 = "January 8 - January 27, 2020";

console.log(myDateString1.match(regex));
console.log(myDateString2.match(regex));

  • Related