Regex pattern for matching the year after some character

Time:10-07

I need to capture the year that follows the † character.

The pattern I found useful: †\D*(?<death_year>\d+)*

(† 1656), (ur. 1520, † ok. 1585) - it works fine

but in this case:

(† 21 VII 1595)

I need to match only the last group of digits

I can rely on ) but those cases are possible: († 1656 ), († 1656? ), († 1656?)

Can you help me figure it out?

CodePudding user response:

You can grab the first 4-digit chunk after that char:

†.*?(?<death_year>\d{4})

See this regex demo. Here, † is matched first, then .*? consumes zero or more chars other than line break chars, as few as possible, and then (?<death_year>\d{4}) captures four digits.
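As a quick check, here is a minimal Python sketch that runs this pattern against the sample strings from the question. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly; the helper name death_year is made up for illustration.

```python
import re

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
DEATH_YEAR = re.compile(r"†.*?(?P<death_year>\d{4})")

def death_year(text):
    """Return the first 4-digit run found after †, or None if there is none."""
    m = DEATH_YEAR.search(text)
    return m.group("death_year") if m else None

# Sample strings taken from the question.
samples = [
    "(† 1656)",
    "(ur. 1520, † ok. 1585)",
    "(† 21 VII 1595)",
    "(† 1656 )",
    "(† 1656? )",
    "(† 1656?)",
]

for s in samples:
    print(s, "->", death_year(s))
```

The day and month in († 21 VII 1595) are skipped because neither forms a run of four digits, so the lazy .*? keeps expanding until it reaches 1595.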

Although in this case it seems redundant, a check for "exactly four digits" might make it safer:

†.*?\b(?<death_year>\d{4})\b
†.*?(?<!\d)(?<death_year>\d{4})(?!\d)
†(?:.*?\D)??(?<death_year>\d{4})(?!\d)

See the regex demo. Here, \b fails the match if the four digits have a letter, digit, or underscore on either side; (?<!\d)/(?!\d) fails it only if there is a digit on either side; and the third option does the same as the second but without a lookbehind: (?:.*?\D)?? lazily matches an optional sequence of zero or more non-linebreak chars, as few as possible, followed by a non-digit char, so that the four digits can also be captured right after †.
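To see where the guarded variants differ from the unguarded pattern and from each other, here is a small Python sketch. It assumes Python's (?P<name>...) group syntax and writes the third variant with a leading †, which is an assumption about the intended pattern. The two test strings are invented for illustration and do not come from the question.

```python
import re

# Unguarded pattern: takes the first four digits it can find after †.
UNGUARDED = re.compile(r"†.*?(?P<death_year>\d{4})")

# Guarded variants, adapted to Python's named-group syntax.
VARIANTS = {
    "word_boundary": re.compile(r"†.*?\b(?P<death_year>\d{4})\b"),
    "lookarounds": re.compile(r"†.*?(?<!\d)(?P<death_year>\d{4})(?!\d)"),
    "no_lookbehind": re.compile(r"†(?:.*?\D)??(?P<death_year>\d{4})(?!\d)"),
}

def extract(pattern, text):
    """Return the captured year, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group("death_year") if m else None

# Hypothetical inputs, chosen to exercise the guards.
long_run = "(† 12345)"   # five digits: no exact 4-digit chunk exists
glued = "(† 1656r.)"     # year immediately followed by a letter

print("unguarded:", extract(UNGUARDED, long_run), extract(UNGUARDED, glued))
for name, pattern in VARIANTS.items():
    print(name + ":", extract(pattern, long_run), extract(pattern, glued))
```

All three guarded variants reject the five-digit run, while the unguarded pattern returns its first four digits. Only the \b version also rejects a year glued to a letter, because a letter counts as a word character; the digit-only lookarounds accept it.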
