Regex to match keys and array values inside a front matter

Time:09-17

I'd like to match keys and some array values inside a front matter so that they can be converted to tags in a translation memory. Basically, any matched key or value will be filtered out and appear as a non-translatable tag.

The system supports Java regexp.

Here's the front matter. Because of some preprocessing, the array values no longer have a leading hyphen:

---
title: This is a title
label:
one
two
three
ultra
description: "this is a description text"
other_key: value
---

note: this is a note outside the front matter
tip: this is a tip ...
one: this is a one

The problem:

  • The same text can also appear outside the front matter: currently note, tip, and one (see above).
  • Labels or values can change in the future, and changing the regex each time is not ideal. I have added ultra and other_key above as examples.

Important note: The docs state "we will reject complex regular expressions with quantifiers (except possessives) on groups which contain other quantifiers (except possessives)."

Depending on what this means, we might need to go with a very naive approach :/

My regular expressions so far:

  • Test1: ^one|^two|^three|^((\w|-)*)(:)
  • Test2: ^one|^two|^three|^description:|^title:|^label:
  • Test3: ^(---(?:\n.*)*)\s*(---)$
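To make the first problem concrete, here is a small Java sketch that runs Test2 against the sample and reports the matches that fall after the closing --- line. It assumes the tool applies the pattern in multiline mode (the question does not say which flags are used), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveFrontMatterMatch {
    // Sample document from the question.
    static final String SAMPLE = String.join("\n",
        "---",
        "title: This is a title",
        "label:",
        "one",
        "two",
        "three",
        "ultra",
        "description: \"this is a description text\"",
        "other_key: value",
        "---",
        "",
        "note: this is a note outside the front matter",
        "tip: this is a tip ...",
        "one: this is a one");

    // Test2 from the list above; MULTILINE is an assumption so that ^ matches at each line start.
    static final Pattern TEST2 = Pattern.compile(
        "^one|^two|^three|^description:|^title:|^label:", Pattern.MULTILINE);

    // Returns the matched texts that start after the closing --- line.
    // Assumes the text begins with --- and contains a second --- at a line start.
    static List<String> matchesOutsideFrontMatter(String text) {
        int frontMatterEnd = text.indexOf("\n---", 3) + 4;
        List<String> outside = new ArrayList<>();
        Matcher m = TEST2.matcher(text);
        while (m.find()) {
            if (m.start() >= frontMatterEnd) {
                outside.add(m.group());
            }
        }
        return outside;
    }

    public static void main(String[] args) {
        System.out.println(matchesOutsideFrontMatter(SAMPLE));
    }
}
```

Under these assumptions the line "one: this is a one" outside the front matter is matched as well, which is exactly the false positive described above.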

CodePudding user response:

Converting my comment to an answer so that the solution is easy to find for future visitors.

You may use this regex with lookarounds and \G to match all keys and labels before the closing ---:

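(?:(?:^label:|(?<!\A)\G)\R(\S+)|^\w+(?=:))(?=(?:.*\R)+---)

Breakup:

  • ^\w+(?=:): Matches a key made of 1+ word characters that must be followed by a :
  • \G: Asserts position at the end of the previous match; (?<!\A) keeps it from matching at the start of the input
  • (?:^label:|(?<!\A)\G)\R(\S+): Matches all labels, one per match: a line break followed by 1+ non-whitespace characters (captured in group 1), starting either right after label: or where the previous match ended
  • (?=(?:.*\R)+---): Lookahead to assert that we have a line starting with --- ahead of the current position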
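A minimal Java sketch of how this pattern could be tried out, with the + quantifiers written out as \S+, \w+ and (?:.*\R)+. The class name, the helper method, and the use of Pattern.MULTILINE are assumptions made for this illustration; the translation tool may apply different flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FrontMatterTags {
    // Sample document from the question.
    static final String SAMPLE = String.join("\n",
        "---",
        "title: This is a title",
        "label:",
        "one",
        "two",
        "three",
        "ultra",
        "description: \"this is a description text\"",
        "other_key: value",
        "---",
        "",
        "note: this is a note outside the front matter",
        "tip: this is a tip ...",
        "one: this is a one");

    // Pattern from the answer; MULTILINE (assumed) lets ^ match at every line start.
    static final Pattern TAGS = Pattern.compile(
        "(?:(?:^label:|(?<!\\A)\\G)\\R(\\S+)|^\\w+(?=:))(?=(?:.*\\R)+---)",
        Pattern.MULTILINE);

    // Collects group 1 (a label value) when it took part in the match,
    // otherwise the whole match (a key).
    static List<String> extract(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAGS.matcher(text);
        while (m.find()) {
            tags.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extract(SAMPLE));
    }
}
```

On the sample, nothing after the closing --- is matched, because no later --- line exists for the lookahead to find. One detail to be aware of: the \G chain keeps going past ultra, so the first key after the label list is picked up by \S+ together with its colon (description: rather than description).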