multiline regex pattern that stops at empty line

Time:01-04

I am looking for a regex pattern that starts with a specific string on the first line, contains one of a set of strings on the following line, and ends at an empty line. For example, it must start with - hello: world, it must contain a line with fruit: apple or fruit: banana, and it must end at an empty line. So the pattern would match the first two blocks here, but not the third:

- hello: world
  fruit: apple
  foo: bar
  key: value

- hello: world
  fruit: banana
  message: hi

- hello: world
  fruit: orange
  message: hi

This is what I have so far:

/- hello: world\s*fruit: (apple|banana)/g

What I'm missing is the rest of the pattern, which should keep matching until the empty line.

CodePudding user response:

Instead of using a regex, use a parser that is built for parsing YAML, such as yq.


So if the input file looks something like:

myData:
  - hello: world
    fruit: apple
    foo: bar
    key: value
  - hello: world
    fruit: banana
    message: hi
  - hello: world
    fruit: orange
    message: hi

Use a filter to select the entries for which both of the following conditions are true:

  • .hello == "world"
  • .fruit == "apple" or .fruit == "banana"

yq e '.myData | map(select(.hello == "world" and (.fruit == "apple" or .fruit == "banana")))' /path/to/input/file

Output:

- hello: world
  fruit: apple
  foo: bar
  key: value
- hello: world
  fruit: banana
  message: hi
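
For comparison, the same selection can be sketched in Python. This is only an illustration: it assumes the myData list has already been loaded into a list of dictionaries (a YAML library would normally do that step, which is not shown here), and the names my_data, WANTED_FRUITS and select_entries are made up for the example.

```python
# Data as it would look after loading the myData list from the YAML file.
# (Assumption: loading is done elsewhere by a YAML library.)
my_data = [
    {"hello": "world", "fruit": "apple", "foo": "bar", "key": "value"},
    {"hello": "world", "fruit": "banana", "message": "hi"},
    {"hello": "world", "fruit": "orange", "message": "hi"},
]

# Hypothetical constant: the fruit values that should be kept.
WANTED_FRUITS = {"apple", "banana"}


def select_entries(entries):
    """Keep entries where hello is "world" and fruit is apple or banana."""
    return [
        entry
        for entry in entries
        if entry.get("hello") == "world" and entry.get("fruit") in WANTED_FRUITS
    ]


selected = select_entries(my_data)
for entry in selected:
    print(entry)
```

The condition inside the list comprehension mirrors the select(...) expression in the yq command.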

CodePudding user response:

Using \s* matches optional whitespace characters, which can also include newlines.

The pattern world\s*fruit that you are using could therefore also match worldfruit or world fruit, and it can skip over empty lines.
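
A quick way to see this is to run the pattern from the question against a few strings. The sketch below uses Python's re module as an assumption (the question does not name a regex flavor), and the sample strings are invented for illustration.

```python
import re

# The pattern from the question, with \s* between the two lines.
LOOSE = re.compile(r"- hello: world\s*fruit: (apple|banana)")

samples = [
    "- hello: worldfruit: apple",        # no whitespace at all
    "- hello: world fruit: banana",      # a single space, same line
    "- hello: world\n\n  fruit: apple",  # an empty line in between
]

# \s* accepts zero or more whitespace characters of any kind,
# so every sample above is accepted.
results = [bool(LOOSE.search(sample)) for sample in samples]
print(results)
```

All three samples match, including the one where an empty line separates the two lines.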

If there must be a newline in between, and empty lines should not be matched:

- hello: world\n[^\S\n]*fruit: (?:apple|banana)\b(?:\n[^\S\n]*\S.*)*

Explanation

  • - hello: world\n Match - hello: world literally, followed by a newline
  • [^\S\n]*fruit: Match optional whitespace characters other than a newline, followed by fruit:
  • (?:apple|banana)\b Match either apple or banana, followed by a word boundary
  • (?: Non-capturing group, to repeat as a whole part
    • \n[^\S\n]*\S.* Match a newline, optional whitespace other than a newline, and a non-whitespace character, followed by the rest of the line
  • )* Close the non-capturing group and optionally repeat it to match all following non-empty lines
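
The pattern can be tried against the sample from the question. This is a minimal sketch that assumes Python's re module, where . does not match a newline by default; the names TEXT, BLOCK and blocks are illustrative.

```python
import re

# Sample input from the question: three blocks separated by empty lines.
TEXT = """\
- hello: world
  fruit: apple
  foo: bar
  key: value

- hello: world
  fruit: banana
  message: hi

- hello: world
  fruit: orange
  message: hi
"""

# The pattern from this answer. [^\S\n] is whitespace except a newline,
# so the repeated group cannot cross an empty line.
BLOCK = re.compile(
    r"- hello: world\n[^\S\n]*fruit: (?:apple|banana)\b(?:\n[^\S\n]*\S.*)*"
)

# The pattern has no capturing groups, so findall returns the full matches.
blocks = BLOCK.findall(TEXT)
for block in blocks:
    print(block)
    print("---")
```

Only the apple and banana blocks are returned, and each match ends on the last non-empty line of its block rather than including the empty line itself.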

See a regex101 demo.
