I am looking for a regex pattern that starts with a specific string on the first line, contains one of a set of strings on the following line, and ends with an empty line. For example, it must start with - hello: world; the next line must contain fruit: apple or fruit: banana; and it must end with an empty line. So the pattern would match the first two blocks here, but not the third:
- hello: world
  fruit: apple
  foo: bar
  key: value

- hello: world
  fruit: banana
  message: hi

- hello: world
  fruit: orange
  message: hi
This is what I have so far:
/- hello: world\s*fruit: (apple|banana)/g
What I'm looking for is the rest that will stop at the empty line.
CodePudding user response:
Instead of using a regex, use a parser like yq, which is built for parsing YAML.
So if the input file looks something like:
myData:
- hello: world
  fruit: apple
  foo: bar
  key: value
- hello: world
  fruit: banana
  message: hi
- hello: world
  fruit: orange
  message: hi
Use a yq filter to keep only the entries for which both of the following conditions are true:
.hello == "world"
.fruit == "apple" or .fruit == "banana"
yq e '.myData | map(select(.hello == "world" and (.fruit == "apple" or .fruit == "banana")))' /path/to/input/file
Output:
- hello: world
  fruit: apple
  foo: bar
  key: value
- hello: world
  fruit: banana
  message: hi
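For comparison, the same selection logic can be sketched in JavaScript over data that has already been parsed. The array literal below is a hand-written stand-in for the parsed myData list (in practice a YAML parser would produce it), and the names wantedFruits and selected are only illustrative.

```javascript
// Hand-written stand-in for the parsed myData list (assumption: a YAML
// parser would normally produce this structure from the input file).
const myData = [
  { hello: "world", fruit: "apple", foo: "bar", key: "value" },
  { hello: "world", fruit: "banana", message: "hi" },
  { hello: "world", fruit: "orange", message: "hi" },
];

// Same conditions as the yq filter:
// .hello == "world" and (.fruit == "apple" or .fruit == "banana")
const wantedFruits = new Set(["apple", "banana"]);
const selected = myData.filter(
  (entry) => entry.hello === "world" && wantedFruits.has(entry.fruit)
);

console.log(selected.map((entry) => entry.fruit)); // [ 'apple', 'banana' ]
```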
CodePudding user response:
\s* matches optional whitespace characters, which can include newlines. The pattern world\s*fruit that you are using could therefore also match worldfruit or world fruit on a single line.
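A quick way to see the difference, sketched in JavaScript (assumed here because the question uses /.../g literal syntax). The variable names loose and strict are only illustrative; strict uses the \n[^\S\n]* idea of requiring one newline followed by non-newline whitespace only.

```javascript
// \s* accepts zero or more whitespace characters of any kind, newlines included.
const loose = /world\s*fruit/;

const results = {
  noGap: loose.test("worldfruit"),
  sameLine: loose.test("world fruit"),
  nextLine: loose.test("world\n  fruit"),
  blankLineBetween: loose.test("world\n\n  fruit"),
};
console.log(results); // all four are true

// Requiring exactly one \n and then only non-newline whitespace is stricter.
const strict = /world\n[^\S\n]*fruit/;

const strictResults = {
  noGap: strict.test("worldfruit"),
  sameLine: strict.test("world fruit"),
  nextLine: strict.test("world\n  fruit"),
  blankLineBetween: strict.test("world\n\n  fruit"),
};
console.log(strictResults); // only nextLine is true
```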
If there should be a newline in between, and the match should stop before the first empty line:
- hello: world\n[^\S\n]*fruit: (?:apple|banana)\b(?:\n[^\S\n]*\S.*)*
Explanation
- hello: world\n
    Match literally, followed by a newline
[^\S\n]*fruit:
    Match optional spaces followed by fruit:
(?:apple|banana)\b
    Match either apple or banana, followed by a word boundary
(?:
    Non-capture group to repeat as a whole part
\n[^\S\n]*\S.*
    Match a newline, optional spaces and a non-whitespace character, followed by the rest of the line
)*
    Close the non-capture group and optionally repeat it to match all lines
See a regex101 demo.
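As a minimal sketch, the pattern can be run against the sample from the question in JavaScript. The input below is rebuilt by hand and assumes the blocks are separated by a single empty line and that the keys after the first line are indented by two spaces; the names input, blockPattern and blocks are only illustrative.

```javascript
// Sample input, rebuilt by hand (assumption: blocks are separated by one
// empty line and continuation lines are indented with two spaces).
const input = [
  "- hello: world",
  "  fruit: apple",
  "  foo: bar",
  "  key: value",
  "",
  "- hello: world",
  "  fruit: banana",
  "  message: hi",
  "",
  "- hello: world",
  "  fruit: orange",
  "  message: hi",
  "",
].join("\n");

// The pattern from the answer, with the g flag to collect every block.
const blockPattern =
  /- hello: world\n[^\S\n]*fruit: (?:apple|banana)\b(?:\n[^\S\n]*\S.*)*/g;

// match() returns null when nothing matches, so fall back to an empty array.
const blocks = input.match(blockPattern) ?? [];

console.log(blocks.length); // 2
for (const block of blocks) {
  console.log(block);
  console.log("---");
}
```

Each match stops at the last non-empty line of its block, because the repeated group needs a non-whitespace character on every following line. The empty line itself is not part of the match.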