Extract all occurrences between delimiters when more than one can appear per line

Time:12-28

I have a document, file.yaml, that contains placeholders to replace:

class: ##TOPIC##-area
  name: myClass
type: Class
secretKey: private-##SECRET_KEY##

So far I've used grep to get the placeholder names:

grep -P '(?<=##).*(?=##)' file.yaml

That gave me these values:

TOPIC
SECRET_KEY

Now we have to introduce new properties that can have more than one placeholder per line:

class: ##TOPIC##-area
  name: myClass
type: Class
secretKey: private-##SECRET_KEY##-encoded-##SUFFIX_CODE##

With that change, the grep command no longer works; its output becomes:

TOPIC
SECRET_KEY##-encoded-##SUFFIX_CODE

but I want:

TOPIC
SECRET_KEY
SUFFIX_CODE

I'm open to any suggestions and ideas for solving this. Thanks.

CodePudding user response:

If you want to keep using grep, try something like:

grep -Eo "##[^#]*##" file.yaml | tr -d '#'

With awk you can use a multi-character field separator, which looks simpler:

awk -F'##' '{for (i=2; i<=NF; i+=2) {print $i}}' file.yaml
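
To see why the awk approach picks out the placeholders: splitting each line on ## leaves the text outside the delimiters in the odd-numbered fields and the text inside them in the even-numbered fields. The following is only a sketch of that idea; the function name extract_placeholders and the inline sample are assumptions made for illustration, and the loop uses i < NF instead of i <= NF so that a stray ## with no closing partner is not reported.

```shell
# Hypothetical helper: print every placeholder name found between ## pairs.
extract_placeholders() {
  # Splitting on "##" puts text outside the delimiters in odd fields
  # and text inside them in even fields. Using i < NF (not i <= NF)
  # skips an even field that has no closing "##" after it.
  awk -F'##' '{ for (i = 2; i < NF; i += 2) print $i }' "$@"
}

# Sample input mirroring the question's file.yaml.
sample='class: ##TOPIC##-area
  name: myClass
type: Class
secretKey: private-##SECRET_KEY##-encoded-##SUFFIX_CODE##'

printf '%s\n' "$sample" | extract_placeholders
```

Called with a file name, as in extract_placeholders file.yaml, it reads that file instead of standard input.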

CodePudding user response:

Use a negative lookahead to exclude substrings that contain ## from the match:

(?<=##)((?!##).)*(?=##)


Note that this will also return -encoded-, since it also sits between a pair of ##. Normally you won't get overlapping matches, but lookarounds don't consume the delimiters they check, so the closing ## of one placeholder can also act as the opening ## of the next match.
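
One way to avoid the extra -encoded- match is to make the delimiters part of the match and strip them afterwards, so a closing ## cannot be reused as an opening one. This is a minimal sketch, not the answer's regex: the function name list_placeholders is hypothetical, and it assumes placeholder names never contain a # character and that your grep supports the -o option.

```shell
# Hypothetical helper: list placeholder names, consuming both "##" delimiters.
list_placeholders() {
  # grep -o prints each non-overlapping match on its own line; because the
  # closing "##" is part of the match, it cannot open a new one.
  # sed then strips the leading and trailing delimiter from each match.
  grep -Eo '##[^#]+##' "$@" | sed -e 's/^##//' -e 's/##$//'
}

printf '%s\n' 'secretKey: private-##SECRET_KEY##-encoded-##SUFFIX_CODE##' | list_placeholders
```

For the sample line this should print SECRET_KEY and SUFFIX_CODE, without -encoded-.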
