Regex only inside multiline match

Time:10-24

I have an old app that generates something like:

USERLIST (
    "jasonr"
    "jameso"
    "tommyx"
)
ROLELIST (
    "op"
    "admin"
    "ro"
)

I need some form of regex that changes ONLY the USERLIST section to USERLIST("jasonr", "jameso", "tommyx") and leaves the rest of the text intact:

USERLIST("jasonr", "jameso", "tommyx")
ROLELIST (
    "op"
    "admin"
    "ro"
)

In addition to the multiline issue, I don't know how to restrict the replacement to only part of the string. I've tried perl (-0pe) and sed but can't find a solution. I don't want to write an app to do this; surely there is a way...

CodePudding user response:

perl -0777 -wpe'
    s{USERLIST\s*\(\K ([^)]+) }{ join ", ", $1 =~ /("[^"]+")/g }ex' file

Prints the desired output on the shown input file. Broken over lines for easier view.

With the -0777 switch the whole file is read at once into a string ("slurped") and is thus in $_. With the /x modifier, literal spaces in the pattern are ignored, so they can be used for readability.

Explanation

  • Capture what follows USERLIST (, up to the first closing parenthesis. This assumes there is no such parenthesis inside USERLIST( ... ). \K works like a lookbehind: everything matched before it stays in the string (is not "consumed") and is excluded from $&, so we don't have to re-enter it on the replacement side

  • The replacement side is evaluated as code, courtesy of the /e modifier. In it we capture all double-quoted substrings from the initial $1 capture (assuming no nested quotes) and join that list with ", ". The resulting string then replaces what was in the parentheses following USERLIST
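For readers who want to see the same two-step logic outside a one-liner, here is a minimal Python sketch of it: match the USERLIST block, then rebuild it from the quoted items inside. The function name collapse_userlist and the sample text are illustrative, not part of the answer above, and the sketch carries the same assumptions (no ")" inside the list, no nested quotes). Unlike the \K version, it rewrites the "USERLIST(" prefix itself, so the space before the parenthesis is dropped.

```python
import re

def collapse_userlist(text: str) -> str:
    """Collapse the first USERLIST (...) block onto one line; leave the rest alone."""
    def join_items(match: re.Match) -> str:
        # Pull each double-quoted item out of the captured body
        # (assumes no nested quotes).
        items = re.findall(r'"[^"]+"', match.group(1))
        return "USERLIST(" + ", ".join(items) + ")"

    # [^)]+ runs up to the first closing parenthesis, newlines included
    # (assumes no ")" inside the list). count=1 mirrors s{}{} without /g.
    return re.sub(r'USERLIST\s*\(([^)]+)\)', join_items, text, count=1)

sample = '''USERLIST (
    "jasonr"
    "jameso"
    "tommyx"
)
ROLELIST (
    "op"
    "admin"
    "ro"
)
'''

print(collapse_userlist(sample), end="")
```

Because the pattern names USERLIST explicitly, the ROLELIST block never matches and passes through unchanged.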

CodePudding user response:

With your shown samples, please try the following GNU awk code.

awk -v RS='(^|\n)USERLIST \\(\n[^)]*\\)\n' '
RT{
  sub(/[[:space:]]+\(\n[[:space:]]+/,"(",RT)
  sub(/[[:space:]]*\n\)\n/,")",RT)
  gsub(/"\n +"/,"\", \"",RT)
  print RT
}
END{
  printf("%s",$0)
}
'   Input_file

Explanation: RS (the record separator) is set to the regex (^|\n)USERLIST \\(\n[^)]*\\)\n, so the whole USERLIST block is matched as a separator, and GNU awk stores the matched text in RT. In the main block, if RT is not empty, [[:space:]]+\(\n[[:space:]]+ is replaced with (, then [[:space:]]*\n\)\n is replaced with ), then every "\n +" is replaced with ", ", and finally RT is printed. The END block prints the last record with printf, which outputs the rest of the file.

Output will be as follows:

USERLIST("jasonr", "jameso", "tommyx")
ROLELIST (
    "op"
    "admin"
    "ro"
)

CodePudding user response:

This might work for you (GNU sed):

sed '/USERLIST/{:a;N;/^)$/M!ba;s/(\n\s*/(/;s/\n)/)/;s/\n\s*/, /g}' file

If a line contains USERLIST, gather up the list and format as required.
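The "gather lines until the closing parenthesis" idea can also be sketched line by line without any multiline regex. This is a minimal Python illustration, not a translation of the sed script: the function name collapse_block and its keyword parameter are hypothetical, and it assumes the keyword starts its line and that the closing ")" sits on a line of its own, as in the sample.

```python
def collapse_block(lines, keyword="USERLIST"):
    """Join the items of the block that starts with `keyword` onto one line."""
    out = []
    it = iter(lines)
    for line in it:
        if not line.startswith(keyword):
            out.append(line)          # every other line passes through untouched
            continue
        items = []
        for inner in it:              # keep consuming from the same iterator
            if inner.strip() == ")":  # a lone ")" closes the block
                break
            items.append(inner.strip())
        out.append(f"{keyword}({', '.join(items)})")
    return out

text = '''USERLIST (
    "jasonr"
    "jameso"
    "tommyx"
)
ROLELIST (
    "op"
    "admin"
    "ro"
)'''

print("\n".join(collapse_block(text.splitlines())))
```

Sharing one iterator between the outer and inner loops plays the role of sed's N loop: the inner loop consumes the block's lines, so the outer loop resumes right after the closing parenthesis.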
