How do I extract a string that spans multiple lines with sed?

Time:11-08

I need to extract the string between CAKE_FROSTING(" and ",. If the string spans multiple lines, the quotation marks and the newline at each line break must be removed. I have a command (thanks, Stack Overflow) that does something along those lines, but not exactly. How can I fix it (and can you briefly explain the fixes)? I am using bash on Linux.

sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*/\1/p;ba' filesToCheck/* > result.txt

filesToCheck/file.h

something
CAKE_FROSTING(
"is supreme", 
"[i][agree]") something else
something more
something else
CAKE_FROSTING(
"is."kinda" neat"
"in fact", 
"[i][agree]") something else
something more

result.txt current

is supreme"
is."kinda" neat"

result.txt desired

is supreme
is."kinda" neat in fact

Edit: With help from @D_action I now have

sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*,/\1/p;ba' filesToCheck/* > result.txt

This produces almost the correct output, but there are unnecessary quotation marks and one newline too many in the output:

result.txt current

is supreme" 
is."kinda" neat"
"in fact" 

CodePudding user response:

Using GNU sed

$ sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s"([^"]*[^\n,]*)["].*\n"([[:alpha:] ]+)?.*/\1 \2/p;ba' input_file
is supreme
is."kinda" neat in fact

CodePudding user response:

You can also use perl here: match the string between CAKE_FROSTING(" and the last " before the first comma, then remove the double quotes at the start/end of lines and replace line breaks with spaces, only inside the matches:

perl -0777 -ne 'while (/CAKE_FROSTING\(\s*"([^,]*)"/g) {$a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n"}' file

Note that -0777 slurps the whole file into a single string so that the regex engine can "see" the line breaks.

The CAKE_FROSTING\(\s*"([^,]*)" pattern matches CAKE_FROSTING(, zero or more whitespace characters (line breaks included), and a ", then captures into Group 1 zero or more non-comma characters up to the right-most " that precedes the first comma.

The $a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n" part assigns the Group 1 value to the $a variable. Then ^"|"$|(\R+) either matches a " at the start or end of a line, or captures one or more line breaks (\R+) into Group 1. If Group 1 matched, the replacement is a space; otherwise it is an empty string. Only the contents of $a are printed.
