I need to extract the string between CAKE_FROSTING(" and the following ", (a quotation mark followed by a comma). If the string extends over multiple lines, the quotation marks and the newline at each line break must be removed. I have a command (thanks, Stack Overflow) that goes in that direction, but not exactly. How can I fix it (and can you briefly explain the fixes)? I am using Bash on Linux.
sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*/\1/p;ba' filesToCheck/* > result.txt
filesToCheck/file.h
something
CAKE_FROSTING(
"is supreme",
"[i][agree]") something else
something more
something else
CAKE_FROSTING(
"is."kinda" neat"
"in fact",
"[i][agree]") something else
something more
result.txt current
is supreme"
is."kinda" neat"
result.txt desired
is supreme
is."kinda" neat in fact
Edit: With help from @D_action I now have
sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*,/\1/p;ba' filesToCheck/* > result.txt
This produces almost the correct output, but there are unnecessary quotation marks and one newline too many:
result.txt current
is supreme"
is."kinda" neat"
"in fact"
CodePudding user response:
Using GNU sed
$ sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s"([^"]*[^\n,]*)["].*\n"([[:alpha:] ]+)?.*/\1 \2/p;ba' input_file
is supreme
is."kinda" neat in fact
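Since the question also asked for an explanation, here is a hedged sketch that runs the command on the sample from the question and annotates each piece. It assumes GNU sed (for \s and for \n inside bracket expressions) and that the character class in the second group is followed by +. The temporary file and the final trailing-space cleanup are additions for illustration, not part of the answer above.

```shell
#!/usr/bin/env bash
# Recreate the sample from the question in a temporary file (hypothetical path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
something
CAKE_FROSTING(
"is supreme",
"[i][agree]") something else
something more
something else
CAKE_FROSTING(
"is."kinda" neat"
"in fact",
"[i][agree]") something else
something more
EOF

# :a ... ba                 loop: N keeps appending the next input line to the pattern space
# .*CAKE_FROSTING\(\n?\s"   everything up to the opening quote of the first string
# ([^"]*[^\n,]*)["]         group 1: the first line of the string, minus its closing quote
# .*\n"([[:alpha:] ]+)?     group 2: letters/spaces that start the following line, if any
# /\1 \2/p                  print both groups joined by a space (-n suppresses everything else)
result=$(sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s"([^"]*[^\n,]*)["].*\n"([[:alpha:] ]+)?.*/\1 \2/p;ba' "$sample" |
  sed 's/ *$//')   # assumption: strip the trailing space left when group 2 is empty

printf '%s\n' "$result"
rm -f "$sample"
```

Note that group 2 only accepts letters and spaces, so a continuation line containing digits or punctuation would be cut short; the sample data does not exercise that case.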
CodePudding user response:
You can also use perl here to match the string between CAKE_FROSTING(" and the last double quote before the first comma, remove double quotes from the start/end of lines, and replace line breaks with spaces only inside the matches:
perl -0777 -ne 'while (/CAKE_FROSTING\(\s*"([^,]*)"/g) {$a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n"}' file
Note that -0777 slurps the whole file into a single string so that the regex engine can "see" the line breaks.
The CAKE_FROSTING\(\s*"([^,]*)" pattern matches CAKE_FROSTING(, zero or more whitespace characters, and a double quote, then captures into Group 1 zero or more non-comma characters up to the right-most double quote before the first comma.
The $a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n" part assigns the Group 1 value to the $a variable. The ^"|"$|(\R+) pattern matches double quotes at the start or end of a line, or captures one or more line breaks (\R+) into Group 1; if Group 1 matched, the replacement is a space, otherwise it is an empty string. Only the contents of the $a variable are printed.