I have this file
~/ % cat t
---
    abc
    def DEF
    ghi GHI
---
123
456
and I would like to extract the content between the two lines of three dashes, so I tried
sed -En '{N; /^---\s{5}\w+/,/^---/p}' t
I.e. three dashes followed by five whitespace characters (including the newline), followed by one or more word characters, and ending with another set of three dashes. This gives me this output:
~/ % sed -En '{N; /^---\s{5}\w+/,/^---/p}' t
---
    abc
    def DEF
    ghi GHI
---
123
I don't want the line with "123". Why am I getting that and how do I adjust my expression to get rid of it? [EDIT]: It is important that the four spaces of indentation after the first three dashes are matched in the expression.
CodePudding user response:
No need to use the pattern space here - a range pattern will do fine.
$ sed -n '/^---/,/^---/p' t
---
    abc
    def DEF
    ghi GHI
---
Tested in GNU sed 4.7 and OSX sed.
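As for why the original command prints 123: N appends the next input line to the pattern space, so sed works through the file two lines at a time, and p prints both lines of any pair that falls inside the range. Here is a minimal sketch that makes the pairing visible, assuming the sample lines are indented by four spaces as the question describes (the temporary directory and the variable name pairs are only illustrative):

```shell
# Recreate the sample file; the four-space indentation is assumed from the question.
tmp=$(mktemp -d)
printf '%s\n' '---' '    abc' '    def DEF' '    ghi GHI' '---' '123' '456' > "$tmp/t"

# N appends the next input line, so each cycle holds two lines at once.
# Replacing the embedded newline with " | " shows which lines travel together.
pairs=$(sed -n 'N;s/\n/ | /;p' "$tmp/t")
printf '%s\n' "$pairs"
```

The last pair printed is "--- | 123": the closing --- shares a pattern space with 123, so when /^---/ ends the range, p prints both lines. A plain range without N looks at one line per cycle and stops at the closing dashes.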
CodePudding user response:
I believe you can use
perl -0777 -ne '/^---\R(\s{4}\w.*?^---)/gsm && print "$1\n";' t
Details:
-0777 - slurps the file into a single variable
^---\R(\s{4}\w.*?^---) - start of a line (^), ---, a line break, then Group 1: four whitespaces, a word char, then zero or more chars as few as possible, and then --- at the start of a line
gsm - g is global, all occurrences are returned, s means . matches any chars including line break chars, and m means ^ now matches start of any line, not just string start
&& print "$1\n" - if there is a match, print Group 1 value + a line break.
CodePudding user response:
This might work for you (GNU sed):
sed -En '/^---/{:a;N;/^ {4}\S/M!D;/\n---/!ba;p;d}' file
Turn on extended regexp (-E) and off implicit printing (-n).
If a line begins --- and the following line is indented by 4 spaces, gather up the following lines until another begins --- and print them.
If the following line does not match the above criteria, delete the first and repeat.
All other lines will pass through unprinted.
N.B. The M flag on the second regexp is for multiline matching: since the first line already begins ---, the next must be indented.
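To see both branches of the script, it can be run against the question's file and against a variant whose first --- is followed by an unindented line. This is a sketch that assumes GNU sed (the M flag and \S are GNU extensions) and four-space indentation in the sample; the file u and the variable names are invented for the example:

```shell
tmp=$(mktemp -d)
# Sample file from the question; the four-space indentation is an assumption.
printf '%s\n' '---' '    abc' '    def DEF' '    ghi GHI' '---' '123' '456' > "$tmp/t"
# Hypothetical variant: the first --- is followed by an unindented line.
printf '%s\n' '---' '123' '---' '    abc' '---' > "$tmp/u"

script='/^---/{:a;N;/^ {4}\S/M!D;/\n---/!ba;p;d}'
out_t=$(sed -En "$script" "$tmp/t")   # prints the block, stops at the closing ---
out_u=$(sed -En "$script" "$tmp/u")   # D drops the first ---, the second one starts the block
printf '%s\n' "$out_t" "$out_u"
```

For t this prints the five lines from --- to ---, without 123. For u, the D command discards the first --- and restarts the cycle on 123, so only the block that begins at the second --- is printed.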