I'm trying to do some regex matching in bash.
I'd like to match multiple blocks of indented (space or tab) content, with each block starting with a keyword.
Some other content could be present in the file.
Using this sample content:
keyword aaa match1
Some other content
keyword ccc match2
    indentend content
    matching
Some other content
    with indendation
keyword ddd match2
    indented content still matching
I managed to use this: (^keyword.*(?:\n^\h+.*)*), which seems to work; everything matches as expected: https://regex101.com/r/kvMlKK/1
The expected output would be to print every match:
keyword aaa match1
keyword ccc match2
    indentend content
    matching
keyword ddd match2
    indented content still matching
Unfortunately, I did not find a way to print all matches in bash. I can use grep/sed/awk/perl without any problem (edit: I meant I have access to all these commands in the environment I am working with).
Edit:
grep -E --include \*.md '(^keyword.*(?:\n^\h+.*)*)' $(dirname "$0")/../_inbox/draft.md
With grep, it does not return the full match, only the first line, because of the lack of multi-line matching support, I guess.
I am not familiar with awk/sed and did not get any meaningful results with them (even though they seem better suited to multi-line matching).
Edit: if that could work on multiple files that would be awesome
Thanks for your help!
CodePudding user response:
You can do it in pure bash by looping over the lines, because bash regex matching works on one line at a time here and has no multi-line mode.
#!/bin/bash
# Flag to track whether we are inside an indented block
indented=0
# [[:blank:]] matches a space or a tab; "\t" is not a tab inside a bash regex
keyword_re='^[[:blank:]]*keyword'
indent_re='^[[:blank:]]+'
# Read input line by line (also handles a last line without a trailing newline)
while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $keyword_re ]]; then
        # Line starts a block: print it and remember we are inside a block
        printf '%s\n' "$line"
        indented=1
    elif [[ $indented -eq 1 && $line =~ $indent_re ]]; then
        # Indented line inside a block: print it
        printf '%s\n' "$line"
    else
        # Any other line ends the current block
        indented=0
    fi
done < "input"
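You can do it in awk too. A flag is safer than a getline loop here, because getline would swallow the line that ends a block, even if that line is itself a keyword line:
awk '/^[ \t]*keyword/{p=1; print; next} p && /^[ \t]+/{print; next} {p=0}' input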
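Or use sed (GNU sed syntax), looping back only while the next line is indented or starts a new block:
sed -n '/^[[:blank:]]*keyword/{:start;p;n;/^[[:blank:]]/b start;/^keyword/b start;}' input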
CodePudding user response:
Using awk:
$ awk '!/^[\t ]/{p=0} /^keyword/{p=1} p' file
keyword aaa match1
keyword ccc match2
    indentend content
    matching
keyword ddd match2
    indented content still matching
$
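For the "multiple files" edit in the question, the same awk program can take several file names. One thing to watch is that the flag p would otherwise carry over from the end of one file to the start of the next, so the sketch below resets it on the first line of each file (FNR==1). The directory and file names (a.md, b.md) are hypothetical, created only for the demonstration.

```shell
# Two hypothetical files: a.md ends inside a block, b.md starts with an indented line
demo_dir=$(mktemp -d)
printf '%s\n' 'keyword one' '    in a' > "$demo_dir/a.md"
printf '%s\n' '    stray indent' 'text' 'keyword two' '    in b' > "$demo_dir/b.md"

# FNR==1{p=0}   : reset the flag at the start of every file
# !/^[\t ]/{p=0}: a non-indented line ends the current block
# /^keyword/{p=1}: a keyword line starts a block
# p             : print the line while the flag is set
awk_matches=$(awk 'FNR==1{p=0} !/^[\t ]/{p=0} /^keyword/{p=1} p' "$demo_dir"/a.md "$demo_dir"/b.md)
printf '%s\n' "$awk_matches"

rm -rf "$demo_dir"
```

Without the FNR==1 reset, the "stray indent" line at the top of b.md would be printed as if it belonged to the last block of a.md.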