Bash regex to match multiple blocks of indented content and print all of them

I'm trying to do some regex matching in bash.
I'd like to match multiple blocks of indented (space or tab) content, where each block starts with a keyword. The file may also contain other content. Using this sample content:

keyword aaa match1

Some other content

keyword ccc match2
  indentend content
  matching
Some other content
  with indendation 

keyword ddd match2
  indented content still matching

I came up with (^keyword.*(?:\n^\h+.*)*), which seems more or less okay; everything matches as expected: https://regex101.com/r/kvMlKK/1

The expected output would be all matches printed:

keyword aaa match1
keyword ccc match2
  indentend content
  matching
keyword ddd match2
  indented content still matching

Unfortunately, I did not find a way to print all matches in bash. I can use grep/sed/awk/perl without any problem (edit: I meant that all of these commands are available in the environment I am working with).

Edit:

grep -E --include \*.md '(^keyword.*(?:\n^\h+.*)*)' $(dirname "$0")/../_inbox/draft.md

With grep, only the first line of each block is returned rather than the full match, presumably because grep does not support multi-line matching.
I am not familiar with awk or sed and did not get any meaningful results with them (even though they seem better suited to multi-line matching).

Edit: if this could work on multiple files, that would be awesome.

Thanks for your help!

Answer:

You can do it in pure bash by looping over the input line by line, because bash's regex matching (=~) has no multi-line mode.

#!/bin/bash

# Flag to track whether we are inside a keyword block
indented=0

# Read input line by line; IFS= keeps leading whitespace, -r keeps backslashes,
# and the [[ -n $line ]] test also handles a last line without a trailing newline
while IFS= read -r line || [[ -n $line ]]; do
  # A block starts with a line beginning with "keyword" (leading blanks allowed)
  if [[ $line =~ ^[[:blank:]]*keyword ]]; then
    printf '%s\n' "$line"
    indented=1
  # Lines starting with a space or tab are printed only inside a block.
  # [[:blank:]] matches a space or a tab; a literal \t inside brackets would not.
  elif (( indented )) && [[ $line =~ ^[[:blank:]]+ ]]; then
    printf '%s\n' "$line"
  else
    # Any other line (including an empty one) ends the current block
    indented=0
  fi
done < "input"
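To cover the multiple-files request from the question, the same loop could be wrapped in a function that iterates over its arguments and resets the flag at the start of each file, so a block never continues into the next file. A minimal sketch; the function name `print_blocks` is hypothetical:

```shell
#!/usr/bin/env bash

# print_blocks FILE... (hypothetical name): for each file, print every line
# starting with "keyword" plus the space/tab-indented lines directly after it.
print_blocks() {
  local file line indented
  for file in "$@"; do
    indented=0   # never carry a block over from the previous file
    while IFS= read -r line || [[ -n $line ]]; do
      if [[ $line =~ ^[[:blank:]]*keyword ]]; then
        printf '%s\n' "$line"
        indented=1
      elif (( indented )) && [[ $line =~ ^[[:blank:]]+ ]]; then
        printf '%s\n' "$line"
      else
        indented=0   # a non-indented line ends the block
      fi
    done < "$file"
  done
}
```

It could then be called with a glob, for example `print_blocks "$(dirname "$0")"/../_inbox/*.md`.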

You can do it in awk too:

awk '/^[ \t]*keyword/{p=1; print; next} p && /^[ \t]+/{print; next} {p=0}' input

Or use sed (the ";" after labels and branch names is GNU sed syntax):

sed -n '/^[[:blank:]]*keyword/{:blk;p;n;/^[[:blank:]]*keyword/b blk;/^[[:blank:]]/b blk;}' input

Answer:

Using awk:

$ awk '!/^[\t ]/{p=0} /^keyword/{p=1} p' file 
keyword aaa match1
keyword ccc match2
  indentend content
  matching
keyword ddd match2
  indented content still matching
$
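When several files are given, the p flag carries over from one file to the next, so if a file ends inside a block and the next file starts with indented lines, those lines would be printed as part of the previous block. Resetting the flag on each file's first line (FNR == 1) avoids that. A sketch; the wrapper name `print_blocks_awk` is hypothetical:

```shell
#!/usr/bin/env bash

# print_blocks_awk FILE... (hypothetical name)
# FNR == 1: reset the flag at the start of every input file
# !/^[\t ]/: a non-indented line ends the current block
# /^keyword/: a keyword line starts a new block
# p: print while inside a block
print_blocks_awk() {
  awk 'FNR == 1 { p = 0 } !/^[\t ]/ { p = 0 } /^keyword/ { p = 1 } p' "$@"
}
```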