Bash regex to match multiple blocks of indented content and print all of them

Time:01-30

I'm trying to do some regex matching in bash.
I'd like to match multiple blocks of indented (space or tab) content, where each block starts with a keyword line. The file may also contain other content. Using this sample content:

keyword aaa match1

Some other content

keyword ccc match2
  indentend content
  matching
Some other content
  with indendation 

keyword ddd match2
  indented content still matching

I managed to come up with this regex: (^keyword.*(?:\n^\h+.*)*), which matches everything as expected: https://regex101.com/r/kvMlKK/1

The expected output is every match printed:

keyword aaa match1
keyword ccc match2
  indentend content
  matching
keyword ddd match2
  indented content still matching

Unfortunately, I did not find a way to print all matches in bash. I can use grep/sed/awk/perl without any problem (edit: I meant that I have access to all of these commands in the environment I am working in).

Edit:

grep -E --include \*.md '(^keyword.*(?:\n^\h+.*)*)' $(dirname "$0")/../_inbox/draft.md

With grep, only the first line of each match is returned rather than the full match, presumably because grep lacks multi-line matching support.
I am not familiar with awk/sed and did not get any meaningful results with them (even though they seem better suited to multi-line matching).

Edit: it would be great if this could also work on multiple files.

Thanks for your help!

CodePudding user response:

You can do it in pure bash by looping over the lines, because bash regex doesn't support multi-line matching.

#!/bin/bash

# Flag to track whether we are inside an indented block
indented=0

# [[:blank:]] matches a space or a tab; inside a bash regex, [ \t] would
# match a space, a backslash, or the letter t instead
keyword_re='^[[:blank:]]*keyword'
indent_re='^[[:blank:]]+'

# Read input line by line (also handles a last line without a trailing newline)
while IFS= read -r line || [[ -n $line ]]; do
  if [[ $line =~ $keyword_re ]]; then
    # Keyword line: print it and open a block
    printf '%s\n' "$line"
    indented=1
  elif [[ $indented -eq 1 && $line =~ $indent_re ]]; then
    # Indented line inside an open block: print it
    printf '%s\n' "$line"
  else
    # Any other line closes the block
    indented=0
  fi
done < "input"

You can do it in awk too:

awk '/^[ \t]*keyword/{print; while((getline line) > 0){ if(line ~ /^[ \t]+/ || line ~ /^[ \t]*keyword/) print line; else break }}' input

Or use sed (GNU sed syntax):

sed -n '/^[ \t]*keyword/{:start;p;n;/^[ \t]*keyword/b start;/^[ \t]/b start;}' input

CodePudding user response:

Using awk:

$ awk '!/^[\t ]/{p=0} /^keyword/{p=1} p' file 
keyword aaa match1
keyword ccc match2
  indentend content
  matching
keyword ddd match2
  indented content still matching
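
For the multi-file case mentioned in the question, the same flag logic could be wrapped as below. This is a sketch under assumptions: print_blocks_awk is a hypothetical name, and the FNR == 1 rule is an addition so that a block still open at the end of one file does not swallow indented lines at the start of the next.

```shell
#!/bin/bash
# print_blocks_awk (hypothetical helper): apply the flag-based awk filter
# to every file passed as an argument.
print_blocks_awk() {
  awk '
    FNR == 1   { p = 0 }   # first line of a file: drop any block left open
    !/^[\t ]/  { p = 0 }   # unindented or empty line closes the block
    /^keyword/ { p = 1 }   # keyword line opens a block
    p                      # print the line while inside a block
  ' "$@"
}
```

Usage would look like print_blocks_awk ../_inbox/*.md, with the matches from all files printed one after another.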