Print line with 4 leading whitespaces - grep and awk handle ERE differently

Time:08-26

I am trying to print lines that start with 4 whitespace characters. My regex works as expected with egrep, but the same regex gives a different result in awk.

Can you tell me what I am doing wrong?

Example:

echo "    testtest" | egrep '^[[:space:]]{4}'

=> prints:     testtest

echo "    testtest" | awk '/^[[:space:]]{4}/ {print}'

=> prints nothing

CodePudding user response:

Regarding your comment that echo "(whitespace x 4) testtest" | awk '/^[ \t]{4}/ {print}' also prints nothing, as well as the issue in your question: mawk 1.3.4 is a pre-POSIX version of mawk 1, a minimal-featured awk variant built for execution speed. You shouldn't expect it to understand relatively modern POSIX concepts like character classes ([[:space:]]) or RE intervals ({4}), or non-POSIX extensions like \s, among various other things. mawk 2 is now available and should have better support for POSIX features, but get GNU awk (gawk) for the fullest functionality and excellent speed.
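If you are stuck with an awk that lacks interval support, one workaround is to spell the repetition out by hand. The sketch below relies only on plain bracket expressions; the function name leading4 is made up for illustration.

```shell
# leading4 is a hypothetical helper name, not part of any tool.
# It prints the input lines that begin with four blanks (spaces or tabs),
# writing the bracket expression four times instead of using {4}.
leading4() {
    awk '/^[ \t][ \t][ \t][ \t]/'
}

printf '    testtest\n' | leading4
```

Because it uses neither {4} nor [[:space:]], this should not depend on which awk is installed.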

By the way, egrep is deprecated; use grep -E instead.
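For reference, the command from the question rewritten with grep -E would look like this (a direct substitution, nothing else changed apart from using printf for the input):

```shell
# Same test as in the question, with grep -E in place of egrep.
printf '    testtest\n' | grep -E '^[[:space:]]{4}'
```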

CodePudding user response:

A few inaccuracies I need to point out:

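mawk 'BEGIN { 
    __="[[:space:]]"    # the regex under test

    # _ starts at 0 and runs while _+_ < 4^4, i.e. over ASCII 0..127
    for(_=_<_; (_+_) < 4^4; _++) { 
         if(sprintf("%c",_)~__)  {    # does this character match?
             printf("U   %6.4X\n",_) } } }'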
U     0009    # horizontal tab \t
U     000A    # line feed \n
U     000B    # vertical tab \v
U     000C    # form feed \f
U     000D    # carriage return \r
U     0020    # space "[ ]"

  1. mawk-1 recognizes the POSIX [[:space:]] class correctly, at least on the ASCII side of things

  2. mawk-2, at its current beta stage, doesn't yet solve the {n,m} interval problem that mawk-1 has
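
To see which of these points applies to the awk you actually have, a small probe can help. This is only a sketch: the function name and its optional argument are made up, and it assumes that an awk without interval support treats a{4} as something other than four repetitions of "a".

```shell
# awk_supports_intervals is a hypothetical helper.
# $1: awk binary to probe (defaults to "awk").
# Prints "yes" if /^a{4}$/ matches the string "aaaa", otherwise "no".
awk_supports_intervals() {
    if printf 'aaaa\n' | "${1:-awk}" '/^a{4}$/ { found = 1 } END { exit !found }'; then
        echo yes
    else
        echo no
    fi
}

awk_supports_intervals          # probe the default awk
```

Pass another binary name, for example awk_supports_intervals mawk, to compare implementations side by side.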

As for matching 4 leading whitespace characters, try something like

echo "    testtest"  | 
mawk 'BEGIN { _="[ \t]"; gsub(".",_,_); _^=FS=("^")_ } _<NF'

    or

# if you want to be POSIX-pedantic about it

mawk 'BEGIN { _^=FS="^"(_=(_="[[:space:]]")_)_ } _<NF'  
              
    testtest
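
The one-liners above are terse on purpose. Read literally, they appear to set FS to a regex anchored at the start of the record and then print the line when that separator was found, that is, when NF is greater than 1. A more readable sketch of the same idea follows. The function name is made up, and awk implementations may differ on how ^ behaves inside FS, so treat this as an illustration rather than a portable guarantee.

```shell
# starts_with_4_blanks is a hypothetical helper name.
# FS is a regex anchored at the record start. If the record begins with
# four blanks, the separator is found and the record splits into more
# than one field, so NF > 1 selects exactly those lines.
starts_with_4_blanks() {
    awk 'BEGIN { FS = "^[ \t][ \t][ \t][ \t]" } NF > 1'
}

printf '    testtest\n' | starts_with_4_blanks
```

The bracket expression is repeated by hand here, so the sketch avoids the {n,m} interval syntax entirely.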