I am trying to print lines that start with 4 whitespace characters. When I apply my regex with egrep, everything works as expected, but when I use awk, the results differ greatly.
Can you tell me what I am doing wrong?
Example:
echo "    testtest" | egrep '^[[:space:]]{4}'
=> prints:     testtest
echo "    testtest" | awk '/^[[:space:]]{4}/ {print}'
=> prints nothing
Answer:
Regarding your comment that echo "    testtest" | awk '/^[ \t]{4}/ {print}' prints nothing with mawk 1.3.4, as well as the issue in your question: you're running a pre-POSIX version of a minimal-featured (for execution speed) variant of awk, mawk 1, so you shouldn't expect it to understand relatively modern POSIX concepts like character classes ([[:space:]]) or RE intervals ({4}), nor non-POSIX extensions like \s, or various other things. mawk 2 is now available and should have better support for POSIX features, but get GNU awk (gawk) for the fullest functionality and excellent speed.
By the way, egrep is deprecated; use grep -E instead.
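For comparison, here is a minimal sketch of both approaches, assuming a POSIX shell: grep -E with the {4} interval, and an awk pattern that spells the bracket expression out four times. The awk version needs neither {4} nor [[:space:]], so it should also work with an older mawk. Note that [ \t] only covers space and tab, which is a narrower set than [[:space:]].

```shell
# Four leading spaces, as in the question
line='    testtest'

# grep -E replaces the deprecated egrep; {4} is a POSIX ERE interval
printf '%s\n' "$line" | grep -E '^[[:space:]]{4}'

# Portable awk: repeat the bracket expression instead of using {4},
# and list space and tab explicitly instead of using [[:space:]]
printf '%s\n' "$line" | awk '/^[ \t][ \t][ \t][ \t]/'
```

Both commands print the whole line, leading spaces included; a line with only 3 leading spaces is printed by neither.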
Answer:
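There are some inaccuracies I need to point out. First, here is which of the first 256 character codes mawk 1 matches against [[:space:]]: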
mawk 'BEGIN { __="[[:space:]]"; for(_=_<_; _ < 4^4; _++) { if(sprintf("%c",_)~__) { printf("U %6.4X\n",_) } } }'
U   0009   # horizontal tab \t
U   000A   # newline \n
U   000B   # vertical tab \v
U   000C   # form feed \f
U   000D   # carriage return \r
U   0020   # space "[ ]"
So mawk-1 recognizes the POSIX [[:space:]] class properly on the ASCII side of things. mawk-2, on the other hand, at its current beta stage, doesn't yet solve the {n,m} interval problem that mawk-1 faces.
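Since support for classes and intervals differs between awk builds, it can help to probe the one you actually have. A minimal sketch, assuming a POSIX shell; probe_awk is a hypothetical helper name, not part of any awk distribution:

```shell
# Hypothetical helper: report whether a given awk (default: awk) accepts
# POSIX character classes and {n} intervals in regex literals.
probe_awk() {
  awk_cmd=${1:-awk}

  # [[:space:]] should match a line consisting of a single space
  if printf ' \n' | "$awk_cmd" '/^[[:space:]]$/ { found = 1 } END { exit (found ? 0 : 1) }' 2>/dev/null; then
    echo "$awk_cmd: [[:space:]] supported"
  else
    echo "$awk_cmd: [[:space:]] NOT supported"
  fi

  # a{4} should match "aaaa"; an awk without interval support either treats
  # the braces literally (no match) or rejects the regex (non-zero exit)
  if printf 'aaaa\n' | "$awk_cmd" '/^a{4}$/ { found = 1 } END { exit (found ? 0 : 1) }' 2>/dev/null; then
    echo "$awk_cmd: {n} intervals supported"
  else
    echo "$awk_cmd: {n} intervals NOT supported"
  fi
}

probe_awk awk
# probe_awk mawk   # if installed
# probe_awk gawk   # if installed
```

A missing command is reported as "NOT supported" for both checks, because the shell's "not found" exit status takes the else branch.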
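As for matching 4 spaces up front without {n,m} intervals, something like:

echo "    testtest" |
mawk 'BEGIN { _="[ \t]"; gsub(".",_,_); _^=FS=("^")_ } _<NF'

or, if you want to be POSIXly pedantic about it:

echo "    testtest" |
mawk 'BEGIN { _^=FS="^"(_=(_="[[:space:]]")_)_ } _<NF'

=> prints:     testtest

The trick is to build the field separator by repetition instead of with {4}. In the first version, gsub(".",_,_) replaces each of the four characters of "[ \t]" with the whole bracket expression, giving four copies, and FS becomes "^" followed by them. _^=FS then sets _ to 0^0, which is 1, so the pattern _<NF selects only lines that were split into at least two fields, which happens only when the anchored separator matched.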
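The underscore-heavy one-liners above can be written more readably. A minimal sketch of the same field-separator technique, assuming a POSIX awk: FS is a regex anchored at the start of the record, so a line beginning with four blanks (space or tab) splits into an empty first field plus the rest, and NF > 1 prints the whole, unmodified line. How ^ inside FS is handled may vary between awk implementations, so treat this as something to verify on your awk.

```shell
# FS: four blanks (space or tab) anchored at the start of the record.
# Lines without that prefix stay a single field (NF is 1, or 0 if empty).
printf '%s\n' '    testtest' '   three' 'none' |
  awk 'BEGIN { FS = "^[ \t][ \t][ \t][ \t]" } NF > 1'
```

Only the first input line, '    testtest', should be printed.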