This regex works in a regex website test, but not in gawk

Using GNU Awk 5.0.0, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2), I want to check for a pattern using match.

My sample text is the following (with a space at the beginning of the line):

 7 Plasmas Mobiles (30%)

Using the following regex, I am able to match the string:

 [0-9]{1,} .{1,} \([0-9]{1,}%\)

As proved with this live example: regexr.com/6n3fh

However, awk's match returns 0:

awk '{print match($0, " [0-9]{1,} .{1,} \([0-9]{1,}%\)")}' reports/test

awk: cmd. line:1: warning: escape sequence `\(' treated as plain `('

awk: cmd. line:1: warning: escape sequence `\)' treated as plain `)'

0

Why is that, and how can I get the expected behavior, i.e. match returning 1?

CodePudding user response:

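In awk, a regex constant is written as /the-regex/ (see Regular Expressions). awk also supports Dynamic Regexps, where the regex is given as a quoted string, as in your command.

awk treats the two forms differently. A double-quoted string is scanned twice: first as a string literal, where escape sequences are processed, and then as a regex. In the first scan gawk does not recognize \( as a string escape, so it prints the warning you see and drops the backslash, leaving a plain (. The regex that is finally compiled therefore uses ( and ) as grouping operators instead of literal parentheses, and it no longer matches the (30%) in your text. To get a backslash through the first scan, double it: \\( and \\).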
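The two scans can be observed separately. The following is a minimal sketch, not part of the original answer: the function names are hypothetical, and the pattern is the one from the question with + written in place of {1,} (equivalent, and chosen here only on the assumption that more awk implementations accept it).

```shell
# Sketch only: the helper names are hypothetical, not part of awk.

# Scan 1: awk reads the string literal and processes escape sequences,
# so each doubled backslash "\\" is stored as a single backslash.
regex_text_after_string_scan() {
    awk 'BEGIN { re = " [0-9]+ .+ \\([0-9]+%\\)"; print re }'
}

# Scan 2: the stored text is compiled as a regex, where \( and \)
# stand for literal parentheses. match() returns the 1-based start
# position of the match, or 0 if there is none.
match_literal_parens() {
    printf '%s\n' ' 7 Plasmas Mobiles (30%)' |
        awk '{ print match($0, " [0-9]+ .+ \\([0-9]+%\\)") }'
}

regex_text_after_string_scan
match_literal_parens
```

The first function prints the regex text as awk stores it, with one backslash before each parenthesis. The second should print 1 for the sample line.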

In your case you can use either a regex constant:

match($0, / [0-9]{1,} .{1,} \([0-9]{1,}%\)/)

or a string with doubled backslashes:

match($0, " [0-9]{1,} .{1,} \\([0-9]{1,}%\\)")

Example Use/Output

$ echo " 7 Plasmas Mobiles (30%)" | awk '{print match($0, / [0-9]{1,} .{1,} \([0-9]{1,}%\)/)}'
1

and

$ echo " 7 Plasmas Mobiles (30%)" | awk '{print match($0, " [0-9]{1,} .{1,} \\([0-9]{1,}%\\)")}'
1
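As a possible alternative that was not in the original answer, a bracket expression such as [(] also matches a literal parenthesis and needs no backslash, so the same pattern text can be used in both forms. This is a sketch under that assumption; the function names are hypothetical, and + again stands in for {1,}.

```shell
# Sketch only: the helper names are hypothetical, not part of awk.
line=' 7 Plasmas Mobiles (30%)'

# Regex constant form: [(] and [)] match literal parentheses.
match_with_constant() {
    printf '%s\n' "$line" |
        awk '{ print match($0, / [0-9]+ .+ [(][0-9]+%[)]/) }'
}

# Dynamic (string) form: same pattern text, no backslashes to double.
match_with_string() {
    printf '%s\n' "$line" |
        awk '{ print match($0, " [0-9]+ .+ [(][0-9]+%[)]") }'
}

match_with_constant
match_with_string
```

Both calls should print 1 for the sample line, since the pattern contains no escape sequences for the string scan to alter.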