I'm trying to capture numbers inside a file using awk. I can capture all of them, but I'm not able to capture only those with a certain number of digits. What am I doing wrong?
echo -e "$teste" | awk '/_OA/ { match($0,/\[\([:digit:]{4,13}\]/);oa = substr($0,RSTART,RLENGTH);print oa}'
File sample:
_OA ............. [6712227000168]
_OA Tasdsd, OA .. [91][355016]
_OA Tasdsd, DA .. [91][5512987000]
Expected:
6712227000168
355016
5512987000
CodePudding user response:
With your shown samples, please try the following awk solution. It sets the field separator to either ] or [, and in the main block checks whether the line starts with _OA, then prints the second-to-last field.
awk -F"[][]" '/^_OA /{print $(NF-1)}' Input_file
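The field-separator approach does not itself enforce the 4 to 13 digit limit from the question. As a minimal sketch, assuming the wanted number is always the last bracketed group, a length check can be added; the variable names teste, result and n are only illustrative:

teste='_OA ............. [6712227000168]
_OA Tasdsd, OA .. [91][355016]
_OA Tasdsd, DA .. [91][5512987000]'

# Split each line on [ or ]; the last bracketed value ends up in $(NF-1)
# because the closing ] leaves an empty trailing field.
result=$(printf '%s\n' "$teste" | awk -F'[][]' '
/^_OA / {
    n = $(NF-1)
    # Keep only all-digit values that are 4 to 13 characters long.
    if (n ~ /^[0-9]+$/ && length(n) >= 4 && length(n) <= 13) print n
}')
printf '%s\n' "$result"

With the sample input this prints the same three numbers as the expected output.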
CodePudding user response:
You could update the pattern, and adjust the values of RSTART and RLENGTH so that the result excludes the leading and trailing square brackets.
The digits part should be [[:digit:]], and there is a \( in the pattern that matches a literal ( that should not be there.
awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}\]/);oa = substr($0,RSTART+1,RLENGTH-2);print oa}' <<< "$teste"
Output
6712227000168
355016
5512987000
As there are multiple occurrences of digits between square brackets, if you want to match multiple occurrences:
teste='_OA Tasdsd, OA .. [91][355016][123456789][1][9999]'
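awk '/_OA/ {
  while (match($0, /\[[[:digit:]]{4,13}]/)) {
    # skip the opening [ and drop both brackets from the length
    start = RSTART + 1; len = RLENGTH - 2
    s = substr($0, start, len)
    # join the matches with commas
    res = res ? res "," s : s
    # continue searching in the rest of the line
    $0 = substr($0, start + len)
  }
  print res
  res = ""
}' <<< "$teste"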
Output
355016,123456789,9999
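Some awk builds reportedly do not handle interval expressions such as {4,13} (older mawk releases, for example, or gawk before 4.0 without --re-interval). As a hedged sketch for that case, the loop can match any bracketed run of digits and check the length separately; the variable names line, res and result are only illustrative:

teste='_OA Tasdsd, OA .. [91][355016][123456789][1][9999]'

result=$(printf '%s\n' "$teste" | awk '
/_OA/ {
    line = $0; res = ""
    # Match every [digits] group, then filter on the digit count.
    while (match(line, /\[[0-9]+\]/)) {
        s = substr(line, RSTART + 1, RLENGTH - 2)
        if (length(s) >= 4 && length(s) <= 13)
            res = (res == "" ? s : res "," s)
        # Continue after the closing ] of this group.
        line = substr(line, RSTART + RLENGTH)
    }
    print res
}')
printf '%s\n' "$result"

This works on a copy of the line rather than on $0, so the fields of the record stay untouched.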
CodePudding user response:
You can use
awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);print substr($0,RSTART+1,RLENGTH-2)}'
See the online demo:
#!/bin/bash
s='_OA ............. [6712227000168]
_OA Tasdsd, OA .. [91][355016]
_OA Tasdsd, DA .. [91][5512987000]'
awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);print substr($0,RSTART+1,RLENGTH-2)}' <<< "$s"
Output:
6712227000168
355016
5512987000
Details:
\[ - a [ char
[[:digit:]]{4,13} - four to thirteen digits (note that the [:digit:] POSIX character class must be used within [...], a bracket expression)
] - a ] char (it is not special here, so there is no need to escape it)
And substr($0,RSTART+1,RLENGTH-2) means that we
$0 - take a substring of the current record
RSTART+1 - starting with the second char of the match
RLENGTH-2 - and then as many characters as the match length minus 2 (thus getting rid of the enclosing [ and ] chars)
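To make the arithmetic concrete, here is a small sketch that prints RSTART and RLENGTH for one sample line. It assumes an awk with interval-expression support, such as GNU awk; the variable names line and result are only illustrative:

line='_OA Tasdsd, OA .. [91][355016]'

# [91] is too short to match, so the match is [355016]:
# it starts at character 23 and is 8 characters long, brackets included.
result=$(printf '%s\n' "$line" | awk '{
    if (match($0, /\[[[:digit:]]{4,13}]/))
        print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 2)
}')
printf '%s\n' "$result"   # prints: 23 8 355016

So substr($0, 24, 6) returns the six digits between the brackets.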
CodePudding user response:
Your regexp \[\([:digit:]{4,13}\] says:
1. \[ = the literal character [
2. \( = the literal character (
3. [:digit:] = a bracket expression containing a character set of the characters :, d, i, g, t
4. {4,13} = a regexp interval that's 4 to 13 repetitions of the preceding bracket expression
5. \] = the literal character ]
The 2 main issues with that which are causing your regexp to be unable to match any of your input are:
- You don't have any (s in your input (from #2 above), and
- To match digits you need a character class [:digit:] inside a bracket expression [[:digit:]], not a character set :digit: inside a bracket expression [:digit:] (from #3 above)
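The difference between the two bracket expressions can be seen with a short sketch. The sample string and the variable names are only illustrative, and stderr is discarded on the first command because some awks (GNU awk, for one) may warn about the suspicious [:digit:] form:

sample='digit: 2024'

# [:digit:] on its own is a set of the characters : d i g t
set_result=$(printf '%s\n' "$sample" | awk '{ gsub(/[:digit:]/, "#"); print }' 2>/dev/null)

# [[:digit:]] is a bracket expression holding the digit character class
class_result=$(printf '%s\n' "$sample" | awk '{ gsub(/[[:digit:]]/, "#"); print }')

printf '%s\n%s\n' "$set_result" "$class_result"

The first command replaces the letters and the colon and leaves 2024 alone (###### 2024), while the second replaces only the digits (digit: ####).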
You also don't actually need to escape the ] at the end of the regexp, as it's only a regexp metachar (end of bracket expression) if preceded by a matching unescaped [ (start of bracket expression).
So the regexp I think you wanted to write instead would have been:
\[[[:digit:]]{4,13}]
e.g.:
$ awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);oa = substr($0,RSTART,RLENGTH);print oa}' file
[6712227000168]
[355016]
[5512987000]
or to only print the numbers:
$ awk '/_OA/ { match($0,/\[[[:digit:]]{4,13}]/);oa = substr($0,RSTART+1,RLENGTH-2);print oa}' file
6712227000168
355016
5512987000