Thanks for taking the time to read my question.
I am trying to write a regular expression that will match the version number in a configuration file. I want to match and extract the version number from the following two numbering patterns:
1) <version>2.343</version>
2) <version>2.343.2</version>
so that the result returned is one of:
1) 2.343
2) 2.343.2
My current solution uses one of these two awk commands, each with a regex that matches only one of the two cases. Is there a single pattern that covers both?
awk 'match($0, /[0-9][.][0-9][0-9][0-9]/) {print substr($0, RSTART, RLENGTH) }' config.xml
awk 'match($0, /[0-9][.][0-9][0-9][0-9].[0-9]/) {print substr($0, RSTART, RLENGTH) }' config.xml
CodePudding user response:
1st solution: With your shown samples, please try the following. It uses awk's match function and should work in any POSIX awk. The regex >[0-9]+(\.[0-9]+)*< matches a > followed by the version number followed by a <; if a match is found, the matched substring is printed without the surrounding > and <.
awk 'match($0,/>[0-9]+(\.[0-9]+)*</){print substr($0,RSTART+1,RLENGTH-2)}' Input_file
OR, in case you want to match the version tag exactly, try the following:
awk 'match($0,/<version>[0-9]+(\.[0-9]+)*<\/version>/){print substr($0,RSTART+9,RLENGTH-19)}' Input_file
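The offsets 9 and 19 are the length of the opening tag and the combined length of both tags. As a minimal sketch of the same idea, the offsets can be computed with length() instead of hard-coded; the names extract_version, otag and ctag are made up for this illustration and are not part of the answer above.

```shell
# extract_version, otag and ctag are hypothetical names for this sketch.
extract_version() {
  awk -v otag='<version>' -v ctag='</version>' '
    match($0, /<version>[0-9]+(\.[0-9]+)*<\/version>/) {
      # Skip the opening tag and drop both tags from the length.
      print substr($0, RSTART + length(otag), RLENGTH - length(otag) - length(ctag))
    }' "$@"
}

# Demo on the two sample lines from the question.
printf '%s\n' '1) <version>2.343</version>' '2) <version>2.343.2</version>' | extract_version
# prints:
# 2.343
# 2.343.2
```

If the tag name changes, the regex and the two -v values have to be updated together.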
2nd solution: With your shown samples, using GNU awk's RS variable. The same regex is used as the record separator, and the version is extracted from the matched separator text (RT).
awk -v RS='<version>[0-9]+(\\.[0-9]+)*<\\/version>' 'RT{split(RT,arr,"[><]");print arr[3]}' Input_file
CodePudding user response:
You may use:
awk 'match($0, /[0-9]+(\.[0-9]+)+/) {
print $1, substr($0, RSTART, RLENGTH)}' file
1) 2.343
2) 2.343.2
CodePudding user response:
Using GNU awk and the third argument of match():
$ gawk 'match($0,/<version>(.*)<\/version>/,a){print a[1]}' file
2.343
2.343.2
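One thing to keep in mind with (.*) is that it is greedy: on a line holding two version elements it would capture everything from the first opening tag to the last closing tag. A possible tightening, sketched here under the assumption that a version is only digits separated by dots, restricts the capture group; extract_version_gawk is a hypothetical name and the function still needs GNU awk.

```shell
# Requires GNU awk: the three-argument match() is a gawk extension.
# extract_version_gawk is a hypothetical helper name for this sketch.
extract_version_gawk() {
  gawk 'match($0, /<version>([0-9]+(\.[0-9]+)*)<\/version>/, a) { print a[1] }' "$@"
}

# Example call (prints only the first version found on each line):
#   printf '%s\n' '<version>2.343</version><version>9.9</version>' | extract_version_gawk
```

On that example line the function prints 2.343, whereas the (.*) form would capture 2.343</version><version>9.9.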
CodePudding user response:
Your two commands can be merged into one using ?, which means zero or one repetitions, as follows:
awk 'match($0, /[0-9][.][0-9][0-9][0-9]([.][0-9])?/) {print substr($0, RSTART, RLENGTH) }' config.xml
which, for config.xml with the following content,
1) <version>2.343</version>
2) <version>2.343.2</version>
gives output
2.343
2.343.2
(tested in gawk 4.2.1)
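One detail worth checking in patterns like these is whether the dot in the optional group is literal: a bare . matches any character, while [.] (or \.) matches only a dot. The sketch below compares the two on a made-up input line; the sample value 2.343x5 is an assumption chosen to show the difference and does not come from the question.

```shell
# Hypothetical sample line: "x5" is not a version component.
line='<version>2.343x5</version>'

# Bare "." before the last digit matches any character, so "x5" is swallowed.
loose=$(printf '%s\n' "$line" |
  awk 'match($0, /[0-9][.][0-9][0-9][0-9](.[0-9])?/) {print substr($0, RSTART, RLENGTH)}')

# "[.]" matches only a literal dot, so the optional group does not apply here.
strict=$(printf '%s\n' "$line" |
  awk 'match($0, /[0-9][.][0-9][0-9][0-9]([.][0-9])?/) {print substr($0, RSTART, RLENGTH)}')

printf 'loose:  %s\nstrict: %s\n' "$loose" "$strict"
# prints:
# loose:  2.343x5
# strict: 2.343
```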
CodePudding user response:
There is absolutely no need to invoke match() or resort to vendor-proprietary solutions:
nawk NF=NF OFS='' FS='(^[^>]*)?[<][/]?version[>]($)?'
2.343
2.343.2
The brute-force approaches:
gawk NF=NF OFS= FS='^[^>]+>|<[/].+$'   # kinda brute-force
mawk NF=NF OFS= FS='^[^>]+.|./.+$'     # REALLY brute-force
2.343
2.343.2
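These one-liners rely on awk rebuilding $0 from its fields, joined with OFS, once a field (or NF) is assigned. As a rough illustration, the sketch below uses the $1 = $1 idiom, swapped in here as the assignment that POSIX describes, together with a simpler FS than the ones above; the variable names are hypothetical.

```shell
line='<version>2.343.2</version>'

# Without any assignment, $0 is printed exactly as it was read.
unchanged=$(printf '%s\n' "$line" | awk -F '</?version>' -v OFS='' '{ print }')

# Assigning a field makes awk rebuild $0 from the fields, joined with the empty OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F '</?version>' -v OFS='' '{ $1 = $1; print }')

printf 'unchanged: %s\nrebuilt:   %s\n' "$unchanged" "$rebuilt"
# prints:
# unchanged: <version>2.343.2</version>
# rebuilt:   2.343.2
```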
CodePudding user response:
Here is another awk solution (tested with GNU and BSD awk) that tries to match exactly the two numbering patterns shown in the OP (<version>N.NNN</version> and <version>N.NNN.N</version>, where N is any digit). It assumes that <version>...</version> tags are properly balanced, do not appear in comments, strings... and do not span over multiple lines. If several version numbers appear on the same line they are all printed.
awk -F '</?version>' '{
for(i=1; i<=NF/2; i++)
if($(2*i) ~ /^[0-9]\.[0-9]{3}(\.[0-9])?$/) print $(2*i)
}' config.xml
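To see why only the even-numbered fields are tested, it may help to dump every field for a line holding two version elements. This is just a sketch on a made-up input line (the surrounding a, b, c text is an assumption for illustration): with the opening and closing tags as field separator, the text between a pair of tags always lands in an even field.

```shell
# Hypothetical input line with two version elements.
fields=$(printf '%s\n' 'a <version>2.343</version> b <version>2.343.2</version> c' |
  awk -F '</?version>' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }')

printf '%s\n' "$fields"
# prints:
# 1:[a ]
# 2:[2.343]
# 3:[ b ]
# 4:[2.343.2]
# 5:[ c]
```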
If the components of version numbers can have any number of digits (minimum 1) just relax the regular expression: /^[0-9]+(\.[0-9]+){1,2}$/. And if there can be any number of components (minimum 1) relax a bit more: /^[0-9]+(\.[0-9]+)*$/ (or /^[0-9]+(\.[0-9]+)+$/ for at least 2 components).
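The difference between the * and + variants is easiest to see on a few candidate strings. The list below is invented for this sketch; the {1,2} variant is left out only because interval expressions are, as far as I know, not supported by every awk build.

```shell
# Hypothetical candidates, one per line; not taken from any real config file.
candidates='2
2.343
2.343.2
2.343.2.1
2.
v2.3'

# "*" variant: one or more dot-separated numeric components.
any=$(printf '%s\n' "$candidates" | awk '/^[0-9]+(\.[0-9]+)*$/')

# "+" variant: at least two components, so a bare "2" is rejected.
two_plus=$(printf '%s\n' "$candidates" | awk '/^[0-9]+(\.[0-9]+)+$/')

printf 'any number of components:\n%s\n' "$any"
printf 'at least two components:\n%s\n' "$two_plus"
# prints:
# any number of components:
# 2
# 2.343
# 2.343.2
# 2.343.2.1
# at least two components:
# 2.343
# 2.343.2
# 2.343.2.1
```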
If <version>...</version> tags are not properly balanced, can appear in comments, or can span over several lines, a real XML parser would be a much better solution than a general purpose utility like awk.