The index.html returned by the curl command looks like this:
<html>
<head><title>Index of myorg/release/builds/production/</title>
</head>
<body>
<h1>Index of myorg/release/builds/production/</h1>
<pre>Name Last modified Size</pre><hr/>
<pre><a href="../">../</a>
<a href="1.0.60/">1.0.60/</a> 06-Jul-2022 07:47 -
<a href="1.0.63/">1.0.63/</a> 06-Jul-2022 10:21 -
<a href="1.0.64/">1.0.64/</a> 09-Jul-2022 18:08 -
<a href="1.0.65/">1.0.65/</a> 09-Jul-2022 18:42 -
<a href="1.0.71/">1.0.71/</a> 10-Jul-2022 10:23 -
<a href="1.0.73/">1.0.73/</a> 14-Jul-2022 17:28 -
<a href="1.0.75/">1.0.75/</a> 20-Jul-2022 07:25 -
<a href="%f}/">{STOCKIO}/</a> 24-May-2022 11:09 -
<a href="dashboard-react-module-1.0.29.tar.gz">dashboard-react-module-1.0.29.tar.gz</a> 24-May-2022 07:27 87.74 MB
<a href="dashboard-react-module-1.0.29.tar.gz.md5">dashboard-react-module-1.0.29.tar.gz.md5</a> 24-May-2022 07:27 32 bytes
<a href="dashboard-react-module-1.0.29.tar.gz.sha1">dashboard-react-module-1.0.29.tar.gz.sha1</a> 24-May-2022 07:27 40 bytes
<a href="dashboard-react-module-1.0.29.tar.gz.sha256">dashboard-react-module-1.0.29.tar.gz.sha256</a> 24-May-2022 07:27 64 bytes
<a href="dashboard-react-module.tar.gz">dashboard-react-module.tar.gz</a> 24-May-2022 07:27 87.74 MB
<a href="dashboard-react-module.tar.gz.md5">dashboard-react-module.tar.gz.md5</a> 24-May-2022 07:27 32 bytes
<a href="dashboard-react-module.tar.gz.sha1">dashboard-react-module.tar.gz.sha1</a> 24-May-2022 07:27 40 bytes
</pre>
<hr/><address style="font-size:small;">Artifactory/6.23.41 Server .myorg.com Port 80</address></body></html>
I'm unable to work out the logic to find the largest version entry in the file; here it is 1.0.75.
I tried grepping only the numbers like this: grep -E "[[:digit:]]\.[[:digit:]]\.[[:digit:]]{1,4}" index.html
but it prints the same output as above.
My idea is to collect all the numeric entries like 1.0.60, 1.0.63 ...
into an array, cut off the last part of each number and compare those to get the largest one, but I can't find the right grep
command that outputs only the numeric values.
Or is there a more efficient way to do it?
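On the grep attempt: grep prints every whole line that matches unless -o is given, in which case it prints only the matched text. A minimal sketch of the grep-only idea, assuming GNU grep (for -o) and GNU sort (for -V), and assuming every version directory is linked as href="X.Y.Z/" as in the listing above; latest_version is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: reads an Artifactory-style index.html on stdin
# and prints the highest X.Y.Z version directory.
latest_version() {
    # -o prints only the matched text, not the whole line;
    # anchoring on href=" skips the duplicate link text and the .tar.gz names
    grep -oE 'href="[0-9]+\.[0-9]+\.[0-9]+/"' |
        # keep the part between the quotes, then drop the trailing slash
        cut -d'"' -f2 | tr -d '/' |
        # -V (GNU sort) compares each dot-separated part numerically
        sort -V | tail -n 1
}
```

Usage: latest_version < index.html, which should print 1.0.75 for the listing in the question.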
Answer:
Using sed to filter the data, sort to arrange it (in case it is unsorted) and tail to show the last (largest) entry:
$ sed -En '/href/s~[^>]*>([0-9][^/]*).*~\1~p' input_file | sort -n | tail -1
1.0.75
- Match lines containing the string href
- Capture the version inside the parentheses and discard everything else
- Return the match with the backreference \1
- Sort the piped output numerically
- Print the last line (highest value)
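A caveat on sort -n: it reads only the leading 1.0 of each line, so all versions tie and GNU sort falls back to comparing the whole lines as text. That happens to give 1.0.75 for this listing, but it can pick the wrong entry once a component gains a digit. A small sketch with a made-up version list (not taken from index.html), assuming GNU sort for -V and forcing the C locale so number parsing does not depend on locale settings:

```shell
#!/bin/sh
# Force the C locale: "." is the decimal point and there is no thousands separator
export LC_ALL=C

# Made-up versions (not from index.html); 1.0.100 is the highest
versions='1.0.75
1.0.100
1.0.99'

# sort -n parses every line as 1.0, so the tie is broken by a plain text comparison
by_n=$(printf '%s\n' "$versions" | sort -n | tail -n 1)

# sort -V compares each dot-separated component as a number
by_v=$(printf '%s\n' "$versions" | sort -V | tail -n 1)

echo "sort -n: $by_n"   # sort -n: 1.0.99
echo "sort -V: $by_v"   # sort -V: 1.0.100
```

For the listing in the question, swapping sort -n for sort -V in the command above still gives 1.0.75.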
Answer:
There are no doubt many ways to do this:
cat foo1.x | grep 'href="[0-9]' | sed -E 's/.*href=.1.0.([0-9]+).*/\1/' | sort -u -n | tail -1
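That pipeline prints only the last component (75 here) and relies on every version starting with 1.0. A sketch that keeps the full version string and compares all three parts with POSIX sort keys instead, assuming the X.Y.Z directory links shown in the question; latest_full_version is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: reads index.html on stdin, prints the highest X.Y.Z directory
latest_full_version() {
    # keep only the directory links such as href="1.0.75/"
    grep -E 'href="[0-9]+\.[0-9]+\.[0-9]+/"' |
        # strip everything except the version inside the href
        sed -E 's/.*href="([0-9]+\.[0-9]+\.[0-9]+)\/".*/\1/' |
        # split on "." and sort major, minor and patch numerically, in that order
        sort -t. -k1,1n -k2,2n -k3,3n |
        tail -n 1
}
```

Unlike sort -V, the -t and -k options are part of POSIX sort, so this should also work with non-GNU sort.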
Answer:
With your shown samples and attempts, please try the following GNU awk solution together with sort and head.
awk 'match($0,/<a href="([0-9]+(\.[0-9]+)*)/,arr){print arr[1] | "sort -rV | head -1"}' Input_file
Explanation: The awk program parses Input_file and uses its match function with the regex <a href="([0-9]+(\.[0-9]+)*), whose capturing group holds only the version.
GNU awk can store the captured values in an array, here arr, so arr[1] contains only the version string.
Each version is printed through | into the shell command sort -rV (version sort, in reverse, i.e. descending, order); once all values are printed, that output goes to the head
command, which prints only the first line, i.e. the highest version.
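The three-argument form of match() used here is a GNU awk extension. For other awk implementations, a sketch using the RSTART and RLENGTH variables that standard match() sets, still assuming GNU sort for -V; latest_version_awk is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: reads index.html on stdin, prints the highest version.
latest_version_awk() {
    awk '
        # POSIX match() sets RSTART and RLENGTH instead of filling an array
        match($0, /<a href="[0-9]+(\.[0-9]+)*\//) {
            # skip the 9 characters of <a href=" and drop the trailing slash
            print substr($0, RSTART + 9, RLENGTH - 10)
        }
    ' | sort -rV | head -n 1
}
```

Usage: latest_version_awk < index.html.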
Answer:
The versions are already sorted in index.html, so just take the last one:
awk -F'["/]' '/href="([0-9]+\.[0-9]+\.[0-9]+)\/"/{n=$2}END{print n}' index.html
1.0.75
If the versions are not sorted (asorti requires GNU awk):
awk -F'["/]' '
/href="([0-9]+\.[0-9]+\.[0-9]+)\/"/ { a[NR]=$2 }
END{
asorti(a,b,"@val_num_desc");
print a[b[1]]
}
' index.html
1.0.75
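Another option for unsorted input, closer to the question's idea of comparing the parts, is to keep a running maximum and compare the dot-separated components numerically, so no sort is needed. A sketch using only POSIX awk features (split() and a user-defined function); latest_version_max and newer are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: reads index.html on stdin and prints the highest
# X.Y.Z directory without calling sort.
latest_version_max() {
    awk -F'["/]' '
        # return 1 if version a is higher than version b, comparing each
        # dot-separated part as a number
        function newer(a, b,    pa, pb, n, i) {
            n = split(a, pa, ".")
            split(b, pb, ".")
            for (i = 1; i <= n; i++) {
                if (pa[i] + 0 > pb[i] + 0) return 1
                if (pa[i] + 0 < pb[i] + 0) return 0
            }
            return 0
        }
        # $2 is the text between href=" and the trailing slash
        /href="[0-9]+\.[0-9]+\.[0-9]+\/"/ {
            if (max == "" || newer($2, max)) max = $2
        }
        END { print max }
    '
}
```

Usage: latest_version_max < index.html. Each line is compared once, so the input order does not matter.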