bash grep only numbers and compare them

Time:07-20

The index.html returned by the curl command looks like this:

<html>
<head><title>Index of myorg/release/builds/production/</title>
</head>
<body>
<h1>Index of myorg/release/builds/production/</h1>
<pre>Name                                        Last modified      Size</pre><hr/>
<pre><a href="../">../</a>
<a href="1.0.60/">1.0.60/</a>                                      06-Jul-2022 07:47    -
<a href="1.0.63/">1.0.63/</a>                                      06-Jul-2022 10:21    -
<a href="1.0.64/">1.0.64/</a>                                      09-Jul-2022 18:08    -
<a href="1.0.65/">1.0.65/</a>                                      09-Jul-2022 18:42    -
<a href="1.0.71/">1.0.71/</a>                                      10-Jul-2022 10:23    -
<a href="1.0.73/">1.0.73/</a>                                      14-Jul-2022 17:28    -
<a href="1.0.75/">1.0.75/</a>                                      20-Jul-2022 07:25    -
<a href="%f}/">{STOCKIO}/</a>                                 24-May-2022 11:09    -
<a href="dashboard-react-module-1.0.29.tar.gz">dashboard-react-module-1.0.29.tar.gz</a>          24-May-2022 07:27  87.74 MB
<a href="dashboard-react-module-1.0.29.tar.gz.md5">dashboard-react-module-1.0.29.tar.gz.md5</a>      24-May-2022 07:27  32 bytes
<a href="dashboard-react-module-1.0.29.tar.gz.sha1">dashboard-react-module-1.0.29.tar.gz.sha1</a>     24-May-2022 07:27  40 bytes
<a href="dashboard-react-module-1.0.29.tar.gz.sha256">dashboard-react-module-1.0.29.tar.gz.sha256</a>   24-May-2022 07:27  64 bytes
<a href="dashboard-react-module.tar.gz">dashboard-react-module.tar.gz</a>                 24-May-2022 07:27  87.74 MB
<a href="dashboard-react-module.tar.gz.md5">dashboard-react-module.tar.gz.md5</a>             24-May-2022 07:27  32 bytes
<a href="dashboard-react-module.tar.gz.sha1">dashboard-react-module.tar.gz.sha1</a>            24-May-2022 07:27  40 bytes
</pre>
<hr/><address style="font-size:small;">Artifactory/6.23.41 Server .myorg.com Port 80</address></body></html>

I can't work out the logic to find the largest version entry in the file. Here it's 1.0.75.

I tried grepping only the numbers with grep -E "[[:digit:]]\.[[:digit:]]\.[[:digit:]]{1,4}" index.html, but it prints the same output as above.
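The attempt prints whole lines because grep, by default, outputs every line that contains a match. The -o option (supported by GNU and BSD grep, although it is not in POSIX) prints only the matched text. A minimal sketch on one line of the listing; anchoring the pattern on href=" is an adaptation that keeps the 1.0.29 inside the dashboard-react-module file names out of the result, which -o with the original pattern alone would not:

```sh
# One line from the listing above.
line='<a href="1.0.75/">1.0.75/</a>                                      20-Jul-2022 07:25    -'

# Without -o, grep prints the whole matching line.
printf '%s\n' "$line" | grep -E 'href="[0-9]+(\.[0-9]+)*/'

# With -o, grep prints only the matched text: href="1.0.75/
printf '%s\n' "$line" | grep -oE 'href="[0-9]+(\.[0-9]+)*/'

# cut keeps the text after the first double quote and tr drops the slash,
# leaving just 1.0.75. On the saved page the same pipeline would be:
#   grep -oE 'href="[0-9]+(\.[0-9]+)*/' index.html | cut -d'"' -f2 | tr -d /
printf '%s\n' "$line" | grep -oE 'href="[0-9]+(\.[0-9]+)*/' | cut -d'"' -f2 | tr -d /
```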

My idea is to get all the numeric entries like 1.0.60, 1.0.63 ... into an array, cut off the last part of each number and compare those to get the largest, but I can't find the right grep command that returns only the numeric values.

Or is there a more efficient way to do it?
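A minimal sketch of the array idea from the question in plain bash (4.0 or later, for mapfile). Instead of comparing only the last part, it compares every dot-separated part numerically, so a change in the major or minor number is also handled. The function names version_gt and largest_version are hypothetical, chosen for illustration:

```bash
#!/usr/bin/env bash

# version_gt A B: succeed (status 0) if dotted version A is greater than B.
version_gt() {
  local -a a b
  local i x y
  IFS=. read -ra a <<< "$1"
  IFS=. read -ra b <<< "$2"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    # Missing parts count as 0; 10# forces base 10 (so 08 is not octal).
    x=$((10#${a[i]:-0}))
    y=$((10#${b[i]:-0}))
    if ((x > y)); then return 0; fi
    if ((x < y)); then return 1; fi
  done
  return 1  # equal versions: not greater
}

# largest_version: read the directory listing on stdin, print the largest version.
largest_version() {
  local -a versions
  local v max=
  # Collect only the version directories (href="1.0.60/" etc.) into an array.
  mapfile -t versions < <(grep -oE 'href="[0-9]+(\.[0-9]+)*/' | cut -d'"' -f2 | tr -d /)
  for v in "${versions[@]}"; do
    if [[ -z $max ]] || version_gt "$v" "$max"; then
      max=$v
    fi
  done
  printf '%s\n' "$max"
}

# Usage, assuming the page was saved as index.html:
#   largest_version < index.html
```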

Answer:

Use sed to filter the data, sort to order it (in case it is unsorted) and tail to show the last (largest) entry:

$ sed -En '/href/s~[^>]*>([0-9][^/]*).*~\1~p' input_file | sort -V | tail -1
1.0.75
  • Match lines containing the string href
  • Capture the version (a digit right after a >, up to the next /) within parentheses and discard everything else
  • Print the match with the backreference \1
  • Sort the piped output in version order (-V, GNU sort)
  • Print the last line (the highest value)
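Why version sort rather than numeric sort: with sort -n, a line such as 1.0.60 is read as the number 1.0, so every entry in this listing ties and GNU sort falls back to comparing whole lines as text. That happens to put 1.0.75 last here, but it would rank 1.0.99 above 1.0.100. A small check, assuming GNU coreutils sort (for -V):

```sh
# Versions whose last part has a different number of digits.
versions='1.0.99
1.0.100
1.0.75'

# Numeric sort: every line parses as 1.0, ties fall back to text order,
# so the "largest" reported here is 1.0.99.
printf '%s\n' "$versions" | sort -n | tail -n 1

# Version sort compares each numeric part, so this prints 1.0.100.
printf '%s\n' "$versions" | sort -V | tail -n 1
```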

Answer:

There are no doubt many ways to do this.

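grep 'href="[0-9]' foo1.x | sed -E 's/.*href="1\.0\.([0-9]+).*/\1/' | sort -u -n | tail -1

This prints only the last part of the version (75 for the listing above) and assumes every version starts with 1.0.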
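If you want the full version rather than only the last number, and your sort has no -V option, POSIX sort can compare the dot-separated parts as separate numeric keys. A minimal sketch, assuming three-part versions like those in the listing; the function name latest_full_version is made up for illustration:

```sh
# Print the highest version found in a directory listing read on stdin.
# sort -t. splits each line on dots; -k1,1n -k2,2n -k3,3n compares major,
# minor and patch numerically, so 1.0.100 sorts after 1.0.99.
latest_full_version() {
  grep -oE 'href="[0-9]+\.[0-9]+\.[0-9]+/' |
    cut -d'"' -f2 | tr -d / |
    sort -t. -k1,1n -k2,2n -k3,3n |
    tail -n 1
}

# Usage: latest_full_version < foo1.x
```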

Answer:

With your shown samples and attempts, please try the following GNU awk solution, which pipes its matches through sort and head.

awk 'match($0,/<a href="([0-9]+(\.[0-9]+)*)/,arr){print arr[1] | "sort -rV | head -1"}' Input_file

Explanation: awk reads Input_file and calls match() with the regex <a href="([0-9]+(\.[0-9]+)*), whose capturing group contains only the version. GNU awk's three-argument form of match() stores the captured groups in an array, here arr, so arr[1] holds just the version value. Each value is piped to the shell command sort -rV (version sort, reversed, i.e. descending order); once all values have been printed, that output goes to head -1, which prints only the first line: the highest version.
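The three-argument match() is a GNU awk extension, so it is not available in mawk or other POSIX awks. A minimal sketch of the same idea in POSIX awk, where match() only sets RSTART and RLENGTH and substr() cuts out the matched text. sort -V is still a GNU coreutils option, and the function name newest_version_posix_awk is hypothetical:

```sh
# Read the listing on stdin and print the highest version.
newest_version_posix_awk() {
  awk 'match($0, /<a href="[0-9]+(\.[0-9]+)*/) {
         v = substr($0, RSTART, RLENGTH)   # e.g. <a href="1.0.75
         sub(/^<a href="/, "", v)          # keep only 1.0.75
         print v
       }' | sort -rV | head -n 1
}

# Usage: newest_version_posix_awk < Input_file
```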

Answer:

The versions in index.html are already sorted, so printing the last one is enough:

awk -F'["/]' '/href="([0-9]+\.[0-9]+\.[0-9]+)\/"/{n=$2}END{print n}' index.html

1.0.75

If the versions are not sorted (this needs GNU awk, for asorti):

awk -F'["/]' '
  /href="([0-9]+\.[0-9]+\.[0-9]+)\/"/ { a[NR]=$2 }
  END {
      asorti(a, b, "@val_num_desc")
      print a[b[1]]
  }
' index.html

1.0.75
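A caveat on @val_num_desc: gawk reads a value such as 1.0.60 as the number 1.0, so all entries tie and the order comes from gawk's string tie-break. That yields 1.0.75 here, but it would rank 1.0.99 above 1.0.100. A minimal sketch that compares each dot-separated part numerically instead, written in plain POSIX awk so it needs no GNU extensions; the names largest_version_awk and vgt are hypothetical:

```sh
# Read the listing on stdin and print the highest x.y.z version.
largest_version_awk() {
  awk -F'["/]' '
    # Return 1 if dotted version a is greater than b, else 0.
    function vgt(a, b,    x, y, n, m, i) {
      n = split(a, x, ".")
      m = split(b, y, ".")
      for (i = 1; i <= n || i <= m; i++) {
        # Adding 0 forces a numeric comparison; missing parts count as 0.
        if (x[i] + 0 > y[i] + 0) return 1
        if (x[i] + 0 < y[i] + 0) return 0
      }
      return 0
    }
    /href="[0-9]+\.[0-9]+\.[0-9]+\/"/ {
      if (max == "" || vgt($2, max)) max = $2
    }
    END { print max }
  '
}

# Usage: largest_version_awk < index.html
```

Because the maximum is tracked in a single pass, no sorting is needed at all, and the result does not depend on the order of the listing.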