I'm trying to build an equivalent to the following github-specific code that works for finding the latest artifact available for download from https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master -- the download links look something like https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz.
# Working code for github.com, needs to be converted to fivem.net
LOCATION=$(curl -s https://api.github.com/repos/someuser/somerepo/releases/latest \
| grep "tag_name" \
| awk '{print "https://github.com/someuser/somerepo/archive/" substr($2, 2, length($2)-3) ".zip"}') \
; curl -L -o file.zip $LOCATION
The build number in the path increases over time but is not consecutive, and it is followed by an effectively random hash.
How can I find the latest download link from the HTML page at https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master?
CodePudding user response:
We can build on the use of lynx -dump, as suggested in "Easiest way to extract the urls from an html page using sed or awk only":
#!/usr/bin/env bash
url_re='https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/([[:digit:]]+)-([[:xdigit:]]+)/fx.tar.xz'
newest_link_num=0
newest_link_content=
while read -r _ link; do
  [[ $link =~ $url_re ]] || continue
  if (( ${BASH_REMATCH[1]} > newest_link_num )); then
    newest_link_num=${BASH_REMATCH[1]}
    newest_link_content=$link
  fi
done < <(lynx -dump -listonly -hiddenlinks=listonly https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master)
echo "Newest link is: $newest_link_content"
As of this writing, it finishes with the following output:
Newest link is: https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz
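To get back to the original goal of downloading the file, the same selection can be sketched as a filter that reads the link listing on stdin. This is a pipeline variant of the loop above, not the code from the answer: newest_artifact is a hypothetical helper name, and it assumes sed supports -E and that build numbers compare correctly as plain integers.

```shell
#!/usr/bin/env bash
# newest_artifact (hypothetical helper): reads a link listing on stdin,
# e.g. the output of `lynx -dump -listonly`, and prints the fx.tar.xz URL
# with the highest build number. Prints nothing if no link matches.
newest_artifact() {
    sed -nE 's#.*(https://runtime\.fivem\.net/artifacts/fivem/build_proot_linux/master/([0-9]+)-[0-9a-fA-F]+/fx\.tar\.xz).*#\2 \1#p' |
        sort -n |
        tail -n 1 |
        cut -d' ' -f2-
}

# Possible usage (needs network access, lynx and curl):
#   page=https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master
#   url=$(lynx -dump -listonly -hiddenlinks=listonly "$page" | newest_artifact)
#   [ -n "$url" ] && curl -L -o fx.tar.xz "$url"
```

The sed step rewrites each matching line as "build-number URL", so that sort -n orders by the number rather than lexically (otherwise 999 would sort after 5901); tail keeps the largest and cut strips the number again.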
CodePudding user response:
I examined https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/ and the latest links (version 5902, i.e. the newest, and version 5848, i.e. the latest recommended) seem to have the is-active class
<a href="./5902-3c88d7752be75493078c1da898337b0abc2652ff/fx.tar.xz" style="display: block;">
as opposed to older versions. If possible, you should use tools designed for working with HTML, for example hxselect; however, if you are not allowed to install such tools, you might use GNU AWK instead, in the following way
wget -O - https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/ | awk 'BEGIN{RS="<|>"}/is-active/{sub(/^.*href="\./,"");sub(/".*/,"");print "https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master"$0}'
to get the output
https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5902-3c88d7752be75493078c1da898337b0abc2652ff/fx.tar.xz
https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5848-4f71128ee48b07026d6d7229a60ebc5f40f2b9db/fx.tar.xz
Explanation: I tell GNU AWK that the record separator (RS) is < or >, so the inside of each opening or closing tag is treated as a single record. Then, for each record that contains is-active, I replace everything up to and including href=". with an empty string (i.e. delete it), then replace the first " and everything after it with an empty string (i.e. delete that too), and finally print the concatenation of https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master and the extracted href value.
(tested in gawk 4.2.1)
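To try the AWK program without hitting the network, it can be wrapped in a small filter and fed a sample tag. This is only a sketch: active_links is a hypothetical name, and the sample markup (in particular the class="is-active" attribute) is an assumption about what the page contains, not copied from it. It also relies on a multi-character RS being treated as a regular expression, which gawk does but POSIX leaves unspecified.

```shell
#!/usr/bin/env bash
# active_links (hypothetical helper): reads HTML on stdin and prints the
# absolute URL for every tag that mentions is-active.
active_links() {
    awk 'BEGIN { RS = "<|>" }            # each tag body becomes one record
         /is-active/ {
             sub(/^.*href="\./, "")     # drop everything through href=".
             sub(/".*/, "")             # drop the closing quote and the rest
             print "https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master" $0
         }'
}

# Sample run on assumed markup; only the second link carries is-active:
#   printf '%s\n' \
#     '<a href="./5800-aaaa/fx.tar.xz">old</a>' \
#     '<a class="is-active" href="./5902-3c88/fx.tar.xz" style="display: block;">new</a>' |
#     active_links
```

Because the page marks both the newest and the latest recommended build as active, the real output has two lines; piping it through head -n 1 would keep only the first one, assuming the newest build is listed first as in the output shown above.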