I want to get the newest (first) download link matching a regex.
URL=https://github.com/sharkdp/bat/releases/ # Need to look at /releases/ even though the downloads are under /releases/download/$REL/$BAT
content=$(wget -q -O - "$URL")
# Parse $content for string starting 'https://' and ending "_amd64.deb"
# At the moment, that will be: href="/sharkdp/bat/releases/download/v0.18.3/bat_0.18.3_amd64.deb"
# -O names the file wget writes the page contents into, and "-" sends them to standard output instead; -q (quiet) turns off wget's own output.
Then I need to somehow grep/match strings that start with https:// and end with _amd64.deb, and then pick just the first one in that list.
How do I grep/match these links and pick the first one?
Once I have that, it's easy to download the latest version on the page with wget -P /tmp/ "$DL".
Answer:
With Bash, you can use
rx='href="(/sharkdp/[^"]*_amd64\.deb)"'
if [[ "$content" =~ $rx ]]; then
    echo "${BASH_REMATCH[1]}"
else
    echo "No match"
fi
# => /sharkdp/bat/releases/download/v0.18.3/bat-musl_0.18.3_amd64.deb
The href="(/sharkdp/[^"]*_amd64\.deb)" regex matches href=", then captures into Group 1 (${BASH_REMATCH[1]}) /sharkdp/, zero or more chars other than ", and _amd64.deb, and then just matches ". Because =~ reports the leftmost match, this is the first such link in $content.
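To check the Bash approach without hitting the network, here is a minimal sketch that wraps the same regex in a hypothetical helper, first_amd64_deb, and runs it on a made-up HTML snippet (the sample links are assumptions modeled on the question's example, not real page content). It shows that when several links qualify, the first one on the page wins, and that a non-matching link such as an _arm64.deb is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the first relative *_amd64.deb link found in $1,
# or returns 1 if there is none.
first_amd64_deb() {
    local rx='href="(/sharkdp/[^"]*_amd64\.deb)"'
    # $rx must stay unquoted so Bash treats it as a regex, not a literal string.
    if [[ "$1" =~ $rx ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# Made-up HTML standing in for the downloaded page (an assumption).
sample='<a href="/sharkdp/bat/releases/download/v0.18.3/bat_0.18.3_arm64.deb">
<a href="/sharkdp/bat/releases/download/v0.18.3/bat-musl_0.18.3_amd64.deb">
<a href="/sharkdp/bat/releases/download/v0.18.3/bat_0.18.3_amd64.deb">'

# Prints the leftmost match: the bat-musl link on the second line.
first_amd64_deb "$sample"
```

The [^"]* part cannot run past a closing quote, so the arm64 link fails to match and the search moves on to the next href.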
With GNU grep, you can use
link=$(grep -oP 'href="\K/sharkdp/[^"]*_amd64\.deb' <<< "$content" | head -1)
echo "$link"
# => /sharkdp/bat/releases/download/v0.18.3/bat-musl_0.18.3_amd64.deb
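Here,
- href="\K/sharkdp/[^"]*_amd64\.deb matches href=", then drops this text from the match (that is what \K does), then matches /sharkdp/, any zero or more chars other than ", and then _amd64.deb
- -o prints only the matched text, and -P enables Perl-compatible regex syntax, which \K requires
- head -1 only keeps the first match.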
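Both commands return a path relative to github.com, so it has to be prefixed with the host before wget can fetch it. A minimal sketch, assuming GNU grep with PCRE support and that the releases page HTML still contains these relative href links; first_deb_url and the sample HTML are hypothetical:

```shell
#!/usr/bin/env bash
# Assumption: links on the page are relative, as in the question's example.
base='https://github.com'

# Hypothetical helper: reads HTML on stdin and prints the first absolute
# *_amd64.deb URL, or returns 1 if no link matches.
first_deb_url() {
    local link
    link=$(grep -oP 'href="\K/sharkdp/[^"]*_amd64\.deb' | head -n 1)
    [ -n "$link" ] || return 1
    printf '%s%s\n' "$base" "$link"
}

# Made-up HTML standing in for the downloaded page (an assumption).
sample='<a href="/sharkdp/bat/releases/download/v0.18.3/bat-musl_0.18.3_amd64.deb">
<a href="/sharkdp/bat/releases/download/v0.18.3/bat_0.18.3_amd64.deb">'

DL=$(first_deb_url <<< "$sample")
echo "$DL"

# Real usage (needs network access):
# DL=$(wget -q -O - https://github.com/sharkdp/bat/releases/ | first_deb_url) && wget -P /tmp/ "$DL"
```

Checking for an empty result inside the helper matters because in grep ... | head the pipeline's exit status comes from head, so a failed grep would otherwise go unnoticed.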