For example, I have this code inside a file such as test.txt:
<li>
<a href="https://www.mediafire.com/file/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip/file"
target="_blank"
rel="noopener"
id="copyShareURL"
aria-labelledby="copy-tooltip" title="Copy file link to clipboard"></a>
<span id="copy-tooltip" >
Copy file link to clipboard
</span>
</li>
>
<input type="hidden"
name="security"
value="1671208608.1b34b06f05dcee408eb015bfbc59436ab061fd4a21a61e3f970cd8d01f8ddcc3"
>
<div id="download_link">
<!-- IF THIS IS TRADITIONAL -->
<a href="#"><span>Preparing your download...</span></a>
<a
aria-label="Download file"
href="http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip" id="downloadButton"
rel="nofollow">
Download (4.97MB)
</a>
<a href="#"><span>Your download is starting...</span></a>
<a href="http://www.mediafire.com/download_repair.php?qkey=ke0f3n2u8sqbxsr&dkey=h0k3ambdjvb&template=34&origin=click_button">
<span >Download Started. <em>Repair your download</em></span>
</a>
<script type="text/javascript">
(function() {
var dl = document.getElementById('download_link');
if (!dl) return;
var init = false;
function retry() {
dl.className = ' retry';
};
function download() {
dl.className = ' started';
window.dlStarted = true;
setTimeout(retry, 16000);
};
I need to extract this URL:
http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip
P.S.: the number after "download" (2291) and the string that follows (h0k3ambdjvbg) are not fixed; they vary.
I have tried this sed command, but it prints nothing:
sed -n 's|.*href="\(http://download.*\)">|\1|p' test.txt
CodePudding user response:
Try
sed -n 's|.*href="\(http://download[^ "]*\)".*|\1|p' test.txt
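so that the capture stops at the first space or double quote. Your version requires "> immediately after the URL, but on that line the URL is followed by an id attribute and the closing > is on the next line, so nothing matches.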
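As a rough check of the difference between the two patterns, the sketch below rebuilds only the relevant lines of the question's HTML in a temporary file and runs both commands against it. The function name extract_download_url and the variable names are made up for illustration; the sed expressions are the ones from the question and from this answer.

```shell
#!/bin/sh
# Hypothetical helper wrapping the corrected sed command.
# With no file argument it reads standard input.
extract_download_url() {
    sed -n 's|.*href="\(http://download[^ "]*\)".*|\1|p' "$@"
}

# Recreate the relevant lines of test.txt in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a
aria-label="Download file"
href="http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip" id="downloadButton"
rel="nofollow">
Download (4.97MB)
</a>
<a href="http://www.mediafire.com/download_repair.php?qkey=ke0f3n2u8sqbxsr&dkey=h0k3ambdjvb&template=34&origin=click_button">
EOF

# Original attempt: needs "> on the same line as the download URL, so it prints nothing.
original=$(sed -n 's|.*href="\(http://download.*\)">|\1|p' "$sample")

# Corrected attempt: the capture ends at the first space or double quote.
fixed=$(extract_download_url "$sample")

printf 'original: [%s]\n' "$original"
printf 'fixed:    [%s]\n' "$fixed"

rm -f "$sample"
```

Because the pattern only anchors on the literal prefix http://download, it should keep working when the number and the path components change, as long as the URL itself contains no spaces or double quotes.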
CodePudding user response:
With an HTML-aware tool such as pup, you could do something like
$ pup 'a#downloadButton attr{href}' < test.txt
http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip
assuming that the downloadButton id is unique enough to identify the URL you want.