Use sed to extract a specific line from a file? - CodePudding

For example, I have this code inside a file such as test.txt:

                <li>
                    <a href="https://www.mediafire.com/file/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip/file"
                        target="_blank"
                        rel="noopener"
                        id="copyShareURL"
                        
                        aria-labelledby="copy-tooltip" title="Copy file link to clipboard"></a>
                    <span id="copy-tooltip" >
                        Copy file link to clipboard
                    </span>
                </li>
    >
        <input type="hidden"
               name="security"
               value="1671208608.1b34b06f05dcee408eb015bfbc59436ab061fd4a21a61e3f970cd8d01f8ddcc3"
        >
        
        <div  id="download_link">
    <!-- IF THIS IS TRADITIONAL -->
    <a  href="#"><span>Preparing your download...</span></a>

            <a 
           aria-label="Download file"
           href="http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip"           id="downloadButton"
           rel="nofollow">
                Download (4.97MB)
        </a>
        <a  href="#"><span>Your download is starting...</span></a>
    <a  href="http://www.mediafire.com/download_repair.php?qkey=ke0f3n2u8sqbxsr&amp;dkey=h0k3ambdjvb&amp;template=34&amp;origin=click_button">
        <span >Download Started. <em>Repair your download</em></span>
    </a>
    <script type="text/javascript">
     (function() {
         var dl = document.getElementById('download_link');
         if (!dl) return;
         var init = false;

         function retry() {
             dl.className  = ' retry';
         };

         function download() {
             dl.className  = ' started';
             window.dlStarted = true;
                             setTimeout(retry, 16000);
                      };

I need to extract this URL:

http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip

P.S.: the number after "download" (2291) and the string "h0k3ambdjvbg" are not fixed; they vary.

I tried this sed command, but it prints nothing:

sed -n 's|.*href="\(http://download.*\)">|\1|p' test.txt

CodePudding user response：

Try

sed -n 's|.*href="\(http://download[^ "]*\)".*|\1|p' test.txt

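The character class [^ "]* stops the capture at the first space or double quote, and the trailing .* consumes the rest of the line. Your version prints nothing because it requires "> somewhere after the URL on the same line, but on that line the href is followed by id="downloadButton" and the tag's closing > sits on the next line.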
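
To see the difference, here is a minimal sketch that runs both commands against a cut-down copy of the page. The temporary file and the variable names (tmp, original, fixed) are made up for illustration, and the sample keeps only the lines that matter.

```shell
#!/bin/sh
# Hypothetical sample file: only the relevant lines of the page from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
            <a
           aria-label="Download file"
           href="http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip"           id="downloadButton"
           rel="nofollow">
    <a  href="http://www.mediafire.com/download_repair.php?qkey=ke0f3n2u8sqbxsr">
EOF

# Command from the question: it needs "> later on the same line as the URL,
# which never happens in this input, so nothing is printed.
original=$(sed -n 's|.*href="\(http://download.*\)">|\1|p' "$tmp")

# Command from this answer: capture up to the first space or quote,
# then let .* swallow whatever follows on the line.
fixed=$(sed -n 's|.*href="\(http://download[^ "]*\)".*|\1|p' "$tmp")

printf 'original: [%s]\n' "$original"
printf 'fixed:    [%s]\n' "$fixed"
rm -f "$tmp"
```

The first command leaves the brackets empty, while the second prints the download URL.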

CodePudding user response：

With an HTML-aware tool such as pup, you could do something like

$ pup 'a.input attr{href}' < test.txt
http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip

assuming that the input class is unique enough to identify the URL you want.
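
If pup is not available, a rough sed-only approximation of the same idea is to pick the line by a nearby attribute instead of by the URL prefix. This sketch assumes that id="downloadButton" sits on the same line as the href, as it does in the sample from the question; it is not a real HTML parser and will break if the attributes are laid out differently. The function name extract_download_url and the decoy URL are invented for the example.

```shell
#!/bin/sh
# Print the href found on the line that also carries id="downloadButton".
# Assumption: both attributes share one line, as in the question's sample.
extract_download_url() {
    sed -n '/id="downloadButton"/s|.*href="\([^"]*\)".*|\1|p' "$1"
}

# Hypothetical sample: a decoy link without the id, then the real line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
    <a  href="http://download.example.invalid/decoy.zip">
           href="http://download2291.mediafire.com/h0k3ambdjvbg/ke0f3n2u8sqbxsr/Ncam-Images-IPK.zip"           id="downloadButton"
EOF

url=$(extract_download_url "$sample")
printf '%s\n' "$url"
rm -f "$sample"
```

The address /id="downloadButton"/ limits the substitution to matching lines, so the decoy link is skipped even though its URL also starts with http://download.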