How to extract particular link from html page using php

Time:10-12

Hi, I'm trying to scrape the href link from an a tag using a regex, but I'm unable to retrieve the link. Can someone help me achieve this? Here is the link I'm trying to extract from the HTML page:

/u/0/uc?export=download&confirm=EY_S&id=fileid

Here is my PHP function:

<?php
function dwnload($url)
{
    $scriptx = "";
    $internalErrors = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    @$dom->loadHTML(curl($url));
    foreach ($dom->getElementsByTagName('a') as $k => $js) {
        $scriptx .= $js->nodeValue;
    }
    preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $scriptx, $match);
    $vlink = "";
    foreach ($match[0] as $c) {
        if (strpos($c, 'export=download') !== false) {
            $vlink = $c;
        }
    }

    return $vlink; 
}?>

Thanks

CodePudding user response:

You're concatenating the link texts, which does not help here: the URL is in the href attribute of each a element, not in its text content. To extract links, DOMDocument::getElementsByTagName() already does the job. You only need to filter the results.

Let's consider a small HTML fragment:

$html = <<<'HTML'
<a href="/u/0/uc?export=download&amp;confirm=EY_S&amp;id=fileid">SUCCESS</a>
<a href="/another/link">FAILURE</a>
HTML;

Now iterate the a elements and filter them by their href attribute.

$document = new DOMDocument();
$document->loadHTML($html);

foreach ($document->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    if (strpos($href, 'export=download') !== false) {
        var_dump([$href, $a->textContent]);
    }
}

Output:

array(2) {
  [0]=>
  string(46) "/u/0/uc?export=download&confirm=EY_S&id=fileid"
  [1]=>
  string(7) "SUCCESS"
}
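
To tie this back to the function in the question, the loop above could be wrapped roughly as follows. This is only a sketch: the name findDownloadLink() and its signature are made up for illustration, it expects the HTML as a string (fetch it however you do now), and it returns the first matching href, whereas the question's loop kept the last one.

```php
<?php
// Hypothetical helper; name and signature are illustrative, not from the original post.
function findDownloadLink(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of emitting them,
    // so the @ operator is not needed on loadHTML().
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($document->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (strpos($href, 'export=download') !== false) {
            return $href; // first matching link wins
        }
    }

    return null; // no matching link found
}

$html = <<<'HTML'
<a href="/u/0/uc?export=download&amp;confirm=EY_S&amp;id=fileid">SUCCESS</a>
<a href="/another/link">FAILURE</a>
HTML;

var_dump(findDownloadLink($html));
// string(46) "/u/0/uc?export=download&confirm=EY_S&id=fileid"
```

Note that the href comes back exactly as written in the page, so a relative link like this one still has to be resolved against the page URL before you can request it.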

Since this is a plain substring match, an XPath expression can do the filtering as well:

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);

foreach ($xpath->evaluate('//a[contains(@href, "export=download")]') as $a) {
    var_dump([$a->getAttribute('href'), $a->textContent]);
}

Or combine the XPath expression with a more specific regular expression that only matches export=download as a complete query parameter:

$pattern = '((?:\\?|&)export=download(?:&|$))';
foreach ($xpath->evaluate('//a[contains(@href, "export=download")]') as $a) {
    $href = $a->getAttribute('href');
    if (preg_match($pattern, $href)) {
        var_dump([$href, $a->textContent]);
    }
}
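
If you would rather not maintain a regular expression, the same "complete query parameter" check can probably be done by parsing the query string. This is a different technique from the pattern above, shown only as a sketch; hasExportDownload() is a hypothetical helper name, while parse_url() and parse_str() are standard PHP functions.

```php
<?php
// Hypothetical helper: true when the URL's query string contains export=download
// as a parameter of its own.
function hasExportDownload(string $href): bool
{
    // parse_url() returns null when there is no query part, false on malformed input.
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return false;
    }

    // Decode the query string into an associative array.
    parse_str($query, $params);

    return ($params['export'] ?? null) === 'download';
}

var_dump(hasExportDownload('/u/0/uc?export=download&confirm=EY_S&id=fileid')); // bool(true)
var_dump(hasExportDownload('/u/0/uc?noexport=download'));                      // bool(false)
var_dump(hasExportDownload('/another/link'));                                  // bool(false)
```

The second call is the case where a plain strpos() check would give a false positive, because "export=download" occurs inside "noexport=download".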