How to create a simple screen scraper in PHP

Time:01-13

I am trying to create a simple screen scraper that gets me the price of a specific item. Here is an example of a product I want to get the price from:

https://www.flanco.ro/telefon-mobil-apple-iphone-14-5g-128gb-purple.html

This is the portion of the HTML code I am interested in: [image: screenshot of the price markup]

I want to extract the '4699' value.

Here is what I have been trying to do but it does not seem to work:

$html = file_get_contents("https://www.flanco.ro/telefon-mobil-apple-iphone-14-5g-128gb-purple.html");
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
//Now query the document:
foreach ($xpath->query('/<span >[0-9]*\\.[0-9] /i') as $node) {
    echo $node, "\n";
}

CodePudding user response:

You could just use standard PHP string functions to get the price out of the $html:

$url   = "https://www.flanco.ro/telefon-mobil-apple-iphone-14-5g-128gb-purple.html";
$html  = file_get_contents($url);
$seek  = '<span ><span >';
$end   = strpos($html, $seek) + strlen($seek);
$price = substr($html, $end, strpos($html, ',', $end) - $end);

Or something similar. That is all the code you need, and it returns:

4.699

My point is: in this particular case you don't need to parse the DOM or use a regular expression to get that single price.
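The string-function approach can be sketched on a small inline sample so it runs without a network request. The markup, the `price` and `value` class names, and the `extractBetween` helper are all hypothetical, since the real class names are not shown in this thread. The last step assumes the dot in `4.699` is a thousands separator.

```php
<?php
// Hypothetical sample markup; the real page's class names are not shown in this thread.
$html = '<div class="pricesPrp"><span class="price"><span class="value">4.699,00 lei</span></span></div>';

/**
 * Return the text between $seek and the next occurrence of $stop,
 * or null if either marker is missing. (Hypothetical helper name.)
 */
function extractBetween(string $html, string $seek, string $stop): ?string
{
    $start = strpos($html, $seek);
    if ($start === false) {
        return null; // opening marker not found
    }
    $start += strlen($seek);

    $end = strpos($html, $stop, $start);
    if ($end === false) {
        return null; // closing marker not found
    }
    return substr($html, $start, $end - $start);
}

$price = extractBetween($html, '<span class="price"><span class="value">', ',');
echo $price, "\n"; // 4.699

// Assumption: the dot is a thousands separator, so removing it gives the plain number.
$amount = (int) str_replace('.', '', $price);
echo $amount, "\n"; // 4699
```

Checking `strpos` against `false` matters here: without it, a missing marker would silently produce a substring from the wrong offset.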

CodePudding user response:

Since there are several price classes on the page, I would specifically target the pricesPrp class.

Also, in your foreach you are trying to echo a DOMElement object as a string, which won't work; read its nodeValue instead.

Update your XPath query as follows:

$query = $xpath->query('//div[@]//span[@]//span[@]');

If you want to see the different nodes:

echo '<pre>';
foreach ($query as $node) {
    var_dump($node);
}

And if you want to get that specific price:

$price = $query->item(0)->nodeValue;
echo $price;
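As a rough, self-contained illustration of this approach, the sketch below runs the same kind of query against an inline sample. Only the `pricesPrp` class name comes from this answer; the sample markup and the `price`, `amount`, and `otherPrices` class names are assumptions made for the example.

```php
<?php
// Hypothetical sample markup: only the pricesPrp class name comes from the answer above.
$html = <<<HTML
<html><body>
  <div class="pricesPrp">
    <span class="price"><span class="amount">4.699</span>,00 lei</span>
  </div>
  <div class="otherPrices"><span class="price"><span class="amount">5.199</span></span></div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Restrict the search to the pricesPrp container so other prices on the page are ignored.
$nodes = $xpath->query('//div[@class="pricesPrp"]//span[@class="amount"]');

$price = null;
if ($nodes !== false && $nodes->length > 0) {
    // nodeValue is a string; the DOMElement itself cannot be echoed.
    $price = trim($nodes->item(0)->nodeValue);
}
echo $price ?? 'price not found', "\n"; // 4.699
```

Checking `$nodes->length` before calling `item(0)` avoids an error if the page layout changes and the query matches nothing.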

CodePudding user response:

$html = file_get_contents('PASTE_URL');

$doc = new DOMDocument();
// Convert non-ASCII characters to entities; @ suppresses warnings about malformed HTML
@$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$selector = new DOMXPath($doc);

$result = $selector->query('//span[@]');
foreach($result as $node) {
    echo $node->nodeValue;
}
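A possible variation on the snippet above, offered as a sketch rather than a drop-in replacement: it uses `libxml_use_internal_errors` instead of the `@` operator and `mb_encode_numericentity` instead of the `'HTML-ENTITIES'` conversion, which is deprecated as of PHP 8.2. The `queryText` function name, the sample markup, and the `price` class name are hypothetical, and the input is assumed to be UTF-8.

```php
<?php
/**
 * Return the trimmed text of every node matching $expression in $html.
 * Assumes $html is UTF-8. (Hypothetical helper name.)
 */
function queryText(string $html, string $expression): array
{
    // Encode non-ASCII characters as numeric entities so the HTML parser
    // does not misinterpret the bytes as a single-byte encoding.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Collect parser warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($expression);
    $values = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $values[] = trim($node->nodeValue);
        }
    }
    return $values;
}

// Hypothetical sample markup; the class name is illustrative only.
$sample = '<p><span class="price">4.699</span> <span class="price">Preț: 5.199</span></p>';
print_r(queryText($sample, '//span[@class="price"]'));
```

Returning an array keeps the caller in control of what to do when the query matches several nodes or none at all.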