I am trying to create a simple screen scraper that gets me the price of a specific item. Here is an example of a product I want to get the price from:
https://www.flanco.ro/telefon-mobil-apple-iphone-14-5g-128gb-purple.html
This is the portion of the HTML I am interested in: (image not included)
I want to extract the '4699' value.
Here is what I have been trying to do but it does not seem to work:
$html = file_get_contents("https://www.flanco.ro/telefon-mobil-apple-iphone-14-5g-128gb-purple.html");
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
//Now query the document:
foreach ($xpath->query('/<span >[0-9]*\\.[0-9] /i') as $node) {
echo $node, "\n";
}
CodePudding user response:
You could just use standard PHP string functions to get the price out of $html:
$url = "https://www.flanco.ro/telefon-mobil-apple-iphone-14-5g-128gb-purple.html";
$html = file_get_contents($url);
$seek = '<span ><span >';
$end = strpos($html, $seek) + strlen($seek);
$price = substr($html, $end, strpos($html, ',', $end) - $end);
Or something similar; that is all the code you need. It returns:
4.699
My point is: in this particular case you don't need to parse the DOM or use a regular expression to get that single price.
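A slightly more defensive sketch of the same idea, run against an inline sample so it works without fetching the page. The markup in $html, its class names, and the helper name extractBetween are assumptions for illustration, not the real page's markup:

```php
<?php
// Hypothetical helper: returns the text between $start and the next $stop,
// or null when either marker is missing.
function extractBetween(string $html, string $start, string $stop): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start);
    $to = strpos($html, $stop, $from);
    if ($to === false) {
        return null; // stop marker not found
    }
    return substr($html, $from, $to - $from);
}

// Assumed sample markup; the real page's attributes may differ.
$html = '<div class="price"><span class="value">4.699,99 lei</span></div>';

$price = extractBetween($html, '<span class="value">', ',');
echo $price, "\n";
```

Checking strpos against false matters here: if the marker is missing, the original arithmetic would silently start cutting from the wrong offset.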
CodePudding user response:
Since there are a few price classes on the page, I would specifically target the pricesPrp class.
Also, in your foreach you are trying to convert a DOMElement object into a string, which won't work.
Update your XPath query as follows:
$query = $xpath->query('//div[@]//span[@]//span[@]');
If you want to see the different nodes:
echo '<pre>';
foreach ($query as $node) {
var_dump($node);
}
And if you want to get that specific price:
$price = $query->item(0)->nodeValue;
echo $price;
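For reference, here is a self-contained sketch of this approach against an inline sample. Only the pricesPrp class name comes from the page; where it sits in the markup and the other class names (price, value) are assumptions:

```php
<?php
// Assumed sample markup; only the pricesPrp class name comes from the page.
$html = '<html><body>'
      . '<div class="pricesPrp"><span class="price"><span class="value">4.699</span></span></div>'
      . '<div class="other"><span class="price"><span class="value">5.199</span></span></div>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// query() takes an XPath expression, not a regular expression.
$query = $xpath->query('//div[@class="pricesPrp"]//span[@class="value"]');

// Each result is a DOMElement; read its text through nodeValue
// rather than echoing the object itself.
$price = $query->length > 0 ? $query->item(0)->nodeValue : null;
echo $price, "\n";
```

Scoping the query to the pricesPrp container is what keeps the second, unrelated price out of the result.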
CodePudding user response:
$html = file_get_contents('PASTE_URL');
$doc = new DOMDocument();
@$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));
$selector = new DOMXPath($doc);
$result = $selector->query('//span[@]');
foreach($result as $node) {
echo $node->nodeValue;
}
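Note that nodeValue gives the text as displayed (for example 4.699), while the question asks for 4699. A minimal sketch for normalizing it, assuming the page uses '.' as the thousands separator and ',' as the decimal separator; parsePrice is a hypothetical helper name:

```php
<?php
// Hypothetical helper: converts a price written with '.' as the thousands
// separator and ',' as the decimal separator (an assumption about the
// page's format) into a float, or null if no number is found.
function parsePrice(string $text): ?float
{
    // Keep only digits, dots and commas, e.g. "4.699,99 lei" -> "4.699,99".
    $clean = preg_replace('/[^0-9.,]/', '', $text);
    if ($clean === null || $clean === '') {
        return null;
    }
    $clean = str_replace('.', '', $clean);  // drop thousands separators
    $clean = str_replace(',', '.', $clean); // decimal comma -> decimal point
    return is_numeric($clean) ? (float) $clean : null;
}

echo parsePrice('4.699'), "\n";
echo parsePrice('4.699,99 lei'), "\n";
```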