I am new to PHP and am trying to write a script that fetches data from an external site. I am interested in getting the value of Merk, which is Opel. The HTML for it looks like this:
<div class="row">
<div class="col-6 col-sm-5 label" data-tooltip="<strong>Merk</strong><br/>Het merk van het voertuig. Dit wordt voor alle voertuigsoorten geregistreerd.
<span>bron: RDW</span> ">
Merk
<span data-toggle="tooltip" data-html="true" title="<strong>Merk</strong><br/>Het merk van het voertuig. Dit wordt voor alle voertuigsoorten geregistreerd.<br /><span>bron: RDW</span> "></span><span data-toggle="tooltip" data-html="true" title="<strong>Merk</strong><br/>Het merk van het voertuig. Dit wordt voor alle voertuigsoorten geregistreerd.<br /><span>bron: RDW</span> "></span></div>
<div class="col-6 col-sm-7 value">
Opel
</div>
</div>
I am trying to get it with PHP code like the following:
<?php
// a new dom object
$dom = new domDocument;
// load the html into the object
$dom->loadHTML('https://centraalbeheerkentekencheck.azurewebsites.net/?kenteken=L-762-LZ');
// discard white space
$dom->preserveWhiteSpace = false;
$rowData= $dom->getElementsByTagName('row');
But now I am stuck and do not know how to finish the remaining code so that I get the value of Merk, which is Opel. Please let me know if anyone here can help me achieve this.
Thanks!
CodePudding user response:
I think it is better to use SimpleHtmlDom for this (for example voku/simple_html_dom), which you can install with Composer:
composer require voku/simple_html_dom
The SimpleHtmlDom version
You used the URL https://centraalbeheerkentekencheck.azurewebsites.net/?kenteken=L-762-LZ for this, but that page only contains an iframe pointing to https://centraalbeheer.finnik.nl/kenteken/l762lz/gratis, so I use that URL in the script instead:
use voku\helper\HtmlDomParser;
require_once __DIR__ . "/vendor/autoload.php";
function getBrand(string $license) : string
{
$license = strtolower(str_replace("-", "", $license));
$dom = HtmlDomParser::file_get_html("https://centraalbeheer.finnik.nl/kenteken/".$license."/gratis");
$brand = $dom->find(".result .row .value")[0]->innerHtml();
return str_replace([" ", "\n", "\r"], "", $brand);
}
var_dump(getBrand("L-762-LZ"));
Update: You can also do this with regex
function getBrandRegex(string $license) : string
{
$license = strtolower(str_replace("-", "", $license));
$content = file_get_contents("https://centraalbeheer.finnik.nl/kenteken/".$license."/gratis");
preg_match_all('/<div class="col-6 col-sm-7 value">(.*?)<\/div>/s', $content, $matches);
$brand = $matches[1][0];
return trim(str_replace([" ", "\n", "\r"], "", $brand));
}
var_dump(getBrandRegex("L-762-LZ"));
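The regex version above assumes that the download succeeds and that the pattern matches at least once. As a rough sketch, the extraction can be split off from the download so that it returns null when nothing matches. The helper name extractBrandFromHtml and the sample strings are made up for illustration, and the class list in the pattern is taken from the HTML in the question:

```php
<?php
// Hypothetical helper, not part of the original answer: extracts the first
// "value" cell from an HTML string and returns null when nothing matches.
function extractBrandFromHtml(string $content): ?string
{
    // Class list assumed from the HTML shown in the question.
    // The \s* parts swallow the line breaks around the value.
    $pattern = '/<div class="col-6 col-sm-7 value">\s*(.*?)\s*<\/div>/s';

    if (preg_match($pattern, $content, $match) !== 1) {
        return null; // no match (or a regex error): nothing to return
    }

    // Collapse inner whitespace runs to one space, so a multi-word
    // value keeps its spaces instead of losing them.
    return preg_replace('/\s+/', ' ', trim($match[1]));
}

// Made-up sample input, so the example runs without a network request.
$sample = "<div class=\"col-6 col-sm-7 value\">\nOpel\n</div>";

var_dump(extractBrandFromHtml($sample));            // string(4) "Opel"
var_dump(extractBrandFromHtml('<p>no match</p>'));  // NULL
```

With the real page you would pass the result of file_get_contents() to this function, after checking that it did not return false.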
Update: The DomDocument version
function getBrandDomDocument(string $license) : string
{
libxml_use_internal_errors(true); //see: https://www.php.net/manual/en/function.libxml-use-internal-errors.php
$license = strtolower(str_replace("-", "", $license));
$dom = new \DomDocument;
$dom->loadHTMLFile("https://centraalbeheer.finnik.nl/kenteken/".$license."/gratis");
$dom->preserveWhiteSpace = false;
$xpath = new \DOMXPath($dom);
$data = $xpath->query("//div[contains(@class, 'col-6 col-sm-7 value')]");
return trim(str_replace([" ", "\n", "\r"], "", $data[0]->textContent));
}
var_dump(getBrandDomDocument("L-762-LZ"));
Output
Opel
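All three versions above take the first "value" cell on the page, which only works while Merk is the first row. A possible variation on the DomDocument version is to look the value up by its label text. This is only a sketch: the helper name getValueByLabel and the shortened sample HTML are made up for illustration, and it assumes each label div is directly followed by its value div, as in the HTML from the question. It also shows that loadHTML() expects an HTML string rather than a URL, which is why the code in the question did not load the page.

```php
<?php
// Hypothetical helper, not part of the original answer: finds the "value"
// div that follows the "label" div whose first text node equals $label.
// Assumes $label contains no single quote, since it is placed inside an
// XPath string literal.
function getValueByLabel(string $html, string $label): ?string
{
    libxml_use_internal_errors(true); // do not emit warnings for imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);            // takes markup, not a URL
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' label ')]"
        . "[normalize-space(text()[1]) = '%s']"
        . "/following-sibling::div"
        . "[contains(concat(' ', normalize-space(@class), ' '), ' value ')][1]",
        $label
    );

    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // label not found
    }

    // Trim and collapse whitespace, keeping single spaces inside the value.
    return preg_replace('/\s+/', ' ', trim($nodes->item(0)->textContent));
}

// Shortened, made-up version of the row from the question.
$html = <<<HTML
<div class="row">
<div class="col-6 col-sm-5 label">
Merk
<span data-toggle="tooltip"></span>
</div>
<div class="col-6 col-sm-7 value">
Opel
</div>
</div>
HTML;

var_dump(getValueByLabel($html, 'Merk')); // string(4) "Opel"
```

For the live page you could fetch the markup first (for example with file_get_contents()) and pass the resulting string to the function.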