Add space between textContent data scraped from website using PHP DOM

Time:05-04

I am trying to add a comma and a space between the items in some data I am scraping from a website. The data is scraped successfully, but the items run together, and the comma and space I am trying to add only get appended after the last item. Here is the code I currently have:

$html = curl_exec($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$finder = new DomXPath($dom);

$class_ops = 'ipc-inline-list ';
$class_opp = 'ipc-inline ';
$node = $finder->query("//div[@class='$class_ops']//ul[@class='$class_opp']");

foreach ($node as $index => $t) {
    if ($index == 3) {
        $la = $t->textContent.", ";
    }
}

echo $la;

Current Result

DoyleBrainDavid, 

Expected Result

Doyle, Brain, David

CodePudding user response:

I am using this code:

$c1 = curl_init('https://stackoverflow.com/');
curl_setopt($c1, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($c1);

if (curl_error($c1))
    die(curl_error($c1));
// Get the status code
$status = curl_getinfo($c1, CURLINFO_HTTP_CODE);
curl_close($c1);
preg_match_all('/<span(.*?)<\/span>/s', $html, $matches1);


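$match1 = [];
foreach ($matches1[0] as $k => $v) {
    // Detect the source encoding (falling back to UTF-8) and convert to UTF-8.
    // mb_convert_encoding() takes the target encoding first, then the source.
    $enc = mb_detect_encoding($v, mb_detect_order(), true) ?: 'UTF-8';
    $v = mb_convert_encoding($v, 'UTF-8', $enc);
    // Drop the tags and keep only the text.
    $match1[$k] = strip_tags($v);
    //$match1[$k] = preg_replace('/^[^A-Za-z0-9] /', '', $match1[$k]);
}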

var_dump($match1);

In your case, you can replace the pattern like this:

preg_match_all('/<div >(.*?)<\/div>/s', $html, $matches1);

This returns an array with the matches.

I hope this is helpful.

CodePudding user response:

You want each li, not the ul as one block. Try:

$node = $finder->query("//div[@class='$class_ops']//ul[@class='$class_opp']/li");

Demo: https://3v4l.org/Mvfud

If that doesn't work, the actual HTML content should be added to the question.
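
To illustrate selecting each li and joining the results, here is a minimal sketch. The HTML is a hypothetical stand-in, since the real page markup is not shown in the question. The class test uses the common normalize-space() idiom instead of the exact @class comparison, as an assumption to tolerate the trailing space in the class names. The separator is added with implode(), so it only appears between items.

```php
<?php
// Hypothetical markup standing in for the scraped page (the real HTML is not in the question).
$html = <<<HTML
<div class="ipc-inline-list ">
  <ul class="ipc-inline ">
    <li>Doyle</li>
    <li>Brain</li>
    <li>David</li>
  </ul>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of silencing with @
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);

// Match on class tokens rather than the exact attribute string (assumed, more tolerant variant).
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' ipc-inline-list ')]"
       . "//ul[contains(concat(' ', normalize-space(@class), ' '), ' ipc-inline ')]/li";

// Collect the text of each li separately instead of reading the whole ul at once.
$names = [];
foreach ($finder->query($query) as $li) {
    $names[] = trim($li->textContent);
}

// implode() puts the separator between items only, so there is no trailing ", ".
$la = implode(', ', $names);
echo $la, PHP_EOL;
```

With the sample markup above, this prints Doyle, Brain, David. Reading textContent on the ul returns the text of all its descendants concatenated, which is why the original loop produced one run-together string.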
