I am writing a PHP parser using cURL (with simple_html_dom.php). I need to parse the news posts here: https://www.sport-express.ru/football/reviews/page2/ (this is the second page). I need to get the last page number programmatically (it should be 50). There is no pagination, only a lazy-loading button. How can I get the last page number using cURL? Thanks!
PS: It would be great if you could also show how to get the last page number when there is pagination.
CodePudding user response:
A possible solution would be to request each page in turn until a 404 error is returned:
$pageNumber = 1;
// {pageNumber} is a placeholder that is filled in on every iteration
$url = 'https://www.sport-express.ru/football/reviews/page{pageNumber}/?ajax=1';
$finished = false;
while ($finished === false) {
    $ch = curl_init();
    // str_replace() takes its arguments as (search, replace, subject)
    curl_setopt($ch, CURLOPT_URL, str_replace('{pageNumber}', (string) $pageNumber, $url));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($ch);
    $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpCode === 404) {
        $finished = true;
    } else {
        // Do something...
        $pageNumber++;
    }
}
// $pageNumber is now the first missing page, so the last page is $pageNumber - 1
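The loop above never ends if the server does not answer 404 for a missing page (for example, if it redirects or the request fails). A minimal sketch of the same idea with a safety cap follows. The names findLastPage, curlStatus and $maxPages are hypothetical, and stopping on any status other than 200 is an assumption that is stricter than the 404 check above.

```php
<?php
/**
 * Hypothetical helper: asks $getStatus for the HTTP status of pages
 * 1, 2, 3, ... and returns the last page that answered 200.
 * $maxPages is an assumed safety cap so the loop always terminates.
 */
function findLastPage(callable $getStatus, int $maxPages = 1000): int
{
    $lastPage = 0;
    for ($page = 1; $page <= $maxPages; $page++) {
        if ($getStatus($page) !== 200) {
            break; // first page that is missing or failed
        }
        $lastPage = $page;
    }
    return $lastPage;
}

// cURL-based status check for the URL pattern used in the answer above.
function curlStatus(int $page): int
{
    $url = 'https://www.sport-express.ru/football/reviews/page' . $page . '/?ajax=1';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    // Yields 0 when the transfer itself failed, which also stops the loop.
    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Usage (performs one real HTTP request per page):
// echo findLastPage('curlStatus');
```

Passing the status check in as a callable keeps the loop itself easy to try out with a fake function before pointing it at the real site.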
CodePudding user response:
Try this:
$data = file_get_contents('https://www.sport-express.ru/football/reviews/page1/');
// Position of the first character after data-prop-max-page="
$start = strpos($data, 'data-prop-max-page="') + 20;
echo "start=$start\n";
// Position of the closing quote, just before the next >
$end = strpos($data, '>', $start) - 1;
$lastpage = substr($data, $start, $end - $start);
echo "last page = $lastpage \n$data";
This is what we are looking for:
<div data-component="nav" data-prop-page="2" data-prop-max-page="50">
First, find the position of 'data-prop-max-page="'.
Then add 20, because the search string is 20 characters long.
Then get the position of the first > after the $start position (the third strpos parameter) and subtract 1 to step back over the closing quote.
Then take the substring between those two positions, which today is 50.
Here are the values found:
start=339441
end=339442
last page = 50
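Since the question asks for cURL, here is a minimal sketch of the same lookup using cURL for the download and a regular expression in place of the strpos arithmetic. The function names fetchPage and extractMaxPage are hypothetical, and the sketch assumes the attribute value is always written as digits inside double quotes, as in the markup shown above.

```php
<?php
// Hypothetical helper: downloads a page with cURL and returns the body,
// or null if the transfer fails.
function fetchPage(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: returns the first data-prop-max-page value found
// in $html as an integer, or null if the attribute is absent.
function extractMaxPage(string $html): ?int
{
    if (preg_match('/data-prop-max-page="(\d+)"/', $html, $matches) === 1) {
        return (int) $matches[1];
    }
    return null;
}

// Works on the markup quoted above without any network access:
$sample = '<div data-component="nav" data-prop-page="2" data-prop-max-page="50">';
var_dump(extractMaxPage($sample)); // int(50)

// Against the live site (performs a real HTTP request):
// $lastPage = extractMaxPage(fetchPage('https://www.sport-express.ru/football/reviews/page1/') ?? '');
```

Unlike the position-based version, this returns null instead of a wrong substring when the attribute is missing from the page.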
Regarding your PS ("It will be great if You show also how can I get last page number when there will pagination"):
Reply to my answer when that day comes. My psychic abilities are not that sharp.