PHP Dynamic Sitemap After 50K URLs

Time:03-13

Hello, I need some help with a dynamic sitemap in PHP. This is my code:

<?php
header('Content-Type: application/xml; charset=UTF-8', true);
$dataAll1 = scandir('cache-data');
unset($dataAll1[0]);
unset($dataAll1[1]);
unset($dataAll1[2]);
$sitemap = '<?xml version="1.0" encoding="UTF-8"?>
           <urlset xmlns="'.PROTOCOL.'://www.sitemaps.org/schemas/sitemap/0.9">';

$sitemap .= '<url>
                        <loc>' . SITE_HOST . '</loc>
                        <priority>1.0</priority>
                     </url>';

foreach($dataAll1 as $val){
    $data = json_decode(file_get_contents('cache-data/'.$val),1);
    if($val=='index.php'){
        continue;
    }
    $sitemap .= '<url>
                   <loc>'.SITE_HOST . '/job-detail/' . $data['jk'].'jktk'.$data['tk'] . '-' . $service->slugify($data['title']).'</loc>
                        <priority>0.9</priority>
                        <changefreq>daily</changefreq>
                     </url>';
}

$sitemap.='</urlset>';
echo $sitemap;

This code successfully builds the sitemap, but it keeps everything in a single file even after 50K URLs. How can I make it start a new sitemap file after every 50K URLs?

I really appreciate any help

CodePudding user response:

I would recommend switching from a web page that serves the sitemap on request to a script that generates static XML files, which you can optionally run from crontab.

Step 1: Define directory where sitemaps are saved

This will be used to write one static XML file for each sitemap, plus the sitemap index. The directory needs to be writable by the user running the script and accessible via the web.

define('PATH_TO_SITEMAP', __DIR__.'/www/');
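
Because fopen() only returns false with a warning when the directory is missing or read-only, a small guard before writing can make failures easier to spot. This is a minimal sketch; ensureWritableDir() is a hypothetical helper name, not part of the original answer:

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// make sure the target directory exists and is writable before writing.
function ensureWritableDir(string $dir): string
{
    // Create the directory (and parents) if it does not exist yet.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Directory is not writable: $dir");
    }
    // Return the path with exactly one trailing slash so that file
    // names can be appended directly.
    return rtrim($dir, '/') . '/';
}
```

It could be called once at the top of the script, for example as ensureWritableDir(PATH_TO_SITEMAP).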

Step 2: Define the public URL where the sitemaps will be reachable

This is the Web URL of the folder you defined above.

define('HTTP_TO_SITEMAP', 'https://yourwebsite.com/');

Step 3: Write sitemaps to file

Instead of echoing the sitemap, write it to static XML files in the folder defined above.

Also keep a counter of the number of URLs ($count_urls in my example below). When it reaches 50k, close the current sitemap, open a new one, and reset the counter.

Also, keep a counter of how many sitemaps you're creating ($count_sitemaps in my example below).

Here's a working example that keeps your structure but writes multiple files. It's not ideal: you should use a class, method, or function to write the sitemaps and avoid repeated code, but I thought this would be easier to understand because it keeps the same approach as your code.

// Start with the first sitemap file.
$count_sitemaps = 1;
$h = fopen(PATH_TO_SITEMAP.'sitemap_'.$count_sitemaps.'.xml', 'w');
fwrite($h, '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>'.SITE_HOST.'</loc>
        <priority>1.0</priority>
    </url>
');

$count_urls = 0;
foreach ($dataAll1 as $val) {
    // Skip the placeholder file before counting, so it does not use up a slot.
    if ($val == 'index.php') {
        continue;
    }
    $count_urls++;
    $data = json_decode(file_get_contents('cache-data/'.$val), true);
    // When you reach 50k, close the sitemap, increase the counter and open a new sitemap
    if ($count_urls >= 50000) {
        fwrite($h, '</urlset>');
        fclose($h);
        $count_sitemaps++;
        $h = fopen(PATH_TO_SITEMAP.'sitemap_'.$count_sitemaps.'.xml', 'w');
        fwrite($h, '<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">');
        $count_urls = 0;
    }
    fwrite($h, '
        <url>
            <loc>'.SITE_HOST.'/job-detail/'.$data['jk'].'jktk'.$data['tk'].'-'.$service->slugify($data['title']).'</loc>
            <priority>0.9</priority>
            <changefreq>daily</changefreq>
        </url>
    ');
}

// Important! Close the final sitemap and save it.
fwrite($h, '</urlset>');
fclose($h);
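
The function-based approach mentioned above could look roughly like the sketch below. It is only an illustration: writeSitemaps() and its parameters are hypothetical names, the URLs are assumed to be collected into a plain array first, and the optional priority and changefreq tags are left out for brevity.

```php
<?php
// Sketch only: writeSitemaps() and its parameters are hypothetical names.
// $urls is a plain list of absolute URLs; $dir must end with a slash.
function writeSitemaps(array $urls, string $dir, int $maxPerFile = 50000): int
{
    $count = 0;
    // array_chunk() splits the list into groups of at most $maxPerFile URLs.
    foreach (array_chunk($urls, $maxPerFile) as $chunk) {
        $count++;
        $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
             . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
        foreach ($chunk as $url) {
            // Escape &, <, > and quotes so the XML stays well-formed.
            $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
            $xml .= '  <url><loc>' . $loc . "</loc></url>\n";
        }
        $xml .= "</urlset>\n";
        file_put_contents($dir . 'sitemap_' . $count . '.xml', $xml);
    }
    return $count; // number of sitemap files written
}
```

The return value plays the same role as $count_sitemaps. Note that this version builds each file's content in memory before writing it, whereas the fwrite() loop above streams one entry at a time.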

Step 4: Write SitemapIndex

Finally, write a sitemap index file that points to all the sitemaps you generated, using the same $count_sitemaps counter.

// Build the SitemapIndex
$h = fopen(PATH_TO_SITEMAP.'sitemap.xml', 'w');
fwrite($h, '<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">');
for ($i = 1; $i <= $count_sitemaps; $i++) {
    fwrite($h, '
    <sitemap>
        <loc>'.HTTP_TO_SITEMAP.'sitemap_'.$i.'.xml</loc>
        <lastmod>'.date('c').'</lastmod>
    </sitemap>
    ');
}
fwrite($h, '
</sitemapindex>');
fclose($h);
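
The same index can also be built as a string first, which makes it easy to inspect before saving. A minimal sketch, assuming a hypothetical helper named buildSitemapIndex() and the sitemap_N.xml naming used above:

```php
<?php
// Sketch only: buildSitemapIndex() is a hypothetical helper name.
// $baseUrl is the public folder URL and must end with a slash.
function buildSitemapIndex(string $baseUrl, int $countSitemaps): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    for ($i = 1; $i <= $countSitemaps; $i++) {
        // Escape the URL so characters such as & cannot break the XML.
        $loc = htmlspecialchars($baseUrl . 'sitemap_' . $i . '.xml', ENT_XML1 | ENT_QUOTES, 'UTF-8');
        // date('c') produces an ISO 8601 timestamp for <lastmod>.
        $xml .= "  <sitemap><loc>$loc</loc><lastmod>" . date('c') . "</lastmod></sitemap>\n";
    }
    return $xml . "</sitemapindex>\n";
}
```

A call such as file_put_contents(PATH_TO_SITEMAP.'sitemap.xml', buildSitemapIndex(HTTP_TO_SITEMAP, $count_sitemaps)) would then save it.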

Step 5 (optional): ping Google

file_get_contents('https://www.google.com/webmasters/tools/ping?sitemap='.urlencode(HTTP_TO_SITEMAP.'sitemap.xml'));

Note that Google has since deprecated this ping endpoint, so the request may no longer have any effect. Submitting the sitemap in Search Console or referencing it in robots.txt are the remaining options.

Conclusions

As mentioned, this approach is a bit messy and doesn't follow best practices for code reuse, but it should make clear how to loop, store multiple sitemaps, and finally build the sitemap index file.

For a better approach, I suggest looking at this open source Sitemap class.

CodePudding user response:

You could start reading the array returned by scandir at entry 50001 for the second sitemap, but it would be better to split the work across several scripts run from crontab so that a single long run does not exhaust the server; otherwise, increase the maximum execution time for PHP scripts.
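
Starting at a given offset can be sketched with array_slice(). The helper names listCacheFiles() and batchOf() below are hypothetical, and the batch size of 50,000 is taken from the question:

```php
<?php
// Sketch only: listCacheFiles() and batchOf() are hypothetical helpers.
function listCacheFiles(string $dir): array
{
    // scandir() returns names sorted alphabetically. Drop '.', '..' and
    // index.php by name, then reindex the array from 0.
    return array_values(array_diff(scandir($dir), ['.', '..', 'index.php']));
}

// Return the files for sitemap number $n (1-based), $size files per sitemap.
function batchOf(array $files, int $n, int $size = 50000): array
{
    return array_slice($files, ($n - 1) * $size, $size);
}
```

For example, batchOf(listCacheFiles('cache-data'), 2) would return entries 50001 to 100000, so each cron run could build one sitemap file.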

Tags: php