Facebook open graph scraper not seeing HTML

Time:11-10

I have had trouble getting the Facebook Open Graph scraper to see my pages or recognize the meta tags on them. Some of our pages work (e.g. the book product pages), but most do not show any images or the correct title in the debugger. The home page is the one I am most interested in getting Facebook to recognize, but many others fail as well. All of our pages show up with a 206 response code in the Facebook scraper here: Homepage head

There is a redirect from [image]

Since I usually see a 200 response for other pages I test in the scraper debugger, I also created a small version of the homepage with just the Open Graph meta tags as a test. That page can be seen here: [image]

Even this small page gives me a 206 response from the debugger. I think the 206 is probably not related to the problem. The Facebook support pages say it is OK, but it seems odd that such a small page would get a partial response, so maybe it is part of the problem.

One other interesting thing: I tested a page on our site that has no og: tags, and it worked the first time I tried it in the scraper, showing an image and many constructed og tags. It also showed me a lot of information when I clicked the 'What the scraper sees' link. But subsequent tests of the same page (which has not changed at all) in the debugger have shown empty images and a blank page under what the scraper sees. This was that page: https://press.uchicago.edu/books/freeEbook.html

Answer:

Apparently it was the Cache-Control header. We had it set to "Cache-Control: no-cache, no-store, must-revalidate, max-age=0". When I changed it to just "Cache-Control: no-cache", Facebook could suddenly see the page.

Facebook's scraper presumably needs to store a copy of the page, and the no-store directive forbids that.
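To check whether a page is affected, it can help to look at the directives in its Cache-Control header. The following is only a minimal sketch: the helper names parse_cache_control and forbids_storage are made up for illustration, and the two header strings are the old and new values quoted above.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control header value into a {directive: value-or-None} dict."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directives without "=" (e.g. no-store) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def forbids_storage(header_value):
    """Return True if the header tells caches not to keep any copy of the response."""
    return "no-store" in parse_cache_control(header_value)


# Header values taken from the answer above.
old_header = "no-cache, no-store, must-revalidate, max-age=0"
new_header = "no-cache"

print(forbids_storage(old_header))  # True
print(forbids_storage(new_header))  # False
```

For a live page, the header value could be read with something like urllib.request.urlopen(url).headers.get("Cache-Control", "") and passed to forbids_storage. Whether Facebook's scraper reacts to no-store in exactly this way is the answer's inference, not something this sketch verifies.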

Here is some more about the settings:

"no-cache" and "no-store"

"no-cache" indicates that the returned response can't be used to satisfy a subsequent request to the same URL without first checking with the server if the response has changed. As a result, if a proper validation token (ETag) is present, no-cache incurs a roundtrip to validate the cached response, but can eliminate the download if the resource has not changed.

By contrast, "no-store" is much simpler. It simply disallows the browser and all intermediate caches from storing any version of the returned response—for example, one containing private personal or banking data. Every time the user requests this asset, a request is sent to the server and a full response is downloaded.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#directives
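The difference between the two directives can be illustrated with a toy model. This is a simplified sketch, not how any real cache or Facebook's scraper is implemented: the Origin and Cache classes are hypothetical, the cache always revalidates before reuse (the no-cache behavior), and only no-store is treated specially.

```python
import hashlib


class Origin:
    """Hypothetical origin server holding a single resource."""

    def __init__(self, body, cache_control):
        self.body = body
        self.cache_control = cache_control
        self.full_downloads = 0
        self.revalidations = 0

    def etag(self):
        # Validation token derived from the current body.
        return hashlib.sha256(self.body.encode()).hexdigest()[:8]

    def get(self, if_none_match=None):
        if if_none_match is not None and if_none_match == self.etag():
            self.revalidations += 1
            return 304, None, self.etag()  # not modified, no body sent
        self.full_downloads += 1
        return 200, self.body, self.etag()


class Cache:
    """Toy cache that always revalidates and honors no-store."""

    def __init__(self, origin):
        self.origin = origin
        self.stored = None  # (body, etag) or None

    def fetch(self):
        directives = {d.strip().lower() for d in self.origin.cache_control.split(",")}
        etag = self.stored[1] if self.stored else None
        status, body, new_etag = self.origin.get(if_none_match=etag)
        if status == 304:
            return self.stored[0]  # reuse the stored copy after validation
        if "no-store" not in directives:
            self.stored = (body, new_etag)  # allowed to keep a copy
        return body


page = "<html><head><!-- og: meta tags --></head></html>"

no_cache = Origin(page, "no-cache")
cache_a = Cache(no_cache)
cache_a.fetch()
cache_a.fetch()
print(no_cache.full_downloads, no_cache.revalidations)  # 1 1

no_store = Origin(page, "no-cache, no-store, must-revalidate, max-age=0")
cache_b = Cache(no_store)
cache_b.fetch()
cache_b.fetch()
print(no_store.full_downloads, no_store.revalidations)  # 2 0
```

With no-cache alone, the second request is a cheap revalidation against the stored copy. With no-store, nothing is ever kept, so every request is a full download.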
