I have scoured the interwebs for months trying to find a solution, so any guidance will be a huge help to me.
So my task is that I have a RoR app that is using Fargate. I have a sitemap index and three sitemaps (links split up in 50k increments). These sitemaps need to be accessible via my URL (mysite.com/sitemap...).
So from my understanding, containers are ephemeral and adding the sitemap to my public folder will have undesirable results with indexing on Google.
I have found countless tutorials on how to upload the sitemap to S3 from Heroku - but this option appears to use the public URL of the S3 bucket and not the URL from my domain.
My guess is I need to use something like Elastic File System (EFS) or maybe even S3 - but I am lost. I can even put it this way, how do companies like Airbnb and GitHub store their sitemaps?
CodePudding user response:
I don't know about Airbnb or GitHub's sitemaps, but if you can get your app running on Fargate then you can figure out anything.
"So from my understanding, containers are ephemeral and adding the sitemap to my public folder will have undesirable results with indexing on Google."
It's true that containers are ephemeral, but that has nothing to do with undesirable results with Google.
You can host the sitemaps on S3 or Elastic File System (EFS). You can configure S3 to use your domain as well (see below), but I'm not sure if that is worth the effort.
The easiest thing to do is to host the sitemaps in your public folder. The process would be to generate the files on your dev machine and add them to the repo. When they are deployed, they will be in the public folder of each container and available to the Rails app.
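As a rough sketch of the "generate the files and add them to the repo" step, the script below splits a list of URLs into files of at most 50,000 entries and writes a sitemap index pointing at them. The function name `write_sitemaps`, the output file names, and the `base_url` argument are assumptions made for illustration; in a real Rails app a gem such as sitemap_generator can handle this instead.

```ruby
require "fileutils"

# The sitemaps.org protocol allows at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = %(<?xml version="1.0" encoding="UTF-8"?>)
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;" }.freeze

# Escape characters that are not allowed verbatim inside XML text.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: writes sitemap1.xml, sitemap2.xml, ... plus a
# sitemap.xml index into `dir`, and returns the names of the chunk files.
def write_sitemaps(urls, dir:, base_url:, per_file: MAX_URLS_PER_SITEMAP)
  FileUtils.mkdir_p(dir)

  names = urls.each_slice(per_file).with_index(1).map do |chunk, i|
    name = "sitemap#{i}.xml"
    lines = [XML_HEADER, %(<urlset xmlns="#{SITEMAP_NS}">)]
    lines += chunk.map { |url| "  <url><loc>#{xml_escape(url)}</loc></url>" }
    lines << "</urlset>"
    File.write(File.join(dir, name), lines.join("\n") + "\n")
    name
  end

  index = [XML_HEADER, %(<sitemapindex xmlns="#{SITEMAP_NS}">)]
  index += names.map do |name|
    loc = xml_escape("#{base_url}/#{name}")
    "  <sitemap><loc>#{loc}</loc></sitemap>"
  end
  index << "</sitemapindex>"
  File.write(File.join(dir, "sitemap.xml"), index.join("\n") + "\n")

  names
end

# Example call (run from the Rails root; the URL list is yours to supply):
#   write_sitemaps(all_urls, dir: "public", base_url: "https://mysite.com")
```

Because the files end up in public/, they are baked into the image at build time, so every container serves the same copy.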
If you decide that you don't want the Rails app to serve the sitemaps (which may make sense for certain use cases), then the next easiest thing would probably be to host it on S3.
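If you go the S3 route, the upload itself is small. The following is a minimal sketch, assuming the sitemap files already exist in a local directory. The helper name `upload_sitemaps` and the bucket name in the usage comment are made up for illustration. The client is passed in, so the helper works with anything that responds to `put_object`, such as `Aws::S3::Client` from the aws-sdk-s3 gem.

```ruby
# Hypothetical helper: uploads every sitemap*.xml file in `dir` to `bucket`
# and returns the list of object keys that were written.
#
# `client` must respond to put_object(bucket:, key:, body:, content_type:),
# which is the signature of Aws::S3::Client#put_object in aws-sdk-s3.
def upload_sitemaps(client, bucket:, dir:)
  Dir.glob(File.join(dir, "sitemap*.xml")).sort.map do |path|
    key = File.basename(path)
    client.put_object(
      bucket: bucket,
      key: key,
      body: File.read(path),
      content_type: "application/xml"
    )
    key
  end
end

# Usage (assumes the aws-sdk-s3 gem is installed and that credentials come
# from the environment or the Fargate task role; the bucket name is made up):
#   require "aws-sdk-s3"
#   upload_sitemaps(Aws::S3::Client.new, bucket: "my-sitemap-bucket", dir: "public")
```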
You can configure S3 to use a subdomain. I'm not sure if this would have an effect on how Google sees your site, or if the sitemap index is supposed to be hosted on the same domain as the pages it lists.
If you want to host the sitemaps on S3 with your own domain, then you might be able to use CloudFront to forward all requests to your Rails app, with the exception of the sitemaps. The sitemaps could be served from S3.
Reference: Using S3 with Subdomain
EDIT: If you decide to use CloudFront, then it's not necessary to use S3. CloudFront can cache the sitemap for days or weeks, and your app would only serve it once in that time.
CodePudding user response:
"My guess is I need to use something like Elastic File System (EFS) or maybe even S3 - but I am lost. I can even put it this way, how do companies like Airbnb and GitHub store their sitemaps?"
Big companies like that would certainly have a CDN in front of their website. You can also have a CDN in front of your website. The AWS solution is CloudFront, but I would also recommend looking into Cloudflare.
In either case, once you have a CDN in front of your website, you can configure it to serve different content from different origins, based on the URL path. So for instance you could set up the default origin as your Ruby app, and set up the /sitemap origin as an S3 bucket that has your sitemap files in it.
Alternatively you could store the sitemap in EFS, mount the EFS volume in your Fargate tasks, and configure your Ruby app (or Nginx running in front of your Ruby app?) to serve the file from the sitemap volume when a request comes in for /sitemap.
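If the Ruby app itself serves the files from the EFS volume, one way to do it is a small Rack middleware. This is only a sketch under a few assumptions: the class name `SitemapFromVolume`, the mount path in the usage comment, and the `sitemap*.xml` naming scheme are all hypothetical, and Nginx could do the same job with a location block instead.

```ruby
# Hypothetical Rack middleware: answers requests such as /sitemap.xml or
# /sitemap2.xml from files in `root` (e.g. a mounted EFS volume) and passes
# everything else through to the wrapped app.
class SitemapFromVolume
  # Only plain file names are accepted, so "../" cannot escape the root.
  SITEMAP_PATH = %r{\A/(sitemap[\w-]*\.xml)\z}

  def initialize(app, root)
    @app = app
    @root = root
  end

  def call(env)
    match = SITEMAP_PATH.match(env["PATH_INFO"].to_s)
    return @app.call(env) unless match

    file = File.join(@root, match[1])
    unless File.file?(file)
      return [404, { "content-type" => "text/plain" }, ["Not found\n"]]
    end

    [200, { "content-type" => "application/xml" }, [File.read(file)]]
  end
end

# Usage in config/application.rb (the mount path is an assumption):
#   config.middleware.use SitemapFromVolume, "/mnt/sitemaps"
```

Reading the whole file into memory is fine for a sketch; a sitemap with 50k links can be several megabytes, so streaming it or letting Nginx serve it may be preferable under load.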