I built an R Shiny web app in RStudio that pulls data from an Amazon S3 bucket via EC2, using the access key, secret access key, and region set in Sys.setenv(), but I would like to use AWS CloudFront instead. I have already set up a CloudFront distribution for the S3 bucket in the AWS console, but I do not understand how to ensure the data is actually pulled via CloudFront rather than EC2. I may also be misunderstanding the relationship between Amazon S3 and EC2/CloudFront, so any information is much appreciated.
CodePudding user response:
CloudFront sits "in front" of an S3 bucket (or an HTTP "origin") and receives incoming requests. When it does not already have a cached copy, it forwards the request to the origin and then caches the response. CloudFront is a content delivery network (CDN), which means it has edge nodes all over the world and routes each request to a nearby edge node. That maximizes the part of the network path that runs on the Amazon network (which is fast, high quality, and well connected to the rest of AWS) and minimizes the consumer-grade part of the path (which is slow, high-latency, and unreliable).
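To make the caching behaviour concrete, here is a toy sketch in Python of what one edge node does. It is only a model, not how CloudFront is implemented: the class name `EdgeCache`, the fixed TTL, and the `fake_origin` function are all made up for illustration.

```python
import time


class EdgeCache:
    """Toy model of a single edge node: serve from cache, else ask the origin."""

    def __init__(self, origin_fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._origin_fetch = origin_fetch  # callable: path -> bytes (stands in for S3)
        self._ttl = ttl_seconds            # assumed fixed TTL; real TTLs are configurable
        self._clock = clock
        self._store = {}                   # path -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            # Fresh cached copy: the origin is not contacted at all.
            self.hits += 1
            return entry[1]
        # Missing or expired: forward to the origin, then cache the response.
        self.misses += 1
        body = self._origin_fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body


origin_calls = []


def fake_origin(path):
    """Hypothetical origin that records every request it receives."""
    origin_calls.append(path)
    return f"contents of {path}".encode()


edge = EdgeCache(fake_origin, ttl_seconds=60.0)
first = edge.get("/data.csv")   # miss: forwarded to the origin
second = edge.get("/data.csv")  # hit: answered from the edge cache
print(edge.hits, edge.misses, len(origin_calls))  # prints: 1 1 1
```

The second request never reaches the origin, which is where the speed-up for repeated requests comes from.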
This caching and edge routing can speed things up significantly for remote users or for highly repetitive requests (as on a busy website). But what effect will it have on your performance in R? If R is running on an EC2 instance, you are already quite near the S3 servers, so being nearer to a CloudFront edge node doesn't help you.
CloudFront serves HTTP, so requests to CloudFront would be made over HTTPS, but it does not serve the S3 API, so your S3 code would have to change to a plain HTTP-based process. You would also have to solve authentication and authorization differently from the SigV4 request signing you are using with S3 (for example, with CloudFront signed URLs or signed cookies).
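As a rough sketch of what "a more HTTP-based process" looks like, the client builds an ordinary URL from the distribution's domain name and the object key, issues a GET, and parses the body itself. The example is in Python only to keep it self-contained; in R the same request could be made with an HTTP client such as httr. The domain `d111111abcdef8.cloudfront.net`, the object key, and the helper names are placeholders, and the fetch assumes the distribution allows unauthenticated reads of that path.

```python
import csv
import io
import urllib.request
from urllib.parse import quote

# Hypothetical distribution domain; use the one shown for your distribution.
CLOUDFRONT_DOMAIN = "d111111abcdef8.cloudfront.net"


def cloudfront_url(domain, key):
    """Build the HTTPS URL for an S3 object key served through CloudFront."""
    # Percent-encode the key but keep "/" so the path structure survives.
    return f"https://{domain}/{quote(key.lstrip('/'), safe='/')}"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv_rows(url, timeout=10.0):
    """GET a CSV over plain HTTPS (no SigV4 signing) and parse it into rows."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8"))


# Hypothetical object key, used only to show the URL shape.
example_url = cloudfront_url(CLOUDFRONT_DOMAIN, "data/sales 2023.csv")
print(example_url)
```

Calling `fetch_csv_rows(example_url)` would then perform the actual download. Note that no access key or secret key is involved here; if the content must stay private, access control has to be handled on the CloudFront side.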
On "the relationship between Amazon S3 and EC2/CloudFront":
You can think of CloudFront almost as a load balancer in front of an origin. That origin could be S3, or it could be an HTTP server, perhaps on an EC2 instance. But its consumers are always HTTP clients. The consumers could be on EC2 or they could be running somewhere else, but they still speak HTTPS. So the phrase "EC2/CloudFront" is a little confusing to me, because EC2 is on the client side and CloudFront is "closer" to the S3 source side of the equation.