How we use Amazon CloudFront for dynamically generated content

Renat Zubairov know-how

Amazon CloudFront, Part 1

Amazon CloudFront in combination with a custom origin gives us a great caching and content delivery service, which we can use even for dynamically generated resources. In our (simplistic) test, responses served from the Amazon CloudFront caches were about ten times faster, with much lower latency. CloudFront lets us control caching behaviour declaratively, so we can also cache aggressively. That significantly reduces the load on our servers, so we need fewer servers/VMs, which reduces costs.

We used to render our images on the browser side with JavaScript and Canvas, but we have now decided to use the same JavaScript code with node-canvas to render PNGs on the server side. How we did that is a topic for a separate blog post. From the beginning, however, it was clear that server-side PNG generation is a costly, resource-intensive task, so we have to take great care when doing it at web scale.

One obvious way to reduce the load on our servers was caching. We could have deployed our own caching solution, but we decided to go with Amazon CloudFront, specifically with its custom origin option.

What is an Amazon CloudFront custom origin?

From the moment it was introduced, CloudFront performed well at content distribution. As stated on the Amazon CloudFront page:

Amazon CloudFront delivers your static and streaming content using a global network of edge locations. Requests for your objects are automatically routed to the nearest edge location, so content is delivered with the best possible performance…

In the beginning, CloudFront could only serve content stored in Amazon S3. You put your files in Amazon S3, and CloudFront took them from there, cached them at its edge locations and delivered them from there to your customers.

That worked well, but having S3 as the only source was definitely a limitation. In November 2010 CloudFront introduced a new ‘custom origin’ feature. With a custom origin, CloudFront can fetch any content, even dynamically generated content, directly from your own web server. That was exactly what we were looking for to cache our dynamically generated images.
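To make this pull model concrete, here is a toy sketch in JavaScript (the language of our node.js stack) of an edge cache sitting in front of a custom origin. It is a simplified model for illustration, not how CloudFront is actually implemented: the helper name `createEdgeCache`, the single fixed TTL and the injected clock are all assumptions. On a miss, the edge asks the origin (your web server) for the resource and stores it; until the TTL runs out, further requests are answered at the edge without touching the origin.

```javascript
// Toy model of an edge location in front of a custom origin.
// Assumption: a pull-through cache keyed by URL with one fixed TTL;
// real CloudFront behaviour (cache keys, revalidation, eviction) is richer.
function createEdgeCache(fetchFromOrigin, ttlSeconds, now = () => Date.now()) {
  const entries = new Map(); // url -> { body, expiresAt }

  return function get(url) {
    const cached = entries.get(url);
    if (cached && cached.expiresAt > now()) {
      // Fresh copy at the edge: the origin is not contacted at all.
      return { body: cached.body, xCache: "Hit" };
    }
    // Missing or expired: pull the resource from the origin and store it.
    const body = fetchFromOrigin(url);
    entries.set(url, { body, expiresAt: now() + ttlSeconds * 1000 });
    return { body, xCache: "Miss" };
  };
}
```

For example, `createEdgeCache(renderChartPng, 86400)` would model a one-day cache in front of `renderChartPng`, a hypothetical function that renders the image for a URL. Only the first request per URL per day reaches the expensive renderer.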

How to use Amazon CloudFront with a custom origin

Configuring Amazon CloudFront via the AWS Management Console is very simple: create a new CloudFront distribution and enter the domain name of your web server as the custom origin.

After some initialisation time your Amazon CloudFront distribution can be used to serve your files.

You also need to check that the resources served by your web server carry the HTTP headers that control how long CloudFront will cache them. Amazon CloudFront documents the requirements a resource must meet to be cached. For our use case, the most relevant HTTP headers are Cache-Control, Date, Last-Modified and ETag.

Amazon CloudFront uses the Cache-Control header (its max-age directive) to decide how long it should cache a given resource. The HTTP spec also defines an Expires header, but CloudFront prefers Cache-Control.

Apart from that, we need to serve the Date and Last-Modified headers, which are required to compute the expiry time from the Cache-Control header. Then we also need the ETag header (more about it later).
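A minimal sketch of how a node.js origin could build these headers for a generated PNG held in a `Buffer`. The helper name `cachingHeaders`, the one-day default and the use of a SHA-1 digest of the body as the ETag are our assumptions for illustration; any stable content hash would work as an ETag.

```javascript
const { createHash } = require("node:crypto");

// Build the caching headers for a dynamically generated PNG.
// Assumptions: one-day max-age by default, ETag = quoted SHA-1 of the body.
function cachingHeaders(body, lastModified, maxAgeSeconds = 86400, now = new Date()) {
  const etag = '"' + createHash("sha1").update(body).digest("hex") + '"';
  return {
    "Content-Type": "image/png",
    // "public" allows shared caches such as CloudFront to store the response.
    "Cache-Control": `public, max-age=${maxAgeSeconds}`,
    "Date": now.toUTCString(),
    "Last-Modified": lastModified.toUTCString(),
    "ETag": etag,
  };
}
```

In a plain node.js `http` handler this could be used as `res.writeHead(200, cachingHeaders(png, renderedAt)); res.end(png);`, where `png` and `renderedAt` stand in for your own rendered image and the time it was rendered. Node's `http` module also sends a Date header by default.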

This is what a response from our origin server looks like (you can use curl -I http://webserver.com to see the HTTP response headers):

According to the headers above, Amazon CloudFront will cache this resource for 86400 seconds, which is one day. When we request the same URL via CloudFront, we see the following headers. The first time:
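Under the usual HTTP freshness rules, that one day is simply the time in the Date header plus the max-age value in seconds. A small sketch of that computation, assuming lowercase header names as node.js exposes them; the helper name `expiresAt` is hypothetical, and it deliberately reads only max-age, ignoring other Cache-Control directives (such as s-maxage or no-cache) that CloudFront also takes into account.

```javascript
// Compute when a cached copy stops being fresh: Date + max-age seconds.
// Sketch only: other Cache-Control directives are ignored.
function expiresAt(headers) {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(headers["cache-control"] || "");
  if (!match) return null; // no max-age: the cache falls back to its default TTL
  const date = new Date(headers["date"]);
  return new Date(date.getTime() + Number(match[1]) * 1000);
}
```

With `max-age=86400` and a Date of 14 May 2012, 10:00 GMT, the copy stays fresh until 15 May 2012, 10:00 GMT.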

Note the ‘X-Cache: Miss from Amazon CloudFront’ header. And the second time:

Note the ‘X-Cache: Hit from Amazon CloudFront’ header: it means that our resource is now served directly from the CloudFront cache.
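If you want to check hits and misses from a script rather than by eye, a small helper can classify the X-Cache value. This is a sketch with a hypothetical name, `cacheStatus`; it looks only at the first word of the header value, case-insensitively, on the assumption that the rest of the value may be spelled differently (for example with a lowercase "cloudfront").

```javascript
// Classify a response by the first word of its X-Cache header value.
function cacheStatus(xCacheValue) {
  const word = String(xCacheValue || "").trim().split(/\s+/)[0].toLowerCase();
  if (word === "hit") return "hit";   // served from the CloudFront cache
  if (word === "miss") return "miss"; // CloudFront had to ask the origin
  return "unknown";                   // header missing or unexpected value
}
```

With node.js 18 or later you could feed it `(await fetch(url, { method: "HEAD" })).headers.get("x-cache")`, where `url` is a hypothetical resource URL on your distribution; this is the scripted equivalent of `curl -I`.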

Speed comparison

Let's check how much faster it is to serve a single image. Please note that this is in no way a representative benchmark of Amazon CloudFront's capabilities; it is just a single test of two consecutive HTTP requests. One request performs an HTTP redirect and the other serves the image. As the origin server we use a single Heroku web worker running node.js 0.6.8. Here are the timings without caching:

These two HTTP requests took over 1005 milliseconds, including 799 milliseconds of latency (the HTTP connection was kept open). Served from CloudFront (after warming up the cache and DNS), the same requests took a mere 93 milliseconds (!) with only 35 milliseconds of latency, which is more than 10 times faster than without caching.

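As a quick check of the "more than 10 times" figure, here are the ratios of the measured numbers, total time first and latency second:

```latex
\frac{1005\ \text{ms}}{93\ \text{ms}} \approx 10.8
\qquad
\frac{799\ \text{ms}}{35\ \text{ms}} \approx 22.8
```

So the total time improved by roughly a factor of 11, and latency alone by more than a factor of 20.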

Small bonus – HTTP 304

Apart from that, you get a nice bonus if you serve an ETag header. An ETag is a kind of digest of the resource that modern browsers use to save bandwidth. When a browser requests a resource that is already in its cache, it sends an If-None-Match header containing the ETag of the cached copy. Amazon CloudFront then compares the ETag from the browser with the ETag of the copy in its cache and, if they match, responds with HTTP 304 Not Modified instead of sending the resource again. Your web server might do the same for you, but for dynamically generated content this has to be handled explicitly. With Amazon CloudFront you just need to supply an ETag with your content and you get this feature for free.
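For comparison, this is roughly what handling If-None-Match yourself on the origin would involve. It is a hedged sketch with a hypothetical helper name, `conditionalStatus`; it applies the weak comparison that HTTP specifies for If-None-Match (a `W/` prefix is ignored) and, for simplicity, splits the header on commas.

```javascript
// Decide between 304 Not Modified and 200 OK for a conditional GET.
function conditionalStatus(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return 200; // not a conditional request
  // Weak comparison: strip surrounding whitespace and any W/ prefix.
  const normalize = (tag) => tag.trim().replace(/^W\//, "");
  const candidates = ifNoneMatch.split(",").map(normalize);
  if (candidates.includes("*") || candidates.includes(normalize(currentEtag))) {
    return 304; // the client's cached copy is still current: send headers only
  }
  return 200; // ETag changed: send the full resource
}
```

Every line of this is logic that CloudFront takes off your hands once the origin sends an ETag.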

Summary

Amazon CloudFront in combination with a custom origin gives us a great caching and content delivery service, which we can use even for dynamically generated resources. In our (simplistic) test, responses served from the CloudFront caches were about ten times faster, with much lower latency. CloudFront lets us control caching behaviour declaratively, and with aggressive caching we can significantly reduce the load on our servers, so we need fewer servers/VMs, which reduces costs. The CloudFront service is not free: at the time of writing, serving one GB from CloudFront costs $0.120, which for our use cases is clearly justified by the savings in server hours.

Update 1

Today (14 May 2012) Amazon released an update to CloudFront that makes it even more suitable for our use cases. With this update we will be able to serve all our resources, including the static ones that are not dynamically generated, via one CloudFront distribution (by using multiple origins). Another addition will allow us to specify the image width as a query parameter and resize images on the server side. We will keep you posted about our new developments on this blog; in the meantime, you can follow us on Twitter and read the AWS blog post about the new Amazon CloudFront features.
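As an illustration of the width idea, here is a sketch of how the origin might read such a parameter. The parameter name `width`, the helper name `requestedWidth` and the limits (100 to 1200 pixels in steps of 100, default 600) are hypothetical. Snapping the value to a small set of sizes bounds how many different images the origin ever has to render; note that CloudFront still caches each distinct URL separately, so to bound the cached variants as well, the origin could redirect other widths to the snapped URL.

```javascript
// Read a hypothetical "width" query parameter and snap it to a bounded set.
function requestedWidth(requestUrl, { min = 100, max = 1200, step = 100, fallback = 600 } = {}) {
  // The base URL only lets us parse relative request paths such as req.url.
  const raw = new URL(requestUrl, "http://origin.invalid").searchParams.get("width");
  const width = Number.parseInt(raw, 10);
  if (!Number.isFinite(width)) return fallback; // missing or not a number
  const clamped = Math.min(max, Math.max(min, width));
  return Math.round(clamped / step) * step; // e.g. 450 -> 500
}
```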


About the Author

Renat Zubairov

CEO at elastic.io

