
June 14, 2010

Cloud storage without bandwidth costs

"We need to store on the cloud but be able to control our bandwidth costs"

Our client generates around 100 GB of video for the web every month, so looking for cloud storage was a no-brainer.

We needed an "unlimited" hard drive but not a CDN service, and it was especially important to control our costs, at least initially.

Here is how we achieved the goal.
After testing all the cloud storage options we knew, we ended up with Rackspace CloudFiles. Personally, I've been using Rackspace services for a while and couldn't be happier with them, but it was CloudFiles' ability to communicate over Rackspace's internal network that made us choose it.

CloudFiles servers are in Rackspace's DFW datacenter, so if we had a VPS in the same building we would be able to communicate with them over the internal network without bandwidth charges.

We built a simple Django app that receives a file request, retrieves the file over the internal network with Rackspace's Python binding, and returns it through a Varnish server that caches it.
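To give an idea of the shape of such a backend, here is a minimal, framework-free sketch. It is not our Django code: it is a plain WSGI app, and `fetch_object` is a hypothetical stand-in for whatever call retrieves the file from CloudFiles over the internal network. The `max_age` value and the `Cache-Control` header are also assumptions, shown only to illustrate how a backend can tell a cache in front of it that the response may be stored.

```python
import mimetypes


def make_app(fetch_object, max_age=3600):
    """Build a WSGI app that serves whatever fetch_object(name) returns.

    fetch_object is a hypothetical callable: name -> bytes, or None if missing.
    In a real setup it would wrap the storage binding.
    """
    def app(environ, start_response):
        # Map the request path to an object name, e.g. "/clip.flv" -> "clip.flv"
        name = environ.get("PATH_INFO", "").lstrip("/")
        data = fetch_object(name) if name else None
        if data is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        headers = [
            ("Content-Type", content_type),
            ("Content-Length", str(len(data))),
            # Assumed policy: let the cache in front keep the file for max_age seconds
            ("Cache-Control", "public, max-age=%d" % max_age),
        ]
        start_response("200 OK", headers)
        return [data]
    return app
```

For a quick local check, `make_app({"clip.flv": b"..."}.get)` gives an app backed by a plain dictionary instead of real storage.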

It works like this:

User requests a file -> nginx receives the request and proxies (load-balances) it to a Varnish server -> Varnish calls the Django backend and stores the file in memory -> Varnish returns the response to nginx, which gzips it (if required) and sends it to the user.

Next time a user requests the file it will be retrieved from Varnish's cache, so we don't hit CloudFiles again. We have a purging process that expires files as we update them, but that's for another blog post.
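The hit, miss and purge behaviour can be illustrated with a toy model. This is only a sketch of the idea, not how Varnish is implemented or configured: `CachingFront`, its `origin` callable and the `origin_hits` counter are all made-up names for the example.

```python
class CachingFront:
    """Toy stand-in for the caching layer: an in-memory cache in front of an origin."""

    def __init__(self, origin):
        self.origin = origin    # hypothetical callable: name -> bytes
        self.cache = {}
        self.origin_hits = 0    # how many times we had to go back to storage

    def get(self, name):
        if name not in self.cache:
            # Cache miss: fetch from the origin once and keep the result
            self.origin_hits += 1
            self.cache[name] = self.origin(name)
        return self.cache[name]

    def purge(self, name):
        # Drop the cached copy so the next request refetches from the origin
        self.cache.pop(name, None)


files = {"intro.flv": b"v1"}
front = CachingFront(lambda name: files[name])

front.get("intro.flv")          # miss: goes to the origin
front.get("intro.flv")          # hit: served from memory
hits_before_update = front.origin_hits

files["intro.flv"] = b"v2"      # the file is updated in storage
front.purge("intro.flv")        # expire the stale copy
latest = front.get("intro.flv") # miss again: picks up the new version
```

After this runs, `hits_before_update` is 1 and `latest` is the updated content, which is the whole point: storage is only touched on the first request and after a purge.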

Varnish is an awesome server. We tested Squid configured as a reverse proxy, but it did not come close to what we could achieve with Varnish. The only thing I'm not really happy with is that Varnish can't add gzip compression to responses, which is why the response goes back through nginx.

Is this permitted by rackspace? Yes it is.

With this solution we are able to control our costs and expand as we grow.

Suggestions and/or comments are more than welcome.

-- fernando

4 Comments

Note that recent versions of nginx have reverse caching proxy support. It's even faster than varnish, and it has gzip support.

Also, what you describe is a very well-known cloud pattern. Something you have to be very careful with (as with any caching reverse proxy) is cache invalidation. Usually the best way to deal with this is to use new URLs in combination with mod_rewrite, and let the proxy's LRU eviction do its work.

Be aware that, as with any unix app that uses files, nginx will use the system's caches. So even if you configure a directory in nginx's proxy_cache_path, you end up serving everything from memory (assuming you have enough free RAM and your server is configured correctly, something you already did if you are using varnish).