On the other hand, there was also no performance improvement over time: a file system cache works great on a local machine with an SSD, but on a cloud machine, where the storage might not be as close, it is significantly slower than local RAM, and maybe even slower than a memcached instance.
So essentially, I needed to cache my cache.
One cache to rule them all
Django supports multiple cache backends, so you can define a local-memory cache backend and a file-based backend. What I wanted to create is a cache backend that chains the two together.
So here is the interface I wanted:
- get - try getting from the first cache in the chain; if the key exists, return it, otherwise go to the next cache. If there is a hit in a deeper cache backend, update all the caches in the chain up to it.
- set - set the item in all the caches in the chain.
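The read-through behavior described above can be sketched without Django at all. Here is a minimal, hypothetical illustration using plain dicts as cache levels; the names `chained_get` and `chained_set` are mine, not from the actual backend:

```python
# A minimal sketch of the chained-cache idea: each "cache" is just a
# dict standing in for a Django cache backend.

def chained_get(caches, key, default=None):
    """Try each cache in order; on a hit, backfill the shallower caches."""
    for i, cache in enumerate(caches):
        value = cache.get(key)
        if value is not None:
            # Backfill every cache that missed, so the next lookup
            # is served from the fastest level.
            for shallower in caches[:i]:
                shallower[key] = value
            return value
    return default

def chained_set(caches, key, value):
    """Set the item in all the caches in the chain."""
    for cache in caches:
        cache[key] = value

local, disk = {}, {}
chained_set([local, disk], 'a', 1)

# Simulate losing the in-memory level (e.g. a process restart):
local.clear()
assert chained_get([local, disk], 'a') == 1  # hit on the deeper level
assert local['a'] == 1                       # ...and backfilled locally
```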
That turned out to be very simple to implement:
```python
from django.core.cache import BaseCache
from django.core.cache import get_cache

from lock_factory import LockFactory


class ChainedCache(BaseCache):
    def __init__(self, name, params):
        BaseCache.__init__(self, params)
        self.caches = [get_cache(cache_name)
                       for cache_name in params.get('CACHES', [])]
        self.debug = params.get('DEBUG', False)

    def add(self, key, value, timeout=None, version=None):
        """
        Set a value in the cache if the key does not already exist. If
        timeout is given, that timeout will be used for the key; otherwise
        the default cache timeout will be used.

        Returns True if the value was stored, False otherwise.
        """
        if self.has_key(key, version=version):
            return False
        self.set(key, value, timeout=timeout, version=version)
        return True

    def get(self, key, default=None, version=None):
        """
        Fetch a given key from the cache. If the key does not exist, return
        default, which itself defaults to None.
        """
        def recurse_get(cache_number=0):
            if cache_number >= len(self.caches):
                return None
            cache = self.caches[cache_number]
            value = cache.get(key, version=version)
            if value is None:
                value = recurse_get(cache_number + 1)
                # Keep the value from the next cache in this cache for next time
                if value is not None:
                    # Got to use the default timeout...
                    cache.set(key, value, version=version)
            else:
                if self.debug:
                    print 'CACHE HIT FOR', key, 'ON LEVEL', cache_number
            return value

        value = recurse_get()
        if value is None:
            if self.debug:
                print 'CACHE MISS FOR', key
            return default
        return value

    def set(self, key, value, timeout=None, version=None):
        """
        Set a value in the cache. If timeout is given, that timeout will be
        used for the key; otherwise the default cache timeout will be used.
        """
        # Just to be sure we don't get a race condition between different
        # caches, let's use a lock here
        with LockFactory.get_lock(self.make_key(key, version=version)):
            for cache in self.caches:
                cache.set(key, value, timeout=timeout, version=version)

    def delete(self, key, version=None):
        """
        Delete a key from the cache, failing silently.
        """
        # Just to be sure we don't get a race condition between different
        # caches, let's use a lock here
        with LockFactory.get_lock(self.make_key(key, version=version)):
            for cache in self.caches:
                cache.delete(key, version=version)

    def clear(self):
        """Remove *all* values from the cache at once."""
        for cache in reversed(self.caches):
            cache.clear()


# For backwards compatibility
class CacheClass(ChainedCache):
    pass
```
And here are the settings:
```python
CACHES = {
    'staticfiles': {
        'BACKEND': 'chained_cache.ChainedCache',
        'CACHES': ['staticfiles-mem', 'staticfiles-filesystem'],
        'DEBUG': False,
    },
    'staticfiles-filesystem': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': os.path.join(PROJECT_ROOT, 'static_cache'),
        'TIMEOUT': 100 * 365 * 24 * 60 * 60,  # A hundred years!
        'OPTIONS': {
            'MAX_ENTRIES': 100 * 1000
        }
    },
    'staticfiles-mem': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'staticfiles-mem'
    }
}
```
You can also get the code in this gist.
A few notes:
- I am using a named lock factory, which is also useful for other things; you can check it out in the gist.
- Django is not strict about thread safety in cache backends, so you could remove the lock altogether, but I prefer it this way.
- Calling "get" can have a side effect of setting the item on the cache backends that missed. This may make the item's effective timeout larger than originally requested, but no larger than the sum of the default timeouts of the cache backends in the chain.
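The lock factory itself lives in the gist, but the rough shape of a named lock factory is easy to sketch with the standard library. This is my assumption of how it might look, not the gist's actual code:

```python
import threading
from collections import defaultdict


class LockFactory(object):
    """Hand out one lock per name, so callers locking the same cache key
    serialize with each other while different keys stay independent.

    Note: the lock dict grows with the number of distinct names; a real
    implementation might want to evict unused entries.
    """
    _locks = defaultdict(threading.Lock)
    _guard = threading.Lock()  # protects the dict itself

    @classmethod
    def get_lock(cls, name):
        with cls._guard:
            return cls._locks[name]


# The same name always yields the same lock object:
assert LockFactory.get_lock('key-1') is LockFactory.get_lock('key-1')
assert LockFactory.get_lock('key-1') is not LockFactory.get_lock('key-2')
```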
Problem solved - let's go eat!
This is a great idea. Thanks for sharing. I had a similar problem where I needed to store files in S3 but serve them directly from the EC2 webserver (to tightly control access). The webserver wound up with a local filesystem cache of files in S3.
After a while, I was bitten by Django's file-based cache backend culling strategy. Not really satisfied with solutions and alternatives, I created a new project called DiskCache (http://www.grantjenks.com/docs/diskcache/). DiskCache is an Apache2-licensed disk and file backed cache library, written in pure Python, and compatible with Django. Your readers may be particularly interested in the Django cache benchmarks: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html