On the other hand, there was also no performance improvement over time: a file system cache works great on a local machine with an SSD, but on a cloud machine, where the storage might not be as close, it is significantly slower than local RAM, and maybe even slower than a memcached instance.
So essentially, I needed to cache my cache.
One cache to rule them all
Django supports multiple cache backends, so you can define a local-memory cache backend and a file-based backend. What I wanted to create is a cache backend that chains the two together.
So here is the interface I wanted:
- get - try getting from the first cache in the chain; if the key exists, return it, otherwise go to the next cache. If there is a hit in a deeper cache backend, update all the caches in the chain up to it.
- set - set the item in all the caches in the chain.
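The read-through behavior described above can be sketched without Django at all. Here is a minimal, hypothetical illustration using plain dicts as cache levels; the names `chained_get` and `chained_set` are mine, not from the actual backend:

```python
# A minimal sketch of the chained-cache idea: each "cache" is just a
# dict standing in for a Django cache backend.

def chained_get(caches, key, default=None):
    """Try each cache in order; on a hit, backfill the shallower caches."""
    for i, cache in enumerate(caches):
        value = cache.get(key)
        if value is not None:
            # Backfill every cache that missed, so the next lookup
            # is served from the fastest level.
            for shallower in caches[:i]:
                shallower[key] = value
            return value
    return default

def chained_set(caches, key, value):
    """Set the item in all the caches in the chain."""
    for cache in caches:
        cache[key] = value

local, disk = {}, {}
chained_set([local, disk], 'a', 1)

# Simulate losing the in-memory level (e.g. a process restart):
local.clear()
assert chained_get([local, disk], 'a') == 1  # hit on the deeper level
assert local['a'] == 1                       # ...and backfilled locally
```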
That turned out to be very simple to implement:
```python
from django.core.cache import BaseCache
from django.core.cache import get_cache

from lock_factory import LockFactory


class ChainedCache(BaseCache):
    def __init__(self, name, params):
        BaseCache.__init__(self, params)
        self.caches = [get_cache(cache_name)
                       for cache_name in params.get('CACHES', [])]
        self.debug = params.get('DEBUG', False)

    def add(self, key, value, timeout=None, version=None):
        """
        Set a value in the cache if the key does not already exist. If
        timeout is given, that timeout will be used for the key; otherwise
        the default cache timeout will be used.

        Returns True if the value was stored, False otherwise.
        """
        if self.has_key(key, version=version):
            return False
        self.set(key, value, timeout=timeout, version=version)
        return True

    def get(self, key, default=None, version=None):
        """
        Fetch a given key from the cache. If the key does not exist, return
        default, which itself defaults to None.
        """
        def recurse_get(cache_number=0):
            if cache_number >= len(self.caches):
                return None
            cache = self.caches[cache_number]
            value = cache.get(key, version=version)
            if value is None:
                value = recurse_get(cache_number + 1)
                # Keep the value from the next cache in this cache for next time
                if value is not None:
                    # Got to use the default timeout...
                    cache.set(key, value, version=version)
            else:
                if self.debug:
                    print 'CACHE HIT FOR', key, 'ON LEVEL', cache_number
            return value

        value = recurse_get()
        if value is None:
            if self.debug:
                print 'CACHE MISS FOR', key
            return default
        return value

    def set(self, key, value, timeout=None, version=None):
        """
        Set a value in the cache. If timeout is given, that timeout will be
        used for the key; otherwise the default cache timeout will be used.
        """
        # Just to be sure we don't get a race condition between different
        # caches, let's use a lock here
        with LockFactory.get_lock(self.make_key(key, version=version)):
            for cache in self.caches:
                cache.set(key, value, timeout=timeout, version=version)

    def delete(self, key, version=None):
        """
        Delete a key from the cache, failing silently.
        """
        # Just to be sure we don't get a race condition between different
        # caches, let's use a lock here
        with LockFactory.get_lock(self.make_key(key, version=version)):
            for cache in self.caches:
                cache.delete(key, version=version)

    def clear(self):
        """Remove *all* values from the cache at once."""
        for cache in reversed(self.caches):
            cache.clear()


# For backwards compatibility
class CacheClass(ChainedCache):
    pass
```
And here are the settings:
```python
CACHES = {
    'staticfiles': {
        'BACKEND': 'chained_cache.ChainedCache',
        'CACHES': ['staticfiles-mem', 'staticfiles-filesystem'],
        'DEBUG': False,
    },
    'staticfiles-filesystem': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': os.path.join(PROJECT_ROOT, 'static_cache'),
        'TIMEOUT': 100 * 365 * 24 * 60 * 60,  # A hundred years!
        'OPTIONS': {
            'MAX_ENTRIES': 100 * 1000
        }
    },
    'staticfiles-mem': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'staticfiles-mem'
    }
}
```
You can also get the code in this gist.
A few notes:
- I am using a named lock factory, which is also useful for other things; you can check it out in the gist.
- Django is not strict about thread safety in cache backends, so you could remove the lock altogether, but I prefer it this way.
- Calling "get" can have a side effect of setting the item on the cache backends that missed. This may make the item's effective timeout larger than originally requested, but no larger than the sum of the default timeouts of the cache backends in the chain.
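The lock factory itself lives in the gist, but the rough shape of a named lock factory is easy to sketch with the standard library. This is my assumption of how it might look, not the gist's actual code:

```python
import threading
from collections import defaultdict


class LockFactory(object):
    """Hand out one lock per name, so callers locking the same cache key
    serialize with each other while different keys stay independent.

    Note: the lock dict grows with the number of distinct names; a real
    implementation might want to evict unused entries.
    """
    _locks = defaultdict(threading.Lock)
    _guard = threading.Lock()  # protects the dict itself

    @classmethod
    def get_lock(cls, name):
        with cls._guard:
            return cls._locks[name]


# The same name always yields the same lock object:
assert LockFactory.get_lock('key-1') is LockFactory.get_lock('key-1')
assert LockFactory.get_lock('key-1') is not LockFactory.get_lock('key-2')
```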
Problem solved - let's go eat!
This is a great idea. Thanks for sharing. I had a similar problem where I needed to store files in S3 but serve them directly from the EC2 webserver (to tightly control access). The webserver wound up with a local filesystem cache of files in S3.
After a while, I was bitten by Django's file-based cache backend culling strategy. Not really satisfied with solutions and alternatives, I created a new project called DiskCache (http://www.grantjenks.com/docs/diskcache/). DiskCache is an Apache2-licensed disk and file backed cache library, written in pure Python, and compatible with Django. Your readers may be particularly interested in the Django cache benchmarks: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html