Lazy evaluation of Django ORM objects

When you create a Django QuerySet, no database activity occurs until you do something that evaluates it. Evaluation is forced by iterating over the queryset, calling len(), list() or repr() on it, slicing it with the ‘step’ parameter, pickling it, or testing it in a boolean context.
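Laziness can be illustrated without Django. The sketch below is a toy stand-in, not Django’s implementation: a hypothetical LazyQuery class that stores an SQL string and only runs it against an in-memory sqlite3 database when it is iterated or tested for truth. It deliberately has no result cache, so every evaluation hits the database; the queries_run counter shows exactly when that happens.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a QuerySet (not Django's code): it records the SQL
    but touches the database only when the results are actually needed."""

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self.queries_run = 0  # number of times the database was hit

    def _run(self):
        self.queries_run += 1
        return self.conn.execute(self.sql).fetchall()

    def __iter__(self):  # for loops, list(), comprehensions
        return iter(self._run())

    def __bool__(self):  # if entries: ...
        return bool(self._run())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany("INSERT INTO entry (headline) VALUES (?)", [("First",), ("Second",)])

entries = LazyQuery(conn, "SELECT headline FROM entry ORDER BY id")
print(entries.queries_run)  # 0: building the query ran nothing

headlines = [headline for (headline,) in entries]  # iteration forces evaluation
print(entries.queries_run)  # 1
print(headlines)            # ['First', 'Second']
```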

How data is held in memory

When a queryset is created, its cache is empty. The first time it is evaluated, the database query runs, the results are stored in the queryset's cache and the requested results are returned; later evaluations of the same queryset reuse that cache. So when the same results are needed more than once, store the queryset in a variable and reuse it rather than building a new one, to avoid unnecessary database lookups:

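entry = Entry.objects.get(id=1)  # query performed
entry.blog  # Blog object is retrieved (query performed)
entry.blog  # cached version, no DB lookup

list(entry.authors.all())  # query performed
list(entry.authors.all())  # query performed again: .all() builds a new queryset

authors = entry.authors.all()  # store the queryset ...
list(authors)  # query performed
list(authors)  # ... and reuse it: served from its cache, no DB lookup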

Caching can involve significant memory usage, so a queryset that will only be used once does not need to be cached; QuerySet.iterator() reads such results without populating the cache. As well as the caching of querysets, attributes of ORM objects are cached too. In general, non-callable attributes (such as the foreign key accessor entry.blog above) are cached after first access, whereas callable attributes (such as entry.authors.all()) cause a database lookup every time.
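The effect of the result cache can be sketched with another toy, again not Django’s code: a hypothetical CachedQuery class whose first iteration runs the SQL and stores the rows, while later iterations reuse them. Creating a fresh object for each use plays the role of calling entry.authors.all() repeatedly; storing one object and iterating it twice plays the role of reusing a stored queryset.

```python
import sqlite3


class CachedQuery:
    """Toy stand-in (not Django's code) for a queryset's result cache: the
    first evaluation runs the SQL, later ones reuse the stored rows."""

    queries_run = 0  # shared counter across all instances

    def __init__(self, conn, sql):
        self.conn = conn
        self.sql = sql
        self._result_cache = None  # empty until first evaluation

    def __iter__(self):
        if self._result_cache is None:
            type(self).queries_run += 1
            self._result_cache = self.conn.execute(self.sql).fetchall()
        return iter(self._result_cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO author (name) VALUES (?)", [("Ann",), ("Bob",)])
SQL = "SELECT name FROM author ORDER BY id"

# Like calling entry.authors.all() twice: two fresh objects, two queries.
list(CachedQuery(conn, SQL))
list(CachedQuery(conn, SQL))
print(CachedQuery.queries_run)  # 2

# Store the query once and reuse it: the second pass is served from the cache.
authors = CachedQuery(conn, SQL)
first = list(authors)
second = list(authors)
print(CachedQuery.queries_run)  # 3
print(first == second)          # True
```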

Retrieve everything you need in one hit

But not the things you don’t need. QuerySet.values() can significantly reduce the overhead of a database lookup and is useful when you just need dictionaries of field values rather than ORM model objects (QuerySet.values_list() returns tuples instead). QuerySet.select_related() is useful for lookups spanning multiple tables:

from django.db import models

class Album(models.Model):
    title = models.CharField(max_length=50)
    year = models.IntegerField()

class Song(models.Model):
    name = models.CharField(max_length=50)
    # on_delete is a required argument since Django 2.0
    album = models.ForeignKey(Album, on_delete=models.CASCADE)

song = Song.objects.get(id=5)  # query performed
album = song.album             # a second query fetches the album

song = Song.objects.select_related('album').get(id=5)  # one query with a JOIN
album = song.album             # no further database query required

QuerySet.select_related() works by creating an SQL join and including the fields of the related object in the SELECT statement, so the related objects are fetched in the same database query. However, to avoid the much larger result set that joining across a ‘many’ relationship would produce, select_related is limited to single-valued relationships: foreign keys and one-to-one fields.
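The difference can be seen in plain SQL. The sketch below uses Python’s built-in sqlite3 module with hand-written album and song tables (simplified names; Django would generate its own, prefixed with the app label) to compare fetching each song’s album with one extra query per song against a single JOIN, which is roughly the shape of query select_related() produces. A small run() helper counts the queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE song (id INTEGER PRIMARY KEY, name TEXT,
                   album_id INTEGER REFERENCES album(id));
INSERT INTO album VALUES (1, 'Blue', 1971), (2, 'Hejira', 1976);
INSERT INTO song VALUES (1, 'River', 1), (2, 'Carey', 1), (3, 'Amelia', 2);
""")

queries = 0


def run(sql, params=()):
    """Execute a query and count it, to compare the two access patterns."""
    global queries
    queries += 1
    return conn.execute(sql, params).fetchall()


# Without a join: one query for the songs, then one per song for its album.
songs = run("SELECT id, name, album_id FROM song ORDER BY id")
naive = [(name, run("SELECT title FROM album WHERE id = ?", (album_id,))[0][0])
         for _, name, album_id in songs]
print(queries)  # 4

# select_related-style: one query, album columns added to the SELECT via a JOIN.
queries = 0
joined = run("""
    SELECT song.name, album.title
    FROM song INNER JOIN album ON album.id = song.album_id
    ORDER BY song.id
""")
print(queries)          # 1
print(joined == naive)  # True
```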

QuerySet.prefetch_related() serves a similar purpose, but the strategy is quite different: it does a separate lookup for each relationship and does the “joining” in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related.
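The prefetch_related() strategy can be sketched the same way, again with hand-written sqlite3 tables rather than Django: one query for the albums, one IN (...) query for all of their songs, and the grouping done in Python with a dictionary keyed by album id. This only illustrates the approach; Django’s own implementation differs in detail.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE song (id INTEGER PRIMARY KEY, name TEXT,
                   album_id INTEGER REFERENCES album(id));
INSERT INTO album VALUES (1, 'Blue', 1971), (2, 'Hejira', 1976);
INSERT INTO song VALUES (1, 'River', 1), (2, 'Carey', 1), (3, 'Amelia', 2);
""")

# Query 1: the albums themselves.
albums = conn.execute("SELECT id, title FROM album ORDER BY id").fetchall()

# Query 2: every song belonging to those albums, in a single IN (...) lookup.
album_ids = [album_id for album_id, _ in albums]
placeholders = ", ".join("?" for _ in album_ids)
songs = conn.execute(
    f"SELECT album_id, name FROM song WHERE album_id IN ({placeholders}) ORDER BY id",
    album_ids,
).fetchall()

# The "join" happens in Python: group the songs by their album's id.
songs_by_album = defaultdict(list)
for album_id, name in songs:
    songs_by_album[album_id].append(name)

result = {title: songs_by_album[album_id] for album_id, title in albums}
print(result)  # {'Blue': ['River', 'Carey'], 'Hejira': ['Amelia']}
```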

Dictionaries in Python

Dictionaries are Python’s built-in way of storing key-value pairs, a fundamental data structure in computer science. The official documentation has summarized the data type as “an unordered set of key: value pairs, with the requirement that the keys are unique”, wording that predates the insertion-order guarantee discussed below. Keys can be any hashable type (strings, numbers and tuples of immutable values, for example), and the stored values can be accessed in the following ways:

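value = d[key]                 # raises KeyError if key is missing

value = d.get(key)             # returns None if key is missing

value = d.get(key, "no data")  # returns "no data" if key is missing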

Whereas d[key] raises a KeyError if the key does not exist, the .get() method returns None, or the default value passed as its optional second argument. Values within nested dictionaries, such as deserialized JSON data, can be accessed by successive use of [key] or .get(key):
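A short demonstration of the two behaviours on a small example dictionary:

```python
d = {'apple': 3}

print(d.get('pear'))             # None
print(d.get('pear', 'no data'))  # no data

missing = None
try:
    d['pear']
except KeyError as exc:
    missing = exc.args[0]        # the key that was not found
print(missing)                   # pear
```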

sales = {'data':{'orders':{'january':240}}}

sales['data']['orders']['january']

sales.get('data').get('orders').get('january')

sales['data']['orders'].get('january')
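Chained .get() calls are only safe while every intermediate key exists: if one is missing, .get() returns None and the next .get() raises AttributeError, because None has no get method. Passing {} as the default keeps a chain safe; for longer paths a small helper can walk the keys. deep_get below is a hypothetical helper, not part of the standard library.

```python
def deep_get(data, *keys, default=None):
    """Hypothetical helper: follow keys through nested dicts, returning
    default as soon as a key is missing or a level is not a dict."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


sales = {'data': {'orders': {'january': 240}}}

print(deep_get(sales, 'data', 'orders', 'january'))   # 240
print(deep_get(sales, 'data', 'refunds', 'january'))  # None
print(sales.get('data', {}).get('refunds', {}).get('january'))  # None
```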

The following are all valid ways of creating dictionaries:

my_dict = {'key1': 'value1', 'key2': 'value2'}

my_dict = dict(key1='value1', key2='value2')

my_dict = {x: x**2 for x in range(1, 4)}  # {1: 1, 2: 4, 3: 9}

keys = ['key1', 'key2']
values = ['value1', 'value2']
my_dict = dict(zip(keys, values))

When the keys are strings that are valid Python identifiers, passing them as keyword arguments to the dict() constructor saves quoting each key. A dictionary literal is usually faster, however, because it avoids a function call, and a dict comprehension is the natural choice for generating keys and values programmatically. Using the zip() function inside the dict() constructor is particularly useful for creating dictionaries from separate lists of keys and values.
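A quick check that the creation styles above build equal dictionaries, plus a rough timing comparison of a literal against dict() with keyword arguments. Timings vary by machine and Python version, so no figures are given here; run it locally to compare.

```python
import timeit

keys = ['key1', 'key2']
values = ['value1', 'value2']

literal = {'key1': 'value1', 'key2': 'value2'}
from_kwargs = dict(key1='value1', key2='value2')
from_zip = dict(zip(keys, values))
from_comprehension = {k: v for k, v in zip(keys, values)}

print(literal == from_kwargs == from_zip == from_comprehension)  # True

# Rough timings in seconds for 100,000 constructions of each form.
print(timeit.timeit("{'a': 1, 'b': 2}", number=100_000))
print(timeit.timeit("dict(a=1, b=2)", number=100_000))
```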

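Since Python 3.7, dictionaries are guaranteed to preserve the insertion order of their keys (CPython 3.6 already did so as an implementation detail). On older versions, the dictionary subclass OrderedDict from the collections module in the standard library can be used to keep insertion order; it is still useful for order-aware operations such as move_to_end() and order-sensitive equality comparisons.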
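A small comparison of a plain dict and OrderedDict on Python 3.7 or later: both keep insertion order, but only OrderedDict takes order into account when comparing two OrderedDicts, and it can reorder keys in place with move_to_end().

```python
from collections import OrderedDict

plain_a = {'x': 1, 'y': 2}
plain_b = {'y': 2, 'x': 1}
print(list(plain_a))       # ['x', 'y']: insertion order is kept
print(plain_a == plain_b)  # True: plain dict equality ignores order

ordered_a = OrderedDict(plain_a)
ordered_b = OrderedDict(plain_b)
print(ordered_a == ordered_b)  # False: OrderedDict equality is order-sensitive

ordered_a.move_to_end('x')     # move 'x' to the end without re-inserting it
print(list(ordered_a))         # ['y', 'x']
print(ordered_a == ordered_b)  # True: same keys, values and order now
```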

Avoiding multi-table inheritance in Django Models

Model inheritance has no natural translation to a relational database schema, so inheritance in Django models should be designed with database performance in mind. When the base model does not need its own table, abstract base class inheritance should be used instead of multi-table inheritance.

Given the following model:

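from django.db.models import CharField, Model

class Person(Model):
    name = CharField(max_length=100)  # max_length is required; 100 is illustrative
    ...

class Employee(Person):
    department = CharField(max_length=100)
    ...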

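With this multi-table inheritance, Django creates two tables and links each Employee row to its Person row through an automatically created one-to-one field, so what looks like a simple query on the Employee child class actually involves a join. The same example with abstract = True in an inner Meta class on the base model uses abstract inheritance instead: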

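from django.db.models import CharField, Model

class Person(Model):
    name = CharField(max_length=100)
    ...

    class Meta:  # must be nested inside the base model
        abstract = True

class Employee(Person):
    department = CharField(max_length=100)
    ...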

With abstract = True, no table is created for the base model; instead, its fields are added to the table of each child model. Queries on Employee then read a single table and need no join to reach the inherited fields, while the shared field definitions are still written only once, in the base class. The trade-off is that an abstract model cannot be queried on its own: there is no Person.objects.
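The two layouts can be approximated in plain SQL with sqlite3. The table names below (mti_person, mti_employee, abs_employee) are hypothetical, since Django generates its own; the person_ptr_id column mirrors the name of the parent link Django adds under multi-table inheritance. The sketch contrasts the join needed to read an employee’s name in one layout with the single-table read in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Multi-table inheritance (approximated): the child's primary key is also
-- a foreign key to the parent row.
CREATE TABLE mti_person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE mti_employee (
    person_ptr_id INTEGER PRIMARY KEY REFERENCES mti_person(id),
    department TEXT
);
-- Abstract inheritance: the inherited field is copied into the child table.
CREATE TABLE abs_employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT);

INSERT INTO mti_person (id, name) VALUES (1, 'Ada');
INSERT INTO mti_employee (person_ptr_id, department) VALUES (1, 'R&D');
INSERT INTO abs_employee (id, name, department) VALUES (1, 'Ada', 'R&D');
""")

# Multi-table layout: reading the inherited name field needs a join ...
mti_rows = conn.execute("""
    SELECT p.name, e.department
    FROM mti_employee AS e
    INNER JOIN mti_person AS p ON p.id = e.person_ptr_id
""").fetchall()

# ... whereas the abstract layout answers from a single table.
abs_rows = conn.execute("SELECT name, department FROM abs_employee").fetchall()

print(mti_rows)  # [('Ada', 'R&D')]
print(abs_rows)  # [('Ada', 'R&D')]
```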

Quickly get memcached working in Python Django

Like most web frameworks, Django can use caching to greatly improve performance for many common requests. Here we will look at memcached, as it enjoys good Django support and wide production use. Django also supports Redis (with a built-in backend since Django 4.0), which improves on memcached in some respects, such as optional data persistence.

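  1. Install memcached on your server.

    RedHat Linux:

    yum install memcached

    Ubuntu / Debian Linux:

    apt-get install memcached

  2. Let Django know how to access memcached. With Django 3.2 or later, install the pymemcache client (pip install pymemcache); then, in Django’s settings.py file, add a CACHES setting that points at the memcached server (11211 is memcached’s default port):

    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
            'LOCATION': '127.0.0.1:11211',
        }
    }

    Older Django versions used the MemcachedCache backend with the python-memcached package; that backend was removed in Django 4.1.

  3. Load the cache within your application:

    from django.core.cache import cache

  4. Save the value to the cache:

    cache.set('exampleValue', exampleValue)

  5. Retrieve the value from the cache:

    exampleValue = cache.get('exampleValue')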

The beauty is that exampleValue can be almost anything Django can pickle, from a computed or database-retrieved value to large blocks of static text or a URL. (By default, memcached rejects items larger than 1 MB.)

The catch is that a cache does not always contain the data you expect: the value may have expired, been evicted or flushed, or never been stored in the first place. Let’s rewrite the retrieval step to handle the value not being available in the cache:

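exampleValue = cache.get('exampleValue')
if exampleValue is None:  # cache miss; "if not" would also treat a cached 0 or '' as a miss
    exampleValue = exampleValueLookup()  # whatever regenerates the value
    cache.set('exampleValue', exampleValue)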

Here exampleValue is read from the cache and is only regenerated, and then re-cached, when it is missing. In a real application this logic would usually be encapsulated in a getExampleValue function or some other appropriate place.