Lazy evaluation of Django ORM objects

When you create a Django QuerySet object, no database activity occurs until you do something to evaluate the queryset. Evaluation is forced by the following: iterating over it,  calling len() or list(), slicing it with the ‘step’ parameter, or testing it in a boolean context.

How data is held in memory

When a queryset is created the cache is empty. When it is evaluated and database interaction occurs, the results of the query are stored and the requested results are returned. In many cases the queryset should be stored and re-used instead of consuming, in order to avoid unnecessary database lookups:

entry = Entry.objects.get(id=1)
entry.blog  # Blog object is retried
entry.blog  # cached version, no DB lookup

entry.authors.all()    #query performed
entry.authors.all()    #query performed again

As caching objects can involve significant memory usage, if the queryset will not need to be re-used sometimes then there is no need for it to be cached. As well as caching of querysets, there is also caching of attributes of ORM objects. In general, attributes of ORM objects that are not callable will be cached whereas callable attributes cause DB lookups every time.

Retrieve everything you need in one hit

But not the things you don’t need. Using QuerySet.values() can signficantly reduce the overhead from a database lookup and is useful where when you just need a dictionary or a list of the values, not the ORM model objects. QuerySet.select_related() is useful for lookups spanning multiple tables:

class Album(models.Model):
    title = models.CharField(max_length=50)
    year = models.IntegerField()

Song(models.Model):
    name = models.CharField(max_length=50)
    album = models.ForeignKey(Album)

song = Song.objects.get(id=5) # query performed
album =  song.album # query performed again

song = Song.objects.select_related(‘album’).get(id=5)
song.album # database query not required

QuerySet.select_related() works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a ‘many’ relationship, select_related is limited to FK and one-to-one relationships.

QuerySet.prefetch_related() serves a similar purpose, but the strategy is quite different. It does a separate lookup for each relationship, and does the “joining” in Python.  This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related.

Advertisements

Leave a Reply