33
The Django feature that can slow your website to a crawl
Ordering Django querysets via
order_by
causes Django to ask the database to order the rows by a given column.As long as the number of rows being ordered is not too large then this operation should be quick. However, there is a situation where using the build-in
order_by
can trigger a series of very expensive and slow operations: getting a random row from the database.Django's build in way to get a random record is via
order_by('?')[0]
. Note though that the Django docs themselves give this warning:order_by('?')
queries may be expensive and slow
The reason for the poor performance is the fact Django generates the following inefficient SQL:
SELECT * FROM entry ORDER BY RAND() LIMIT 1
Let's break that SQL down:
SELECT * FROM entry
This tells the database to include all columns from the
entry
table.ORDER BY RAND()
This tells the database to generate a random number for every row (depending on database used. Some do it differently).
LIMIT 1
This tell the database to scan the generated numbers for the smallest one. This rows is then read.
Let that sink in: if we have 100,000 rows then 100,000 random numbers are first be generated, then all 100,000 are scanned for the smallest one. Generating random numbers is slow, and scanning is not quick. For perspective this whole operation smells a lot like mining Bitcoin but ultimately all we get out of it is we read one row from the database! A lot of work must be done for small pay-off.
If there are a lot of rows in the database and
order_by('?')[0]
is used then expect bad performance.In short, databases are not good at random. Applications are though. So consider splitting responsibilities between the database and the application:
This will mean two database reads are performed, but those will are very efficient operations and so significantly quicker than the database copying the table and ordering all the rows randomly.
With that in mind, that is why Django Doctor gives this advice when reviewing Pull Requests and it spots
order_by('?')[0]
being used:
Consider improving the layering of the solution by moving the logic to a custom model manager like so:
from random import randint
from django.db import models, transaction
class RandomManager(models.Manager):
@transaction.atomic
def random(self):
index = randint(0, self.count()-1)
return self.all()[index]
class Entry(models.Model):
objects = RandomManager()
random_item = Entry.objects.random()
Notice the use of
transaction.actomic
to make both the count and the row read happen in the same transaction, preventing a rare race conditions occurring if:Without a atomic transaction under very rare circumstances an
IndexError
could occur.33