Estimated counts for faster Django Admin change_list
April 19, 2017 • programming
Short story of how we reduced the response time of some of our admin pages by 1000x.
The Problem
One of our tables grew to over 25 million rows. Which, while not a large number, makes the most frequent queries slow.
A direct effect it had was on our Admin panel. The change_list
form for the model started taking ages to load. Our Operations team uses the Admin 24*7 for most of what we do, we could see them grumbling about how slow things have become as they had to sit for 2–3 seconds minimum before the page would load completely. This lead to low efficiency for at least 10 people every day. This had to be fixed.
We had already done stuff like using select_related
or prefetch_related
on the related items to reduce the number of queries, what else could we do? On some investigation (Thank you DDT), we found out that a certain count
query was taking 98% of the total time taken by SQL queries on that page. It was clear that this count
query would be painfully slow for large tables. Why was this query being made? For the admin’s paginator to work.
The Solution
It was clear that the pagination logic will have to change. Our requirement with the admin was just to give an estimate of the number of rows that were there, and be able to navigate from one page to the another. For that, we would need a faster way to get the count of rows in a table.
We are using PostgreSQL and it has wonderful docs for things like how to estimate counts. PostgreSQL maintains metadata about the database in system catalogs. One such catalog is the pg_class
. On every VACUUM
, ANALYZE
, or for commands like CREATE INDEX
, postgres updates the relevant rows in pg_class
. So instead of getting count like:
SELECT COUNT(*) AS "__count" FROM "my_table"
We could get a rough estimate by doing:
SELECT reltuples FROM pg_class WHERE relname = 'my_table'
This query takes no more than 1 ms to complete!
To integrate it with Django’s ModelAdmin
seamlessly, we created a custom Paginator:
from django.contrib.admin import ModelAdmin | |
class MyTableAdmin(ModelAdmin): | |
... | |
paginator = LargeTablePaginator | |
... | |
from django.core.paginator import Paginator | |
class LargeTablePaginator(Paginator): | |
""" | |
Warning: Postgresql only hack | |
Overrides the count method of QuerySet objects to get an estimate instead of actual count when not filtered. | |
However, this estimate can be stale and hence not fit for situations where the count of objects actually matter. | |
""" | |
def _get_count(self): | |
if getattr(self, '_count', None) is not None: | |
return self._count | |
query = self.object_list.query | |
if not query.where: | |
try: | |
cursor = connection.cursor() | |
cursor.execute("SELECT reltuples FROM pg_class WHERE relname = %s", | |
[query.model._meta.db_table]) | |
self._count = int(cursor.fetchone()[0]) | |
except: | |
self._count = super(LargeTablePaginator, self)._get_count() | |
else: | |
self._count = super(LargeTablePaginator, self)._get_count() | |
return self._count | |
count = property(_get_count) |
And we are sorted. Lightening fast page loads are back, and our Operations team is happy.
UPDATE: The code has been updated to be compatible with Django ≤ 1.11
The Future
You can extend the work done here to make a more generic library:
- By making a queryset method that returns the approximate count. This method can then be used anywhere you just need an estimate of the count.
- By checking for
connection.vendor
before estimating, and using database specific queries.
and, The Gotchas
The custom paginator is not suitable with tables having small number of rows. This is because:
- Sometimes the admin will not show any rows for a given model. That will be because the
reltuples
for that table inpg_class
returned0
. This happens with small tables which have never been analysed/vacuumed. (Running anANALYSE
for these tables will fix these, but read point 2) - They generally get analysed/vacuumed with lesser frequency and the percentage deviation from the actual values will be higher here.
Other than that, this just works with PostgreSQL (but there should be similar ways to estimate counts in other databases as well).
(Originally posted on my Medium account)