originally posted on medium
One of our tables grew to over 25 million rows. Which, while not a large number, makes the most frequent queries slow.
A direct effect it had was on our Admin panel. The
change_list form for the model started taking ages to load. Our Operations team uses the Admin 24*7 for most of what we do, we could see them grumbling about how slow things have become as they had to sit for 2–3 seconds minimum before the page would load completely. This lead to low efficiency for at least 10 people every day. This had to be fixed.
We had already done stuff like using
prefetch_related on the related items to reduce the number of queries, what else could we do? On some investigation (Thank you DDT), we found out that a certain
count query was taking 98% of the total time taken by SQL queries on that page. It was clear that this
count query would be painfully slow for large tables. Why was this query being made? For the admin’s paginator to work.
It was clear that the pagination logic will have to change. Our requirement with the admin was just to give an estimate of the number of rows that were there, and be able to navigate from one page to the another. For that, we would need a faster way to get the count of rows in a table.
We are using PostgreSQL and it has wonderful docs for things like how to estimate counts. PostgreSQL maintains metadata about the database in system catalogs. One such catalog is the pg_class. On every
ANALYZE, or for commands like
CREATE INDEX, postgres updates the relevant rows in
pg_class. So instead of getting count like:
We could get a rough estimate by doing:
This query takes no more than 1 ms to complete!
To integrate it with Django’s
ModelAdmin seamlessly, we created a custom Paginator:
And we are sorted. Lightening fast page loads are back, and our Operations team is happy.
You can extend the work done here to make a more generic library:
- By making a queryset method that returns the approximate count. This method can then be used anywhere you just need an estimate of the count.
- By checking for
connection.vendorbefore estimating, and using database specific queries.
and, The Gotchas
The custom paginator is not suitable with tables having small number of rows. This is because:
- Sometimes the admin will not show any rows for a given model. That will be because the
reltuplesfor that table in
0. This happens with small tables which have never been analysed/vacuumed. (Running an
ANALYSEfor these tables will fix these, but read point 2)
- They generally get analysed/vacuumed with lesser frequency and the percentage deviation from the actual values will be higher here.
Other than that, this just works with PostgreSQL (but there should be similar ways to estimate counts in other databases as well).