Hack Django ORM to calculate Median and Percentiles (Or make annotations great again!)

Hacks should be quick. And so should be the articles about them.

Photo by Caspar Rubin on Unsplash

Problem

We needed to calculate medians, percentiles for some quantities for our own ETL system (note to self: write a post on this) grouped by certain fields.

Some options we had:

  1. Extra: But Django says this will be deprecated, and use it as a last resort. We still had resorts to explore.
  2. Raw SQL: Besides all usual bad things with writing RAW SQL (look at the number of warnings on the linked page!), the code starts to look ugly.

So what was the best way to do it?

Come on! Django also gives us something called a RawSQL. Great. So we can just use it to get the percentiles we wanted. Right?

Wrong. As we realised later, RawSQL is better suited for aggregates and not annotations. Exhibit:

Notice how our Raw expression percentile_disc(0.9) WITHIN GROUP (ORDER BY duration) also gets added to the GROUP BY clause?

This would not happen if we remove the Avg("duration") from annotation. So basically, if the query already has a GROUP BY clause, RawSQL will add the sql to the GROUP BY clause as well.

This is not what we want. It also didn’t make sense to us, why is that needed? Maybe when we want to use RawSQL in an order_by and want the expression to get added to GROUP BY automatically? Maybe.

Solution

We dug deep as to why is the sql added to GROUP BY. Looked at the source code, found this method get_group_by_cols which returns self. Super sensible naming by Django devs. I knew we could do something here. Ergo, the Hack:

We created a class RawAnnotation and overrode get_group_by_cols to return an empty array. And now it works as expected.

Yay.

(Originally posted on my Medium account)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s