Check yo queries before you wreck yo site

When building high performance Django sites, keeping the number of queries down is essential. And just like controlling technical debt and maintaining test coverage, being successful means making monitoring queries a natural part of your workflow. If your momentum has to be stopped to examine database queries, you’re not going to do it. The solution for most developers is Django-Debug Toolbar, which sits off to the side in your web browser, but I’m going to share the way I do it.

I do like Django-Debug Toolbar, but most of the time, it just gets in my way: it only works on pages that return HTML, the overhead adds to the response time, and it modifies the response (adding to the response size and render time). What I prefer is displaying SQL queries along HTTP requests in runserver’s output:

Terminal Output with Colorizing Output and SQL

There’s three parts to getting this to work:

  1. ColorizingStreamHandler — this lets me distinguish http requests (gray/info) from SQL queries (blue/debug)
  2. ReadableSqlFilter — this reformats output by stripping the SELECT arguments so you can focus on the WHERE clauses
  3. Opt-in — having SQL spat out everywhere can be distracting, so it’s opt-in with an environment variable

Getting started

ColorizingStreamHandler and ReadbleSqlFilter are a logging handler and a logging filter packaged in project_runpy. Add it to your Django project with pip install project_runpy (no installed apps changes needed). They get thrown into your Django logging configuration like:

[python]LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'root': {
        'level': os.environ.get('LOGGING_LEVEL', 'WARNING'),
        'handlers': ['console'],
    },
    'filters': {
        'require_debug_false': {
            '()': 'django.utils.log.RequireDebugFalse',
        },
        'require_debug_true': {
            '()': 'django.utils.log.RequireDebugTrue',
        },
        'readable_sql': {
            '()': 'project_runpy.ReadableSqlFilter',
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'project_runpy.ColorizingStreamHandler',
        },
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG' if env.get('SQL', False) else 'INFO',
            'handlers': ['console'],
            'filters': ['require_debug_true', 'readable_sql'],
            'propagate': False,
        },
        'factory': {
            'level': 'ERROR',
            'propagate': False,
        },
    }
}
[/python]

From: https://github.com/crccheck/crccheck-django-boilerplate/blob/master/project/project_name/settings.py#L84

I found that the logging can get obtrusive. It’ll show up in your runserver, your shell, when you run scripts, and even in your iPython notebooks. So using a short environment variable makes it easy to flip on and off. I’m also using project_runpy’s  env environment variable getter, but if you don’t want that, I suggest using  if 'SQL' in os.environ to avoid how  '0' and  'False' evaluate to  True when reading environment variables.

I was afraid that adding ReadableSqlFilter would add a performance penalty, but I don’t notice any. The extra verbosity is also nice when you have a view with several expensive queries because you can see them run before the page is rendered.

How to use this

Just code as usual! If you’re like me, you keep at least part of your runserver terminal visible. If you are me, hello! Use your peripheral vision to look for long flashes of blue text. When you do, you know you’ve hit something that’s making too many queries. The main culprits will be missing select_relateds and prefetch_relateds. Eventually, you’ll develop a sixth sense for when your querysets could be better, and then you’ll look back at your old code like this:

that-queryset-ain't-right

Continuing

A warning: don’t think too much about the query time reported. You can’t compare database queries done locally with what happens in production. One thing we did confirm is 10 one second queries are much slower than one 10 second query, even in the same AWS availability zone. Something that isn’t as obvious when everything is local. Just pay attention to the number of queries.

Finally, if you really want to keep your database hits low as you continue to develop, document them in your tests. Django makes an assertion available called  assertNumQueries that you can throw in your tests just to document how many queries an operation takes. They don’t even have to be good; for example, I wrote these scraper tests to document that my code currently makes too many database queries. It’s similar to making sure your views return 200s. Make sure you know how many queries you’re getting yourself into.

Too Many Queries

Leave a Reply

Your email address will not be published. Required fields are marked *