On the Unreasonable Effectiveness of NULL Checks
If you filter for values on a NULLABLE column, strongly consider adding a NOT NULL check as well.
NULLABLE columns in a relational database are useful: they can save space, and they can model 1-to-0-or-1 relationships. However, they can affect your query performance in surprising ways, and you must cover for them by making queries a little more complex.
Our database includes a table for tracking individual activities of our players. These activities may reference a single learning objective, so we have the NULLABLE field learning_objective_id on our player_activities table; it is a foreign key to our learning_objective table. Without discussing the alternatives to this design, let's focus on the performance effects.
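A minimal sketch of the schema described above. Only the table names and the learning_objective_id column come from this article; the other columns and types are assumptions for illustration:

```sql
-- Hypothetical schema matching the description above.
CREATE TABLE learning_objective (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE player_activities (
    id                    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    player_id             bigint NOT NULL,
    -- NULLABLE: an activity may reference zero or one learning objective
    learning_objective_id bigint REFERENCES learning_objective (id)
);
```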
First, the table is large, so let’s get a baseline:
psql 1> explain select count(*) from player_activities;
QUERY PLAN
-----------------------------------------------------------------
Finalize Aggregate (cost=5365250.92..5365250.93 rows=1 width=8)
...
We have an index for learning_objective_id on this table. Here is the cost for searching for a specific kind of player activity by learning objective:
psql 1> explain select count(*) from player_activities where learning_objective_id = 1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=7868.23..7868.24 rows=1 width=8)
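The index mentioned above would look something like the following; the index name is an assumption, since the article does not state it:

```sql
-- Assumed definition of the index on the NULLABLE foreign key.
CREATE INDEX player_activities_learning_objective_id_idx
    ON player_activities (learning_objective_id);
```

A partial index declared with WHERE learning_objective_id IS NOT NULL is another option worth profiling for columns that are mostly NULL.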
This cost may not be terrible in the grand scheme of things, and we can celebrate, but if this query were embedded in something more complex, such as nested subqueries, groupings, and joins, we might be thankful for an improvement. Look what happens when we add a NULL check on this column:
psql 1> explain select count(*) from player_activities where learning_objective_id is not null and learning_objective_id = 1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=4780.10..4780.11 rows=1 width=8)
This improvement is a feature of checking for NULL on a NULLABLE column specifically, not of adding just any extra comparison. For example, we could instead exclude learning_objective_ids matching some other real value and see that the cost is actually worse than just looking up the one value:
psql 1> explain select count(*) from player_activities where learning_objective_id != 2 and learning_objective_id = 1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=7935.40..7935.41 rows=1 width=8)
The general rule of thumb is: if you are filtering or joining on a NULLABLE column, also check that it is NOT NULL. This is not a perfect rule, and we have run into rare situations where the extra check made performance worse. As always with optimization advice, profile!
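Applied to a join, the rule of thumb looks like this. The query is a hypothetical example, not one from our codebase:

```sql
-- Hypothetical example: filtering the NULLABLE foreign key
-- with an explicit IS NOT NULL check alongside the join.
SELECT lo.name, count(*)
FROM player_activities pa
JOIN learning_objective lo
  ON lo.id = pa.learning_objective_id
WHERE pa.learning_objective_id IS NOT NULL
GROUP BY lo.name;
```

An inner join already excludes rows where learning_objective_id is NULL, so the check does not change the result set; whether it changes the plan is exactly the kind of thing to verify with EXPLAIN.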