Unknown

This is a little war story about deleting trees in a database. It’s also a story for graph theory lovers.

Did you know that MySQL has a hard-coded recursion limit when performing cascade deletion?

Cascading operations may not be nested more than 15 levels deep.
MySQL 5.7 Reference Manual: 14.6.1.5 InnoDB and FOREIGN KEY Constraints

This story is about how we learned that fact. I hope it demonstrates some useful debugging techniques.

The War Story

The Error

Got a bug report about a self-referential Foreign Key Constraint (FKC):

Mysql2::Error: Got error 193 ...
FOREIGN KEY (`parent_id`)
REFERENCES `trees` (`id`) ON DELETE CASCADE'
from InnoDB: DELETE FROM `trees` WHERE ...

If you’ve implemented a tree in a database, you’re familiar with the parent_id technique. It is called an Adjacency List. Here, we’re trying to delete one tree from the trees table.

InnoDB Status

Whenever we get a FKC error, we first look at show engine innodb status:

LATEST FOREIGN KEY ERROR
...
DELETE FROM `trees` WHERE ...;
Foreign key constraint fails for table `trees`:
,
  CONSTRAINT `redacted` FOREIGN KEY (`parent_id`) REFERENCES `trees` (`id`) 
  ON DELETE CASCADE ON UPDATE CASCADE
Trying a too deep cascaded delete or update
...

Suspecting that cascade deletion is recursive, our eyes are immediately drawn to the words “too deep”. We know this is a safeguard to prevent infinite recursion, but we don’t yet know the limit.

Finding this key phrase in InnoDB status was the most important step in debugging this issue.

Find the Culprit Data

Suspect 1: Cycles

A cycle would cause infinite recursion.

We thought our trees table was acyclic. We can confirm that using a great ruby library, RGL.

require 'rgl/adjacency'
require 'rgl/topsort'
graph = RGL::DirectedAdjacencyGraph.new
graph.add_vertex('a')
graph.add_vertex('b')
graph.add_edge('a', 'b')
graph.add_edge('b', 'a')
graph.acyclic? #=> false

Running our actual data through the same process, we did not find any cycles.

Suspect 2: A big tree

Our trees table is a “forest”, ie. many disjoint trees. Generally, such a graph is said to have many connected components.

So, what if there’s a big tree in there? We don’t know what the limit is, but we know we went “too deep”.

Here again we can use RGL to identify our connected_components, and find which of them is largest.

def print_connected_component_info(graph)
  components = []
  graph.each_connected_component { |obj| components << obj }
  puts format('graph has %d components', components.length)
  largest_component = components.max { |a, b|
    a.length <=> b.length
  }
  puts format('largest component: %s', largest_component)
end

At this point, we had our answer, we saw our largest component (a tree) had a depth of 19. Deleting smaller trees worked, deleting a depth-19 tree did not.

Eventually, StackOverflow led us to the correct documentation, and we learned that the limit is 15.

Visualizing a Graph with graphviz

RGL is capable of producing a .dot file readable by graphviz. Our trees table looks like this:

A Few Good Trees

In the real picture, our depth-19 tree was very easy to identify. A picture is worth a thousand words.

Good Tools, Useful Techniques

If you work with trees in a database, these are good tools and techniques. I hope they help you some day.