I think I just killed The Great Typo Memory Leak of 2006 (not to be confused with last year’s great memory leak).

I think this is the leak that has been causing so many problems for Typo users on shared hosting systems lately. I’d been putting off debugging it, because there really aren’t any good tools for tracing Ruby memory leaks, and there’s enough magic in Rails that it’s hard to fully understand the control flow of a Rails app, much less when memory is being allocated and freed.

Since Typo 4.0 is nearly ready for release, I was running out of time to fix it. Fortunately, during some benchmarking this morning I discovered something useful:

  1. With the cache disabled, my Typo processes stay around 30 MB.
  2. With the cache enabled, my Typo processes grew to 70 MB within a couple hundred hits to the same URL.

Since the cache-hit path is about 1/100th as long as the cache-miss path, it’s a lot easier to debug. And since we’re using our own reimplementation of Rails’s Action Cache, I have a decent understanding of the caching code. That said, I suspect the stock action cache would have had the same problem.

The problem is that we have multiple before and after filters, and at least one of them saves state in its before filter and expects its after filter to free it. Now suppose our filter order looks like this:

      stateful_before_filter
        action cache before filter
          render a page
        action cache after filter
      stateful_after_filter

On a cache hit, the action cache’s before filter cancels the rest of the filter chain, so stateful_after_filter is never called. Whatever stateful_before_filter saved (by shoving it into a class variable, for instance) is now with us forever.
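To make the failure mode concrete, here is a toy model in plain Ruby. It is not Rails’s actual filter implementation; it just assumes, as in the scenario above, that a before filter returning false halts the chain and that no after filters run once the chain is halted. The names `STATE`, `run_chain`, `stateful`, and `action_cache` are hypothetical stand-ins, with `STATE` playing the role of the class variable.

```ruby
# Toy model of a nested filter chain (not Rails's real implementation).
# Assumption: a before filter that returns false halts the chain, and
# once halted, none of the after filters run.

STATE = [] # hypothetical stand-in for the class variable the stateful filter fills

def run_chain(filters)
  # Before filters run outside-in.
  filters.each do |f|
    return :halted if f[:before].call == false
  end
  yield # "render a page"
  # After filters run inside-out, i.e. in reverse order.
  filters.reverse_each { |f| f[:after].call }
  :rendered
end

page_cache = {}

stateful = {
  before: -> { STATE << :request_data; true }, # saves state...
  after:  -> { STATE.pop }                     # ...and expects to free it here
}

action_cache = {
  before: -> { !page_cache.key?(:page) },      # false on a cache hit, halting the chain
  after:  -> { page_cache[:page] = "<html>" }  # fill the cache after a full render
}

# The stateful filter is outermost; the action cache is nested inside it.
filters = [stateful, action_cache]

5.times { run_chain(filters) { "render a page" } }
puts STATE.size # => 4 (one leaked entry per cache hit)
```

Only the first request is a cache miss and cleans up after itself; each of the four cache hits leaves one entry behind, which is the same shape as the steady growth seen with the cache enabled.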

In general, you should be able to fix this by adjusting the order in which you apply your filters. The first call to around_filter goes on the outside and all further calls are nested inside it. However, Typo currently applies a couple of filters in a superclass controller, which puts them outside anything the subclass adds, and there’s no easy way to move the caches_action call into the superclass. So, as a workaround, I changed the call to around_filter in caches_action_with_params to prepend_around_filter. Now my Typo processes stay around 32 MB even after thousands of cached hits.
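The workaround can be modeled the same way. In this sketch, again a hypothetical toy rather than Rails code, `FilterChain#around_filter` appends a filter so it nests inside the ones already registered, and `FilterChain#prepend_around_filter` puts it on the outside, loosely mirroring the Rails methods of the same names. It keeps the same assumption that a before filter returning false halts the chain and skips all after filters.

```ruby
# Toy model, not Rails: the first filter added is outermost, and
# prepending puts a filter outside everything already registered.
class FilterChain
  def initialize
    @filters = []
  end

  def around_filter(filter)
    @filters << filter       # nested inside the existing filters
  end

  def prepend_around_filter(filter)
    @filters.unshift(filter) # wraps the existing filters
  end

  # Befores run outside-in; a before returning false halts the chain
  # and skips every after filter. Afters run inside-out.
  def run
    @filters.each { |f| return :halted if f[:before].call == false }
    yield
    @filters.reverse_each { |f| f[:after].call }
    :rendered
  end
end

leaked = []     # stand-in for the class variable
page_cache = {}

stateful = {
  before: -> { leaked << :request_data; true },
  after:  -> { leaked.pop }
}
action_cache = {
  before: -> { !page_cache.key?(:page) },     # false on a cache hit
  after:  -> { page_cache[:page] = "<html>" }
}

chain = FilterChain.new
chain.around_filter(stateful)             # e.g. registered by the superclass controller
chain.prepend_around_filter(action_cache) # the workaround: cache goes outermost

results = Array.new(1000) { chain.run { "render a page" } }
puts leaked.size             # => 0
puts results.count(:halted)  # => 999
```

With the cache outermost, a cache hit halts the chain before the stateful before filter ever runs, so there is nothing left to free. The trade-off is that the stateful filter is skipped entirely on cached hits.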

Longer-term, I want to clean up our controller hierarchy, because I don’t fully understand why we need multiple layers of stacked controller classes when 95% of Typo is in one controller. But for now the memory leak is fixed and we’re back on track for Typo 4.0 in the next week or so.

There’s a lesson here for other Rails users: be careful nesting filters when you’re using the action cache (or any filter that can terminate the request), because you can’t depend on the after_filter being called. If you save state in a before_filter, expect leaks unless you’re very careful with your filter nesting order.