Ding, dong, the memory leak in Typo is dead

Posted by Scott Laird Sat, 08 Jul 2006 19:02:00 GMT

I think I just killed The Great Typo Memory Leak of 2006 (not to be confused with last year’s great memory leak).

I think this was the leak that has been causing so many problems for Typo users on shared hosting systems lately. I’d been putting off debugging it, because there really aren’t any good tools for Ruby memory leak tracing, and there’s enough magic in Rails that it’s hard to even fully understand the control flow in your Rails app, much less when memory is being allocated and freed.

Since Typo 4.0 is nearly ready for release, I was running out of time to fix it. Fortunately, during some benchmarking this morning I discovered something useful:

  1. With the cache disabled, my Typo processes stay around 30 MB.
  2. With the cache enabled, my Typo processes grow to 70 MB within a couple hundred hits to the same URL.

Since the cache-hit path is about 1/100th as long as the cache-miss path, it’s a lot easier to debug. And since we’re using our own reimplementation of Rails’s Action Cache, I have a decent understanding of the caching code. As it turns out, though, I think the normal action cache would have had the same problem.

The problem was that we have multiple before and after filters, and at least one of them saves state in the before filter and expects the after filter to free it. But suppose our filter order looks like this:

  stateful_before_filter
    action_cache_before_filter
      render a page
    action_cache_after_filter
  stateful_after_filter

The problem is that the action cache’s before filter will cancel the rest of the render chain if it gets a cache hit. So stateful_after_filter will never be called. So, whatever stateful_before_filter saved (by shoving it into a class variable, for instance) is now with us forever.
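To make the failure mode concrete, here is a minimal plain-Ruby sketch of that chain. It is a simulation, not Rails code: FakeController, SAVED_STATE and CACHE are made-up names standing in for the controller, the class variable and the action cache.

```ruby
# Plain-Ruby simulation of the nested filter chain; none of this is Rails code.
class FakeController
  SAVED_STATE = []   # stands in for the class variable that holds state
  CACHE = {}         # stands in for the action cache

  def handle(url)
    SAVED_STATE << "state for #{url}"   # stateful_before_filter
    if CACHE.key?(url)                  # action_cache_before_filter
      return CACHE[url]                 # cache hit: the chain stops here
    end
    body = "<html>#{url}</html>"        # render a page
    CACHE[url] = body                   # action_cache_after_filter
    SAVED_STATE.pop                     # stateful_after_filter
    body
  end
end

controller = FakeController.new
200.times { controller.handle("/articles/1") }
puts FakeController::SAVED_STATE.size   # 199: one leaked entry per cache hit
```

Only the first request is a cache miss, so only the first request ever reaches the cleanup step.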

In general, you should be able to fix this by adjusting the order that you apply your filters. The first call to around_filter goes on the outside and all further calls are nested inside it. However, Typo currently applies a couple filters in a superclass controller, so there’s no easy way to move the caches_action call into the superclass. So, as a workaround, I changed the call to around_filter in caches_action_with_params to be prepend_around_filter instead of around_filter. Now my Typo processes stay around 32 MB even after thousands of cached hits.
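Here is a rough model of why prepending helps. It is a sketch in plain Ruby, not the Rails implementation: FilterChain and the two lambdas are hypothetical, and only the method names around_filter and prepend_around_filter come from Rails.

```ruby
# Plain-Ruby model of nested around filters; not the Rails implementation.
class FilterChain
  def initialize
    @filters = []   # outermost filter first
  end

  def around_filter(filter)
    @filters.push(filter)      # nested inside everything added so far
  end

  def prepend_around_filter(filter)
    @filters.unshift(filter)   # becomes the new outermost filter
  end

  # Wrap the action in each filter, innermost first, then run the result.
  def call(&action)
    wrapped = @filters.reverse.inject(action) do |inner, filter|
      lambda { filter.call(inner) }
    end
    wrapped.call
  end
end

leaked = []
cache  = {}

stateful = lambda do |inner|
  leaked << :state   # before: save state
  result = inner.call
  leaked.pop         # after: free it
  result
end

caching = lambda do |inner|
  cache[:page] ||= inner.call   # on a hit, inner is never called
end

chain = FilterChain.new
chain.around_filter(stateful)          # applied first, as in a superclass
chain.prepend_around_filter(caching)   # the workaround: cache goes outside

1000.times { chain.call { "<html>page</html>" } }
puts leaked.size   # 0
```

With the cache on the outside, a hit returns before the stateful filter runs at all, so there is nothing left to clean up.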

Longer-term, I want to clean up our controller hierarchy, because I don’t fully understand why we need multiple layers of stacked controller classes when 95% of Typo is in one controller. But for now the memory leak is fixed and we’re back on track for Typo 4.0 in the next week or so.

There’s a lesson here to other Rails users–be careful nesting filters when you’re using the action cache (or any filter that can terminate the request), because you can’t depend on the after_filter being called. If you save things in the before_filter, expect leaks unless you’re very careful with your filter nesting order.

6 comments

Push me, pull me

Posted by Scott Laird Mon, 05 Sep 2005 01:25:00 GMT

Someone pointed out today that none of the “convert from your old blog system to Typo” converters in the current Typo development tree were working. They all produce articles without any HTML in them. This was caused by my big filter update from a week or so ago; apparently no one has tried to convert directly to a development version of Typo in the last week or two. The problem is that none of the text filters were running. Unfortunately, there’s no easy way to make them run because they need access to a working Rails controller, and there isn’t one available from inside of the converters.

At the same time, Piers Cawley asked for an easy way to rebuild all of the HTML generated by filters on his site–he was doing filter development and he needed to rebuild everything. Unfortunately, the filter design doesn’t make this easy, either.

These two are basically the same problem–the way that we run text filters is kind of painful in the current Typo tree. In Typo 2.5 and earlier, filters were applied at the model level, and nothing outside of the model really needed to worry about them–the filters were automatically applied every time that the article (or comment, or page) body changed. Due to the changes in the dev tree, this just isn’t possible any more, but I’d tried to hack it together by changing the dozen or so actions that changed Articles. It worked, but it was ugly, and it breaks when something like a converter needs to create a new article, because the converter has no way to run the filter.

So I’ve been making a few changes to Typo.

The basic problem is that we’ve been using a “push” model for updating the HTML version of articles, when we should really be using a “pull” model. That is, instead of updating the HTML when the article changes, we should really be generating the HTML when the article is viewed and then caching the HTML so we don’t have to do it more than once per article.

Fortunately, this change was pretty easy to make–I just had to search for every reference to body_html, extended_html, or full_html and change it to a reference to article_html(article). Then I moved the filter calls into article_html(article), saving the generated HTML back into article.body_html.
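A minimal sketch of the pull model looks something like this. Only the names article_html and body_html come from the description above; the Struct and run_text_filters are hypothetical stand-ins for the real ActiveRecord model and the real filter chain.

```ruby
# Hypothetical stand-in: the real Article is an ActiveRecord model.
Article = Struct.new(:body, :body_html)

# Toy "text filter": wrap each blank-line-separated paragraph in <p> tags.
def run_text_filters(text)
  text.split(/\n{2,}/).map { |para| "<p>#{para.strip}</p>" }.join("\n")
end

# Pull model: generate the HTML on first view, then reuse the stored copy.
def article_html(article)
  article.body_html ||= run_text_filters(article.body)
end

# A converter can create an article without running any filter at all.
article = Article.new("First paragraph.\n\nSecond paragraph.")
puts article.body_html.inspect   # nil: nothing has been rendered yet
puts article_html(article)       # rendered and stored on first view

# An edit clears the stored HTML; the next view regenerates it.
article.body = "Rewritten."
article.body_html = nil
puts article_html(article)
```

The point is that nothing which creates or changes an article needs to know how to run filters; it only has to leave body_html empty.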

Once that was done, I could rip out all of the complicated filtering code that I’d had to put in to make the new filters work right, and everything Just Worked. I had to tweak a few tests that expected the HTML to be available in the database immediately after posting new content, but I already had tests that verified that the content viewed right, so it was just a matter of removing code, not really adding new code.

There’s one more change that I’m debating making. From an architectural standpoint, we shouldn’t really be stuffing things back into body_html–we should be using Rails’ fragment cache. Switching to the fragment cache would be trivial: it would only take a couple extra lines in article_html, and then I could rip out a bunch of lines in the editor actions, because I could use a sweeper instead of explicit calls to article.body_html = nil.

Unfortunately, if we do that then we’ll end up killing Typo’s performance when it’s running in development mode, because caching is disabled in dev mode. So it’d be cleaner, but probably too slow to be useful. I’ll probably revisit this again before the next Typo release–there are a bunch of performance tweaks that we need to make before the next release; once those are done, we might be able to stand the performance hit.
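The trade-off can be sketched with a toy cache. This is plain Ruby, not the Rails fragment cache API: FragmentStore, its fetch and expire methods and the caching_enabled flag are all invented for illustration.

```ruby
# Plain-Ruby stand-in for a fragment cache; not the Rails API.
class FragmentStore
  attr_reader :renders

  def initialize(caching_enabled)
    @caching_enabled = caching_enabled
    @fragments = {}
    @renders = 0   # counts how often the filters actually ran
  end

  # Return the cached fragment for key, rendering it with the block if needed.
  def fetch(key)
    return @fragments[key] if @caching_enabled && @fragments.key?(key)
    @renders += 1
    html = yield
    @fragments[key] = html if @caching_enabled
    html
  end

  # What a sweeper would do when an article is saved.
  def expire(key)
    @fragments.delete(key)
  end
end

production  = FragmentStore.new(true)
development = FragmentStore.new(false)

[production, development].each do |store|
  100.times { store.fetch("article/1") { "<p>Hello</p>" } }
end

puts production.renders    # 1
puts development.renders   # 100
```

With caching off, every view pays the full filtering cost, which is exactly the development-mode slowdown I’m worried about.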

3 comments

Introduction to Typo filters

Posted by Scott Laird Wed, 24 Aug 2005 02:13:00 GMT

Although blogs are inherently HTML-based, HTML isn’t really a great format for writing plain-text documents. If nothing else, manually adding <p> and </p> around paragraphs interrupts the flow of writing. Most people would prefer to write in a more user-friendly manner, either via a GUI editor or a light-weight markup language like Markdown, which is then translated to HTML automatically. Very few people really want to write raw HTML blog postings on a daily basis.

Out of the box, Typo 2.5 supports the Markdown and Textile markup languages and the SmartyPants HTML-post processing filter, which adds typographical quotes and dashes to HTML. Adding additional filters is difficult because the filter setup is hard-coded into Typo. One of the new features that I’ve been working to add to Typo is the ability to easily add new text filters via filter plugins, similar to the sidebar plugins in Typo 2.5. At the same time, I’ve also added several new filtering plugins that extend Typo’s abilities in a number of useful ways.

The goal of all of this is to make it easier to write using Typo. I’ve tried to find things that cause me pain and then fix them. I want to make it easy to do common writing tasks without having to fire up an external tool. Admittedly, my definition of “common writing tasks” is probably different from most people’s, but the easy ability to extend Typo’s filtering system will allow people to adapt Typo to their own needs without having a deep understanding of Typo’s internals.

6 comments

Typo filters nearing completion

Posted by Scott Laird Sun, 21 Aug 2005 06:19:50 GMT

My little Typo filter project is finally nearing completion. I think I’ve been working on this for almost two weeks now, which makes it the most time-consuming Typo project that I’ve undertaken yet. I’ve added about 400 lines of code and 200 lines of new tests, and changed at least 400 more lines. Typo’s current trunk is only 2600 lines long, so I’ve touched almost 40% of the code.

At this point, almost everything works. I can drop new filters into components/plugins/textfilters and they’re immediately available for use. All of the current filters work (Textile, Markdown, SmartyPants), and I’ve added several new filters as well. There’s still a lot of cleanup left to do, and there are a bunch of corner cases that I need to write tests for, but the core code seems pretty solid, and it’s essentially feature-complete.
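For readers wondering how "drop a file in and it works" can be done in Ruby, one common approach is a registry fed by the inherited hook. This is a hypothetical sketch, not necessarily how Typo’s loader works; TextFilterPlugin, Shout and Reverse are invented names.

```ruby
# Hypothetical registry; Typo's actual plugin loader may work differently.
class TextFilterPlugin
  @registry = {}

  class << self
    attr_reader :registry

    # Ruby calls this hook whenever a subclass is defined, so loading a
    # plugin file is enough to register the filter it contains.
    def inherited(subclass)
      super
      TextFilterPlugin.registry[subclass.name.downcase.to_sym] = subclass
    end
  end
end

# In a real install each class would live in its own file in the plugin
# directory and be loaded with require; they are inlined here.
class Shout < TextFilterPlugin
  def self.filter(text)
    text.upcase
  end
end

class Reverse < TextFilterPlugin
  def self.filter(text)
    text.reverse
  end
end

puts TextFilterPlugin.registry.keys.inspect
puts TextFilterPlugin.registry[:shout].filter("typo")   # TYPO
```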

no comments

Another day, another filter plan

Posted by Scott Laird Wed, 17 Aug 2005 15:13:38 GMT

I’ve spent a bit of time playing with moving my Typo filter patch to use controller instances for filtering, and it doesn’t look too hideous on the filter side, but I’m going to need a lot of infrastructure changes before I can deploy it.

Unfortunately, the main user of filters right now is Article#set_defaults, which is called whenever an Article is saved. If I make filtering a controller issue, then I’ll have to rip out the filtering in Article (also Comment and Page) and move it to ArticleController. This won’t be a big performance hit, because I can use fragment caching, but it’s very invasive.

So, before I go any further down this road, I need to decide if this is really cleaner than just caching a base URL and using it to manually generate URLs. I was originally against this, but a thread on the Typo mailing list reminded me that we’re going to need a base URL if we want to support multiple blogs in a single Typo install anyway.

Ugh.

8 comments

Typo filters hit a bit of a wall

Posted by Scott Laird Tue, 16 Aug 2005 15:36:07 GMT

I just hit a bit of a problem with Typo filters, and I’m not sure what the best way out is.

The problem first showed up with the sparkline filter. This filter turns a block like <typo:sparkline data="10 20 30 40 50"/> into an <img/> tag that points to a Sparkline generator on the current website. Things were going great with this until I realized that the filters don’t know anything about “the current website.” Even though my filters are technically Rails controllers (there’s a reason for this, I just haven’t fully implemented it yet–it’s next on the list once this problem is fixed), the actual filter method is a class method, not an instance method, and anyway most of the time, the filtering code is called from outside of a controller context. Basically, when the filter code gets called, it doesn’t know which website it’s getting called for.

And that sucks. Even ignoring complex things like the sparkline code, this makes it impossible for filters to produce URL references to elsewhere in the current Typo site. That means filters can’t do locally-hosted images, or WikiWord-style links, or AJAX. And that’s going to be a problem.

As I see it, I have two options:

  1. Re-spin the filter API so that it’s all controller helpers and components. This way they’ll always be able to use url_for. It’s a big conceptual change, but the code will be cleaner, and I doubt I’ll actually have to change more than 50 lines of code, not counting unit tests.
  2. Cache the base URL for the site somewhere and hand-code URLs myself. This is easy, but ugly.

Since I’m trying to avoid ugly, it looks like I’m going to be taking option #1. That’ll probably push the first filter release out to this weekend, but it’ll be better code.
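For comparison, here is roughly what the easy-but-ugly option #2 would look like. It is a sketch under assumptions: SparklineFilter, the base_url parameter and the /sparklines path are invented for illustration; only the <typo:sparkline> tag itself comes from the description above.

```ruby
# Hypothetical sketch of option #2: the filter is still a class method with
# no controller, so the caller has to hand in the site's base URL.
class SparklineFilter
  # Only digits and spaces are accepted, so the data needs no URL escaping.
  TAG = %r{<typo:sparkline\s+data="([\d ]+)"\s*/>}

  def self.filter(text, base_url)
    text.gsub(TAG) do
      values = Regexp.last_match(1).split.join(",")
      # The /sparklines path is made up for this example.
      %(<img src="#{base_url}/sparklines?data=#{values}" alt="sparkline"/>)
    end
  end
end

text = 'Traffic: <typo:sparkline data="10 20 30 40 50"/>'
puts SparklineFilter.filter(text, "http://example.com/blog")
# Traffic: <img src="http://example.com/blog/sparklines?data=10,20,30,40,50" alt="sparkline"/>
```

Every caller now has to find a base URL from somewhere and pass it along, which is the ugliness that option #1 avoids by letting the filter call url_for itself.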

As a side note, my Rails Book finally showed up yesterday, so I have printed documentation to work with. PDFs are nice for skimming and searching, but they’re a pain to read cover-to-cover.

2 comments

Questions about typo filters

Posted by Scott Laird Sat, 13 Aug 2005 23:29:08 GMT

After posting yesterday’s typo filter announcement, I started to have a few misgivings about the way that I was planning on configuring filters. I asked a few questions on the Typo IRC channel and got a number of wildly different suggestions and opinions, but one line from cDlm stuck with me:

it’s gonne be a nightmare to keep all those filters orthogonal

There was also a comment that every blog was going to end up with its own markup language, incompatible with everyone else. I thought about it a bit, and I think I’m approaching filters wrong. Or, rather, I’m approaching them like a programmer, not like an end user. The filter interface that I’ve been planning is too generic, and we’ll probably be better-served if we remove some of the genericness, at least on the front end.

As I see it, filters fall into 3 basic categories:

  1. Markup languages–they convert from some non-HTML markup language into HTML. Examples include Markdown and Textile.
  2. HTML post-processors–they convert generic HTML elements into other generic HTML elements. Examples include SmartyPants and, more or less, the Amazon filter that I discussed earlier.
  3. Typo macro tags. This includes the <flickr> and <sparkline> filters from yesterday.

Looking at this list, I see 2 very specific things:

  1. There’s really no reason to run two markup languages on the same post. You want at most one of them, and possibly 0 if you’re writing raw HTML.
  2. As long as the Typo macro tags fit into a clean namespace and they don’t have side-effects when they aren’t used, there’s no reason to ever turn them off. If a macro tag filter is installed, then it should be used on all articles.

This really simplifies things, because the filter configuration no longer requires the user to set up an arbitrary set of filters. Now it just needs to know:

  1. Which markup language
  2. Which post-processing filters.

I think we can call the Amazon filter a post-processing filter without breaking anything. Furthermore, we can probably pre-order the post-processing filters by having a priority built into them. This way, the user just needs to click checkboxes. This is a lot less complex than dragging a half-dozen filters around in a script.aculo.us Sortable like we have to use for the sidebar.
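Here is a rough sketch of what "one markup language plus prioritized post-processors" could look like. Everything in it is hypothetical: the priority numbers, the toy transforms and the <typo:dash/> macro tag are invented, and only the filter names echo the ones discussed above.

```ruby
# Hypothetical sketch; the implementations are toys and the priorities invented.
PostFilter = Struct.new(:name, :priority, :transform)

POST_FILTERS = [
  # Turns "--" into a typographic dash entity.
  PostFilter.new(:smartypants, 20, lambda { |html| html.gsub("--", "&#8212;") }),
  # Expands a made-up macro tag into "--"; it must run before smartypants.
  PostFilter.new(:macros, 10, lambda { |html| html.gsub("<typo:dash/>", "--") })
]

MARKUP = {
  :none       => lambda { |text| text },
  :paragraphs => lambda { |text| "<p>#{text}</p>" }
}

# Apply exactly one markup language, then the enabled post-processors in
# priority order, regardless of the order in which the user ticked them.
def render(text, markup, enabled)
  html = MARKUP.fetch(markup).call(text)
  POST_FILTERS.select { |f| enabled.include?(f.name) }
              .sort_by(&:priority)
              .inject(html) { |acc, f| f.transform.call(acc) }
end

puts render("one<typo:dash/>two", :paragraphs, [:smartypants, :macros])
# <p>one&#8212;two</p>
```

Because the ordering lives in the filters themselves, the configuration screen only needs a markup selector and a set of checkboxes.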

This won’t handle 100% of the cases that people want, but it’s almost certainly well over 90%, and I think it’ll have about 10% of the complexity. The common cases, like “I want Markdown with SmartyPants” or “I want Textile with SmartyPants and WikiWords” will be really simple, and that’s much more important than the ability to stack filters in arbitrary orders. Anyway, there will be a way to get around the remaining cases by hand-creating TextFilter database entries; if someone really wants to do sufficiently weird things, like applying Markdown and Textile to the same article, then I don’t think asking them to do something like this is excessive:

$ ./script/console production
>> TextFilter.create(:name => 'weird filter', :description => 'My Weird Filter', :filters => [:markdown, :textile, :macros, :smartypants, :textile, :piglatin])

I’m going to start re-working my filter code to fit into this framework; I should have something to show in a few days.

1 comment