Typo progress

We've been making pretty good progress towards the next major release of Typo. Here's a short list of what's gone in so far:

  • Tags
  • File uploads
  • Gravatars
  • Filter plugins, including:
    • Easy flickr image linking
    • Syntax highlighting
    • Sparklines
    • Auto-generating Amazon affiliate links
  • Comment previews
  • More powerful themes
  • Atom 1.0 support
  • Per-category and per-tag RSS and Atom feeds

There are still quite a few features left on our wishlist; some of those will make it into the next release, some won't. I'm starting to feel like we've passed the halfway point on this release cycle, but we don't have any firm plans for the next Typo release yet. Still, if there's anything that people really want to see in the next major Typo release, now would be a great time to speak up.

Posted by Scott Laird Sun, 04 Sep 2005 13:43:29 GMT

Rails Schema Generator 0.1.0

I just uploaded the first version of my schema generator to rubyforge. This is a Rails generator that knows how to take a collection of migration scripts and use them to build up a valid SQL schema file.

You should be able to install it via gem install schema_generator, and run it on any Rails project by running ./script/generate schema from the root of the Rails project. The current release (0.1.0) supports MySQL, PostgreSQL, and SQLite. It will auto-generate a schema for each DB in db/schema.DBTYPE.sql every time it runs, prompting you before overwriting existing files.

For this to work, your Rails migrations must describe your complete database schema. Many projects, like Typo, are older than Rails’s migration support, so their migrations don’t start with a clean slate; instead they describe how to migrate from a specific old version of the DB schema to the current version. In this case, either create a 0_initial_schema migration or modify the existing migration #1 to create all of the original tables. I just committed an example to Typo’s subversion tree; feel free to use it as a starting point.

Here’s an example of a schema generated by the generator. This is for Typo on PostgreSQL, as of migration #14. I had to create a db/migrate/0_initial_schema.rb file, but all of the other migrations were completely untouched.

The schemas for MySQL and SQLite are similar, but use the correct types (like int(11)) and syntax for each DB.

-- This file is autogenerated by the Rail schema generator, using
-- the schema defined in db/migration/*.rb
-- Do not edit this file.  Instead, add a new migration using
-- ./script/generate migration <name>, and then run
-- ./script/generate schema

-- tables 

CREATE TABLE articles (
  id serial primary key,
  title character varying(255),
  author character varying(255),
  body text,
  body_html text,
  extended text,
  excerpt text,
  keywords character varying(255),
  allow_comments integer,
  allow_pings integer,
  published integer DEFAULT '1',
  created_at timestamp,
  updated_at timestamp,
  extended_html text,
  guid character varying(255),
  permalink character varying(255),
  user_id integer,
  text_filter_id integer
);

CREATE TABLE articles_categories (
  article_id integer,
  category_id integer,
  is_primary integer
);

CREATE TABLE articles_tags (
  article_id integer,
  tag_id integer
);

CREATE TABLE blacklist_patterns (
  id serial primary key,
  type character varying(255),
  pattern character varying(255)
);

CREATE TABLE categories (
  id serial primary key,
  name character varying(255),
  position integer,
  permalink character varying(255)
);

CREATE TABLE comments (
  id serial primary key,
  article_id integer,
  title character varying(255),
  author character varying(255),
  email character varying(255),
  url character varying(255),
  ip character varying(255),
  body text,
  body_html text,
  created_at timestamp,
  updated_at timestamp
);

CREATE TABLE page_caches (
  id serial primary key,
  name character varying(255)
);

CREATE TABLE pages (
  id serial primary key,
  name character varying(255),
  user_id integer,
  body text,
  body_html text,
  created_at timestamp,
  updated_at timestamp,
  title character varying(255),
  text_filter_id integer
);

CREATE TABLE pings (
  id serial primary key,
  article_id integer,
  url character varying(255),
  created_at timestamp
);

CREATE TABLE resources (
  id serial primary key,
  size integer,
  filename character varying(255),
  mime character varying(255),
  created_at timestamp,
  updated_at timestamp,
  article_id integer
);

CREATE TABLE sessions (
  id serial primary key,
  sessid character varying(255),
  data text,
  created_at timestamp,
  updated_at timestamp
);

CREATE TABLE settings (
  id serial primary key,
  name character varying(255),
  value character varying(255),
  position integer
);

CREATE TABLE sidebars (
  id serial primary key,
  controller character varying(255),
  active_position integer,
  active_config text,
  staged_position integer,
  staged_config text
);

CREATE TABLE tags (
  id serial primary key,
  name character varying(255),
  created_at timestamp,
  updated_at timestamp
);

CREATE TABLE text_filters (
  id serial primary key,
  name character varying(255),
  description character varying(255),
  markup character varying(255),
  filters text,
  params text
);

CREATE TABLE trackbacks (
  id serial primary key,
  article_id integer,
  blog_name character varying(255),
  title character varying(255),
  excerpt character varying(255),
  url character varying(255),
  ip character varying(255),
  created_at timestamp,
  updated_at timestamp
);

CREATE TABLE users (
  id serial primary key,
  login character varying(255),
  password character varying(255),
  email text,
  name text
);

-- indexes 

CREATE  INDEX articles_permalink_index ON articles (permalink);
CREATE  INDEX blacklist_patterns_pattern_index ON blacklist_patterns (pattern);
CREATE  INDEX categories_permalink_index ON categories (permalink);
CREATE  INDEX comments_article_id_index ON comments (article_id);
CREATE  INDEX page_caches_name_index ON page_caches (name);
CREATE  INDEX pings_article_id_index ON pings (article_id);
CREATE  INDEX trackbacks_article_id_index ON trackbacks (article_id);

-- data 

INSERT INTO sidebars ("staged_position", "active_config", "active_position", "controller", "staged_config") VALUES(NULL, NULL, 0, 'category', NULL);
INSERT INTO sidebars ("staged_position", "active_config", "active_position", "controller", "staged_config") VALUES(NULL, NULL, 1, 'static', NULL);
INSERT INTO sidebars ("staged_position", "active_config", "active_position", "controller", "staged_config") VALUES(NULL, NULL, 2, 'xml', NULL);
INSERT INTO text_filters ("name", "filters", "description", "params", "markup") VALUES('none', '--- []', 'None', '--- {}', 'none');
INSERT INTO text_filters ("name", "filters", "description", "params", "markup") VALUES('markdown', '--- []', 'Markdown', '--- {}', 'markdown');
INSERT INTO text_filters ("name", "filters", "description", "params", "markup") VALUES('smartypants', '--- 
- :smartypants', 'SmartyPants', '--- {}', 'none');
INSERT INTO text_filters ("name", "filters", "description", "params", "markup") VALUES('markdown smartypants', '--- 
- :smartypants', 'Markdown with SmartyPants', '--- {}', 'markdown');
INSERT INTO text_filters ("name", "filters", "description", "params", "markup") VALUES('textile', '--- []', 'Textile', '--- {}', 'textile');

-- schema version meta-info 

CREATE TABLE schema_info (
  version integer
);

insert into schema_info (version) values (14);

Posted by Scott Laird Sat, 03 Sep 2005 08:09:00 GMT

Rails schema generation is nearly complete

My Rails Schema Generator is nearly complete. Here’s a sample run:

$ ./script/generate schema
Found 6 migration classes
Starting migration for AddSidebars
Starting migration for AddCacheTable
Starting migration for AddPages
Starting migration for AddPageTitle
Starting migration for AddTags
Starting migration for AddTextfilters
Adding TextFilters table
Migrations complete.
 Tables found:   6
 Indexes found: 1
 Records found:   8
      exists  db
overwrite db/schema.postgresql.sql? [Ynaq] y
       force  db/schema.postgresql.sql
overwrite db/schema.mysql.sql? [Ynaq] y
       force  db/schema.mysql.sql
overwrite db/schema.sqlite.sql? [Ynaq] y
       force  db/schema.sqlite.sql

The migration classes that I’m using are copied straight from Typo without modification. I’ve left out all of the migrations that add features to “legacy” tables–tables like articles–since there isn’t a table definition that I can use. That’s my next project–adding a 0_initial_schema migration for Typo. Once that’s complete, I have a bit of code cleanup and then I’ll release my schema generator code to the world. Hopefully that’ll be later today.

Posted by Scott Laird Fri, 02 Sep 2005 22:56:00 GMT

Auto-generating schema from Rails migrations

One of the things that has really bugged me with recent Typo development is the pain of maintaining 3 different database schema files (PostgreSQL, MySQL, SQLite) along with a set of Rails DB migration scripts. Every time we add a new table, we have to edit 4 different files, even though all of the information that we need is available in the migration file. Unfortunately, without the static schemas, new users would be adrift, so we’re stuck having to hand-modify each of the static schema files. This violates the DRY principle, causes errors, and irritates developers.

So I figured I’d fix it by writing some code that can take a set of Rails DB migrations, fold, spindle, and mutilate Rails itself, and then spit out a database-specific schema file showing all of the tables, indexes, and seed data provided by the migration files. This includes handling cases where a table is added in migration #4, two new fields are added in migration #6, and one field is deleted in migration #9. There are some corner cases that just can’t be handled, mostly relating to seed data that needs to be migrated to be correct with more recent schemas, but I think I can come close enough to make Typo happy, and probably a lot of other open-source Rails projects.
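As a rough sketch of the folding step (this is not the generator’s actual code, and SchemaRecorder and its methods are made-up names), replaying migrations against an in-memory recorder and dumping only the end state might look something like this:

```ruby
# A toy in-memory "database" that records schema changes instead of running SQL.
# SchemaRecorder and all of its methods are hypothetical names for this sketch.
class SchemaRecorder
  attr_reader :tables

  def initialize
    @tables = {} # table name => ordered list of [column name, type]
  end

  def create_table(name, columns)
    @tables[name] = columns.map { |col, type| [col, type] }
  end

  def add_column(table, col, type)
    @tables.fetch(table) << [col, type]
  end

  def remove_column(table, col)
    @tables.fetch(table).reject! { |name, _type| name == col }
  end

  # Render the final state as generic CREATE TABLE statements.
  def dump
    @tables.map do |name, cols|
      body = cols.map { |col, type| "  #{col} #{type}" }.join(",\n")
      "CREATE TABLE #{name} (\n#{body}\n);"
    end.join("\n\n")
  end
end

# Table added in #4, two fields added in #6, one field deleted in #9.
migrations = {
  4 => ->(db) { db.create_table(:pages, [[:id, 'serial primary key'], [:name, 'text'], [:legacy, 'integer']]) },
  9 => ->(db) { db.remove_column(:pages, :legacy) },
  6 => ->(db) { db.add_column(:pages, :title, 'text'); db.add_column(:pages, :body, 'text') }
}

db = SchemaRecorder.new
# Replay in numeric order, whatever order the files were loaded in.
migrations.sort_by { |version, _step| version }.each { |_version, step| step.call(db) }
puts db.dump
```

The point is that nothing database-specific happens until the dump, so the same recorded state could be rendered once per database type.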

This turned out to be easier than I expected. I’ve put about 4 hours into it so far, and I can take this migration:

# This is db/migrate/4_test4.rb
class Test4 < ActiveRecord::Migration
  def self.up
    create_table :sidebars do |t|
      t.column :controller, :string
      t.column :active_position, :integer
      t.column :active_config, :text
      t.column :staged_position, :integer
      t.column :staged_config, :text
    end

    Sidebar.create(:active_position=>0, :controller=>'category')
    Sidebar.create(:active_position=>1, :controller=>'static')
    Sidebar.create(:active_position=>2, :controller=>'xml')
  end

  def self.down
    drop_table :sidebars
  end
end
And then do this:

$ irb
irb(main):001:0> require 'migrate'
=> true
irb(main):002:0> require 'db/migrate/4_test4' # the code above
=> true
irb(main):003:0> Test4.up
=> ...
irb(main):004:0> puts DBMigrator::Database.dump('postgresql')
CREATE TABLE sidebars (id serial primary key, controller character varying(255), active_position integer, active_config text, staged_position integer, staged_config text) ;
INSERT INTO sidebars ("staged_position", "active_config", "active_position", "controller", "staged_config") VALUES(NULL, NULL, 0, 'category', NULL);
INSERT INTO sidebars ("staged_position", "active_config", "active_position", "controller", "staged_config") VALUES(NULL, NULL, 1, 'static', NULL);
INSERT INTO sidebars ("staged_position", "active_config", "active_position", "controller", "staged_config") VALUES(NULL, NULL, 2, 'xml', NULL);

Postgres works now, at least with the 4 or 5 examples that I’ve swiped from Typo’s migrations. SQLite and MySQL are nearly working; I think I just need to fake out a couple classes each and they’ll be up and running. Once that’s done, I’ll bundle this all up into a Rails generator so people can do this:

$ ./script/generate schema postgresql
      create  db/schema.postgresql.sql
$ ./script/generate schema mysql
      create  db/schema.mysql.sql
$ ./script/generate schema sqlite
      create  db/schema.sqlite.sql

Posted by Scott Laird Thu, 01 Sep 2005 08:14:00 GMT

Introduction to Typo filters

Although blogs are inherently HTML-based, HTML isn’t really a great format for writing plain-text documents. If nothing else, manually adding <p> and </p> around paragraphs interrupts the flow of writing. Most people would prefer to write in a more user-friendly manner, either via a GUI editor or a light-weight markup language like Markdown, which is then translated to HTML automatically. Very few people really want to write raw HTML blog postings on a daily basis.

Out of the box, Typo 2.5 supports the Markdown and Textile markup languages and the SmartyPants HTML post-processing filter, which adds typographical quotes and dashes to HTML. Adding additional filters is difficult because the filter setup is hard-coded into Typo. One of the new features that I’ve been working to add to Typo is the ability to easily add new text filters via filter plugins, similar to the sidebar plugins in Typo 2.5. At the same time, I’ve also added several new filtering plugins that extend Typo’s abilities in a number of useful ways.

The goal of all of this is to make it easier to write using Typo. I’ve tried to find things that cause me pain and then fix them. I want to make it easy to do common writing tasks without having to fire up an external tool. Admittedly, my definition of “common writing tasks” is probably different from most people’s, but the easy ability to extend Typo’s filtering system will allow people to adapt Typo to their own needs without having a deep understanding of Typo’s internals.

Inside Typo Filters

The new filter code supports three different types of filter plugins:

  1. Markup filters, like Textile and Markdown
  2. Macro filters
  3. Post-processing filters, like SmartyPants

Markup filters convert from a specific markup language into XHTML. You generally only want to use one markup language per article.

Macro filters convert certain Typo-specific macro tags into longer HTML sequences. These will be explained below.

Post-processing filters convert valid HTML into valid (but possibly enhanced) HTML.

Typo’s filtering system allows the user to create filter sets that use one markup filter and any mixture of post-processing filters. Macro filters are always enabled; they’re difficult to trigger accidentally and this greatly simplifies the filter management user interface.
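To make the shape of a filter set concrete, here is a minimal sketch in plain Ruby. It is not Typo’s code: the lambdas stand in for the real plugins, the <typo:hr/> macro is invented for the example, and for simplicity all macros run after the markup step.

```ruby
# Stand-in filters. Every name below is hypothetical and exists only for this sketch.
MARKUP = {
  none:      ->(text) { text },
  paragraph: ->(text) { text.split(/\n{2,}/).map { |para| "<p>#{para.strip}</p>" }.join("\n") }
}

POST_PROCESS = {
  dashes: ->(html) { html.gsub('--', '&#8212;') }
}

# Macros are always enabled, so they are not part of a filter set's definition.
MACROS = [
  ->(html) { html.gsub(%r{<typo:hr\s*/>}, '<hr/>') }
]

# A filter set: exactly one markup filter plus any number of post-processors.
FilterSet = Struct.new(:name, :markup, :post_process) do
  def apply(text)
    html = MARKUP.fetch(markup).call(text)
    html = MACROS.inject(html) { |acc, macro| macro.call(acc) }
    post_process.inject(html) { |acc, key| POST_PROCESS.fetch(key).call(acc) }
  end
end

set = FilterSet.new('paragraphs with dashes', :paragraph, [:dashes])
puts set.apply("First -- paragraph\n\nSecond<typo:hr/>")
```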

Using Typo Filters

Typo 2.5 came with 5 hard-coded filter sets:

  • No filtering
  • Textile
  • Markdown
  • SmartyPants
  • Markdown with SmartyPants

The new filtering code comes with the same filters defined. If one of these fits your needs perfectly, then you can continue using it unchanged. If you need to make changes, Typo’s admin system now includes a “Text Filters” tab that lets you edit these filter sets and create new ones.

Each text filter defined in the admin interface has a drop-down box for the markup language used (currently None, Markdown, or Textile) and check boxes for each available post-processing filter.

Macro filters

Macro filters convert certain Typo-specific tags to longer HTML sequences. The new filter code comes with three macro filter plugins:

  • <typo:code>: displays formatted code snippets, optionally with syntax highlighting and line numbering.
  • <typo:flickr>: produces an image tag linked to an image on Flickr, optionally with a caption.
  • <typo:sparkline>: displays a sparkline, Tufte’s name for a small in-line chart.

All macro filters use <typo:NAME>-style tags. The <typo:NAME> tag is then replaced by the output of the macro filter during the filtering process. For example, the Flickr macro filter would replace this:

<typo:flickr img="31366117" size="square" style="float:left"/>

with this:

<div style="float:left" class="flickrplugin">
  <a href="http://www.flickr.com/photo_zoom.gne?id=31366117&size=sq">
    <img src="http://photos23.flickr.com/31366117_b1a791d68e_s.jpg" width="75" height="75" alt="Matz" title="Matz"/>
  </a>
  <p class="caption" style="width:75px">
    This is Matz, Ruby's creator
  </p>
</div>

Notice that the <typo:flickr> line is a lot less typing.

The other macro tags work similarly. Here’s a brief example of the code plugin in action:

<typo:code lang="ruby">
  class Foo
    def bar
    end
  end
</typo:code>

The end result is basically the same as <pre>...</pre>, except that the text in the middle gets Ruby-specific syntax highlighting and all HTML is escaped.
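For the curious, the tag replacement itself can be approximated with a regular expression. This is only a sketch, not the code in Typo: expand_macros, parse_attributes, and the “shout” macro are all names made up for the example, and a real implementation would need to be more careful about nesting and quoting.

```ruby
# Parse attr="value" pairs out of a tag's attribute string.
def parse_attributes(source)
  source.scan(/([\w:-]+)="([^"]*)"/).to_h
end

# Replace every <typo:NAME .../> or <typo:NAME ...>text</typo:NAME> with the
# output of the handler registered for NAME. Unknown macros are left untouched.
def expand_macros(html, handlers)
  pattern = %r{<typo:(\w+)([^>]*?)(?:/>|>(.*?)</typo:\1>)}m
  html.gsub(pattern) do
    whole, name, attrs, body = $~[0], $1, $2, $3
    handler = handlers[name]
    handler ? handler.call(parse_attributes(attrs), body.to_s) : whole
  end
end

# A hypothetical handler, not one of Typo's real plugins.
handlers = {
  'shout' => ->(attrib, text) { "<strong class=\"#{attrib['class']}\">#{text.upcase}</strong>" }
}

puts expand_macros('Say <typo:shout class="loud">hello</typo:shout> now', handlers)
```

Each handler receives the attribute hash and the enclosed text, which is the same information a macro filter gets.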

Documentation enhancements

Each filter plugin has the opportunity to define a self.help_text method that returns a help string. The admin interface currently has a button to show the help text for each filter; in the near future we’ll extend this to the content and comment editing pages as well. This way users will be able to see text formatting help that’s specific to the exact filter configuration in use.

Writing filters

Basic filters are pretty simple. Here’s a minimal markup filter, for example:

class Plugins::Textfilters::TextileController < TextFilterPlugin::Markup
  def self.display_name
    'Textile'
  end

  def self.description
    'Textile markup language'
  end

  # Convert the Textile source in params[:text] to HTML.
  def filtertext
    text = params[:text]
    render :text => RedCloth.new(text).to_html
  end
end

This is about as basic as it can be–it doesn’t include any help text, but it’s a fully functional text filter. Drop this into components/plugins/textfilters/textile_controller.rb, and Typo will automatically gain the ability to use Textile formatting.

To create markup filters, your filter class needs to be a subclass of TextFilterPlugin::Markup. Post-processing filters are essentially the same, except they’re subclasses of TextFilterPlugin::PostProcess.
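One way a system like this can tell plugin types apart is by looking at each plugin’s superclass. The sketch below shows the general Ruby technique using the inherited hook; it is an illustration only, every class name in it is hypothetical, and it does not claim to be how Typo discovers its plugins.

```ruby
# Hypothetical stand-ins: Typo's real base classes are Rails controllers.
class FilterPlugin
  REGISTRY = []

  # Ruby calls this hook each time a subclass is defined.
  def self.inherited(subclass)
    super
    REGISTRY << subclass
  end
end

class MarkupPlugin < FilterPlugin; end
class PostProcessPlugin < FilterPlugin; end

class ShoutFilter < MarkupPlugin
  def self.display_name
    'Shout'
  end

  def filtertext(text)
    text.upcase
  end
end

class DashFilter < PostProcessPlugin
  def self.display_name
    'Dashes'
  end

  def filtertext(html)
    html.gsub('--', '&#8212;')
  end
end

# Concrete plugins are the registered classes that define a display name.
plugins = FilterPlugin::REGISTRY.select { |klass| klass.respond_to?(:display_name) }
markup  = plugins.select { |klass| klass < MarkupPlugin }
post    = plugins.select { |klass| klass < PostProcessPlugin }

puts markup.map(&:display_name).inspect
puts post.map(&:display_name).inspect
```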

Macro filters are slightly different. First, there are two different macro classes, TextFilterPlugin::MacroPre and TextFilterPlugin::MacroPost–one runs before markup filters, and the other runs after. Second, macro filters don’t define a filtertext method; instead they define a macrofilter method that looks like this:

def macrofilter(attrib,params,text="")
  # Default to the text between the tags, as a comma-separated list.
  data = text.to_s.split(/\s+/).join(',')

  # A data="..." attribute, when present, takes precedence.
  if attrib['data']
    data = attrib.delete('data').to_s.split.join(',')
  end

  url = url_for(
    {:controller => '/textfilter', 
     :action => 'public_action', 
     :filter => 'sparkline',
     :public_action => 'plot', 
     :data => data}.update(attrib))
  "<img src=\"#{url}\"/>"
end

The attrib parameter is a hash of all attributes to the <typo:macroname> tag, params contains filter-wide parameters (see below), and text is the text between <typo:macro>...</typo:macro> tags, if any.

Filters are controllers, and they have access to all of the usual ActionController methods, like url_for and friends. By default, none of the actions in plugins are visible to the public, so you don’t have to worry about someone feeding http://blog.example.com/plugins/textfilters/foo/exploit_me into their web browser and running code inside of your plugin. In some cases, though, you want to have certain methods in your plugin be accessible via URL. For instance, your plugin might need to use Ajax for something, or it might need to produce images, like the Sparkline plugin does.

To accomplish this, use plugin_public_action, like this:

class Plugins::Textfilters::SparklineController < TextFilterPlugin::MacroPost
  plugin_public_action :plot
  def plot
    # ...
  end
end

This will connect http://blog.example.com/plugins/textfilters/sparkline/plot to SparklineController#plot. If you need to use views, then create a controllers/plugins/textfilters/<plugin> directory and put your views in there.

Plugin parameters

Some filter plugins need more information than they can easily collect when filtering each article. For instance, think about a hypothetical WikiWords auto-linking filter that turned WikiWords into links to a Wiki somewhere. If it’s going to link words, then it’ll need to know which wiki to link them to. That’s where filter parameters come in. Each filter plugin can have a default_config method like this:

def self.default_config
  {"wiki-link" => {
    :default => "", 
    :description => "Wiki URL to link WikiWords to",
    :help => "The WikiWords plugin links..."}}
end

Typo collects all of the default_config items from all enabled plugins and presents them to the user in the Text Filter admin area. If the WikiWords filter was installed, then each filter set would have an editing box labeled “Wiki URL to link WikiWords to”.
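Collecting those settings amounts to merging one hash per plugin. Here is a minimal sketch of that idea; collect_config and the two plugin classes are hypothetical names, and only the shape of default_config follows the example above.

```ruby
# Hypothetical plugin classes for illustration.
class WikiWordsFilter
  def self.default_config
    { 'wiki-link' => {
      :default => '',
      :description => 'Wiki URL to link WikiWords to',
      :help => 'The WikiWords plugin links...' } }
  end
end

class PlainFilter; end # a plugin with no parameters

# Merge default_config from every plugin that defines it, filling in either
# the value saved for this filter set or the plugin's own default.
def collect_config(plugins, saved = {})
  plugins.each_with_object({}) do |plugin, merged|
    next unless plugin.respond_to?(:default_config)

    plugin.default_config.each do |key, spec|
      merged[key] = spec.merge(:value => saved.fetch(key, spec[:default]))
    end
  end
end

config = collect_config([WikiWordsFilter, PlainFilter],
                        'wiki-link' => 'http://wiki.example.com/')
config.each do |key, spec|
  puts "#{spec[:description]} (#{key}): #{spec[:value]}"
end
```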

Using filters from inside of Typo

In Typo 2.5, filters were called via the HtmlEngine.transform library method. Unfortunately, this had to change with the new plugin system, because several plugins need to be called from a Controller context so they can use views and helpers like url_for.

Unfortunately, this means that it’s no longer possible to call filters directly from Models–they have to be called from Controllers so that they have the right context available. Fortunately, the code wasn’t too hard to convert, even though there was a lot of it.

To use filter plugins from inside of a controller, just call filter_text, like this:

filter_text('text to be filtered',[:markdown, :macropost, :smartypants])

This is rather low-level. To use whole filter sets, use this:

filter_text_by_name('more text to be filtered','markdown')

This will look up the filter set named ‘markdown’ in the text_filters table and apply it to the text ‘more text to be filtered’.

Any time that Article#body (or any of the similar models, like Comment and Page) changes, the controller must manually call filter_text_by_name. This happens around 10 times in the current Typo tree.

Update: The API for filters changed somewhat around r685; the programming examples given here are a bit out of date now. I’ll write a “writing filters” document once the interface is stable.

Posted by Scott Laird Wed, 24 Aug 2005 02:13:00 GMT

Typo filters nearing completion

My little Typo filter project is finally nearing completion. I think I’ve been working on this for almost two weeks now, which makes it the most time-consuming Typo project that I’ve undertaken yet. I’ve added about 400 lines of code and 200 lines of new tests, and changed at least 400 more lines. Typo’s current trunk is only 2600 lines long, so I’ve touched almost 40% of the code.

At this point, almost everything works. I can drop new filters into components/plugins/textfilters and they’re immediately available for use. All of the current filters work (Textile, Markdown, SmartyPants), and I’ve added several new filters as well. There’s still a lot of cleanup left to do, and there are a bunch of corner cases that I need to write tests for, but the core code seems pretty solid, and it’s essentially feature-complete.

Posted by Scott Laird Sun, 21 Aug 2005 06:19:50 GMT

Typo filters hit a bit of a wall

I just hit a bit of a problem with Typo filters, and I’m not sure what the best way out is.

The problem first showed up with the sparkline filter. This filter turns a block like <typo:sparkline data="10 20 30 40 50"/> into an <img/> tag that points to a Sparkline generator on the current website. Things were going great with this until I realized that the filters don’t know anything about “the current website.” Even though my filters are technically Rails controllers (there’s a reason for this, I just haven’t fully implemented it yet–it’s next on the list once this problem is fixed), the actual filter method is a class method, not an instance method, and anyway most of the time, the filtering code is called from outside of a controller context. Basically, when the filter code gets called, it doesn’t know which website it’s getting called for.

And that sucks. Even ignoring complex things like the sparkline code, this makes it impossible for filters to produce URL references to elsewhere in the current Typo site. That means filters can’t do locally-hosted images, or WikiWord-style links, or AJAX. And that’s going to be a problem.

As I see it, I have two options:

  1. Re-spin the filter API so that it’s all controller helpers and components. This way they’ll always be able to use url_for. It’s a big conceptual change, but the code will be cleaner, and I doubt I’ll actually have to change more than 50 lines of code, not counting unit tests.
  2. Cache the base URL for the site somewhere and hand-code URLs myself. This is easy, but ugly.

Since I’m trying to avoid ugly, it looks like I’m going to be taking option #1. That’ll probably push the first filter release out to this weekend, but it’ll be better code.

As a side note, my Rails Book finally showed up yesterday, so I now have printed documentation to work with. PDFs are nice for skimming and searching, but they’re a pain to read cover-to-cover.

Posted by Scott Laird Tue, 16 Aug 2005 15:36:07 GMT

Pluggable text filters for Typo

Now that tags are working, I’ve started work on adding text-filter plugins for Typo. The current release (Typo 2.5.3) has support for 5 different combinations of Textile, Markdown, and SmartyPants hard-coded into it. The different combinations are actually repeated in 3 different places–the filtering code itself, the drop-down list for the built-in editor, and the Movable Type API code to list filter options.

That’s all gone now, replaced with a plugin system, similar to the sidebar plugins that made it into Typo 2.5. Individual filters get dropped into components/plugins/textfilters/ and the system picks up on them automatically. Then there’s an interface in the admin UI that lets you combine the filters into named filter sets, so you can combine Markdown and SmartyPants into “My Filters” (or “Markdown with SmartyPants”, which Ecto recognizes and performs some magic to get Markdown previews to work right). The UI isn’t really complete yet, but the entire back end is there, and I’ve added two new filters as a demonstration of what we can do. Here’s the current list:

  • Markdown. This is my favorite lightweight markup language, and I use it for everything that I write here.
  • SmartyPants. A companion to Markdown, it does a typographical cleanup on HTML, turning ASCII single and double quotes into their typographically correct cousins and fixing em-dashes.
  • Textile. Another lightweight markup language, like Markdown.
  • Amazon. This turns URLs like <a href="amazon:097669400X" ...> into a link to Amazon’s page for ASIN 097669400X, optionally attaching your Amazon affiliate tag. This is mostly a demonstration of what you can do with filters, although I’ll be using it on my blog.
  • Flickr. This sticks a picture from Flickr on the page. This is a bit more complex than the Amazon filter, but similar in concept. It turns <flickr img="31366117" ...> into a formatted inline image, linked to Flickr’s full-sized image page, optionally with a caption attached. The full HTML produced is something like <div style=""><a><img/></a><p>Caption</p></div>, which saves a lot of typing.
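To give a feel for how small a filter like the Amazon one can be, here is a rough sketch of the rewriting step. It is not the plugin’s actual code: amazon_links is a made-up name, and both the Amazon URL layout and the example-20 affiliate tag are assumptions for illustration.

```ruby
# Rewrite href="amazon:ASIN" into a full Amazon URL, optionally appending an
# affiliate tag. The URL layout used here is an assumption for illustration.
def amazon_links(html, affiliate_tag = nil)
  html.gsub(/href="amazon:([0-9A-Za-z]{10})"/) do
    url = "http://www.amazon.com/exec/obidos/ASIN/#{$1}"
    url += "/#{affiliate_tag}" if affiliate_tag
    %(href="#{url}")
  end
end

puts amazon_links('<a href="amazon:097669400X">the Rails book</a>', 'example-20')
```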

I’m currently working on a Sparklines plugin, using Glyph’s Ruby sparklines code. It’ll be similar to the <flickr> tag, except it’ll spit out an <img> tag that points to a built-in sparkline generator. Turning <sparkline ...> into an image tag is trivial; allowing a text filter to export an action to the world is a bit more work.

There are currently two things that bother me about this code that I’ll need to resolve before releasing it:

  1. The <flickr> and <sparkline> tags–should they look like plain XHTML, or is that a mistake? Should I turn them into pseudo-bbcode tags, like [flickr]? I’m currently leaning towards sticking a typo pseudo-namespace on the front of them, and turning them into <typo:flickr .../> and <typo:sparkline ...>. Any objections to that?
  2. The admin interface to this is killing me. I’d love to have a nice, simple way of editing each filter set, but it’s turning into a nightmare. I could just copy the sidebar config page (with a few changes–you can only include each filter once, unlike sidebars), but lots of people have had problems with the sidebar editor, and I’d like something a bit cleaner. Except I have no idea what to do.

If all goes well, I’ll post a public patch for comment early next week, and then kick off the Typo 4.0 process by committing this and the tag code later in the week.

Posted by Scott Laird Fri, 12 Aug 2005 15:20:00 GMT

More Typo wishlist items

I updated my Typo to-do list this morning and uploaded it to the Typo wiki.

Hopefully we can get most of those features into the next major Typo release.

Posted by Scott Laird Tue, 09 Aug 2005 20:54:25 GMT

Problems with Rails and page caching

One of the biggest improvements in Typo 2.5 is page caching. By using Rails’s built-in page cache, we can get 100x the performance on many benchmarks without doing more than a few lines of work. This lets us serve high-volume weblogs (like weblog.rubyonrails.com) without requiring heroic measures like clustering.

Unfortunately, there are a number of hidden problems with Rails 0.13.1’s page cache implementation. We’ve had to work around a number of them in order to get Typo 2.5 out the door.

Basic page cache usage

Enabling Rails’s page cache is amazingly simple: just add caches_page :actionname to the top of your controller class and the :actionname action will spit out page cache files automatically. A couple small tweaks to Apache’s .htaccess file, and Apache will now serve cached files all on its own without involving Rails. If a client asks for http://blog.example.com/articles/2005/08/08/foo, Apache will first check for an articles/2005/08/08/foo.html file in Typo’s public directory. If that file exists, then it’s sent off to the client without touching Rails at all.
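The URL-to-file mapping is simple enough to sketch in a few lines. This is an approximation for illustration, not Rails’s own code, and page_cache_file is a hypothetical helper name.

```ruby
require 'uri'

# Map a request URL to the static file the web server would look for.
# Only the path is used; a query string plays no part in the file name.
def page_cache_file(url, public_dir = 'public')
  path = URI.parse(url).path
  path = '/index' if path.empty? || path == '/'
  File.join(public_dir, "#{path.chomp('/')}.html")
end

puts page_cache_file('http://blog.example.com/articles/2005/08/08/foo')
puts page_cache_file('http://blog.example.com/')
```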


That part of caching is easy. It’s the other end that’s hard: sweeping the cache to remove stale cache entries. Rails provides a simple cache sweeper that can remove specified pages, but that’s not really good enough for us. With Typo, there are a number of events that end up touching a huge number of cached files. Adding a comment, for example, touches the cached article page, but it also changes the comment counter on the main index (if the article is still on the front page), the day, month, and year indexes, some number of category indexes, tag indexes, and potentially paginated versions of all of the above. The code to track these all down was trouble-prone and frequently missed one of the pages that needed to be changed; this led to stale caches. Even worse, some actions, like changing themes, need to invalidate all pages. Rails’s page cache doesn’t keep a list of cached pages, so there’s no clean way to sweep them all.

What we ended up doing was adding a page_caches table to the database and adding hooks to insert a new PageCache entry every time a page was cached. We also added a hook to remove entries from the page cache table whenever a page was manually swept, and then added a PageCache.sweep_all method to flush the entire page cache. For now, we’ve simply ripped out all of our old “smart” sweeping code and force a full sweep of the entire cache whenever anything substantial changes. Sooner or later we’ll start adding smart cache sweeping back in, but for now this works surprisingly well.
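The bookkeeping is easier to see in miniature. In the sketch below a Set stands in for the page_caches table and a temporary directory stands in for public/; PageCacheRegistry is a hypothetical class, not Typo’s PageCache model.

```ruby
require 'fileutils'
require 'set'
require 'tmpdir'

# In Typo the list lives in the page_caches table; a Set stands in for it here.
class PageCacheRegistry
  def initialize(public_dir)
    @public_dir = public_dir
    @names = Set.new
  end

  def names
    @names.to_a.sort
  end

  # Hook: called whenever a page is written to the cache.
  def cache(name, content)
    file = File.join(@public_dir, name)
    FileUtils.mkdir_p(File.dirname(file))
    File.write(file, content)
    @names << name
  end

  # Hook: called when a single page is swept by hand.
  def sweep(name)
    FileUtils.rm_f(File.join(@public_dir, name))
    @names.delete(name)
  end

  # Flush everything we know about; no guessing about which pages exist.
  def sweep_all
    names.each { |name| sweep(name) }
  end
end

Dir.mktmpdir do |dir|
  registry = PageCacheRegistry.new(dir)
  registry.cache('index.html', '<html>front page</html>')
  registry.cache('articles/2005/08/08/foo.html', '<html>article</html>')
  registry.sweep_all
  puts Dir.glob(File.join(dir, '**', '*.html')).empty?
end
```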

Query Parameters and Aliasing

Another shortcoming of Rails’s page cache implementation shows up when you start using query strings. Asking for http://blog.example.com/articles?page=2 ends up handing the ?page=2 parameter to the static .html cache page if it exists instead of calling Rails to ask for page 2. Even worse–if this cached page doesn’t exist, then Rails will generate it and store it for future access, even though it’s the second page of the index, not the first.

Finally, and worst of all, in Typo http://blog.example.com/articles is actually equivalent to http://blog.example.com/, because the article index view is the default index page. This means that the cached page for http://blog.example.com/articles?page=2 is actually /index.html, so anyone visiting page 2 of the article index screws up the front page of the blog. There’s no easy way around this with Rails 0.13.1; for now we’ve had to do work to keep ?page= from paginating anything. There’s one point that we could interrupt the page cache process from inside of Typo, but it doesn’t have any way to see the @request object or any of the query strings.

Long-term, we’re going to need to patch Rails to add a cachable property to @request that gets set to false when there’s a query string present, and also tweak Apache’s rewrite rules to skip static files if a query string is present. That assumes that Apache is even able to do that–every time I read the mod_rewrite documentation I end up with a headache. Since Typo officially supports lighttpd as well as Apache, we’ll need to get both of them to do the right thing, which is far from trivial.
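The check itself would be tiny. As a sketch of the idea only (cacheable? is a hypothetical helper here, not an existing Rails method):

```ruby
require 'uri'

# A page is safe to cache only when the request carried no query string.
def cacheable?(url)
  query = URI.parse(url).query
  query.nil? || query.empty?
end

puts cacheable?('http://blog.example.com/articles')
puts cacheable?('http://blog.example.com/articles?page=2')
```

The hard part is not this predicate but getting Rails and both web servers to agree on it.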

Non 7-bit ASCII URLs and Caching

Finally, Rails screws up cached filenames when the URL has non-ASCII characters. So any URL with accented characters or any non-ASCII script is totally uncachable. At least with Apache and Webrick, Rails sees non-ASCII characters in the URL encoded using the usual %XX URL-encoding scheme. Unfortunately, both servers actually look for unencoded filenames. So Rails writes out the cache file for /foö as public/fo%C3%B6.html (assuming UTF-8 encoding), but Apache actually looks for public/fo<C3><B6>.html (where <C3> is a byte with the value of C3 in hex). This is actually not all that hard to fix–just add a URI::Util.decode to the right place inside of Rails–but it’s not clear what the security implications of this are.
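To show roughly what that decoding step involves, here is a sketch in plain Ruby. It is not a patch to Rails: decoded_cache_file is a made-up helper, it uses the standard library’s URI.decode_www_form_component (which also turns + into a space, something a real fix would need to think about), and the safety check is only a first guess at the security concerns.

```ruby
require 'uri'

# Decode %XX escapes before building the cache file name, so the name on disk
# matches the unencoded name the web server will look for.
def decoded_cache_file(request_path, public_dir = 'public')
  decoded = URI.decode_www_form_component(request_path)

  # Refuse anything that could escape the public directory once decoded.
  if decoded.include?("\0") || decoded.split('/').include?('..')
    raise ArgumentError, "unsafe path: #{request_path.inspect}"
  end

  File.join(public_dir, "#{decoded}.html")
end

puts decoded_cache_file('/fo%C3%B6')
```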

Given all of these problems, I’ve been tempted to try using Rails’s action cache instead of the page cache. The action cache doesn’t let Apache serve the cached files directly, so Typo would have a brief chance to block the cache from handling specific files, and we could approach sweeping from the opposite direction. It’s not clear how big of a speedup the action cache would actually give us, though, compared to the massive win that we get from the page cache. We’d really like to keep using the page cache and fix all of its bugs so that it’s usable by other Rails users.

Posted by Scott Laird Tue, 09 Aug 2005 04:23:41 GMT