One of the biggest improvements in Typo 2.5 is page caching. By using Rails’s built-in page cache, we can get 100x the performance on many benchmarks without doing more than a few lines of work. This lets us serve high-volume weblogs (like weblog.rubyonrails.com) without requiring heroic measures like clustering.
Unfortunately, there are a number of hidden problems with Rails 0.13.1’s page cache implementation. We’ve had to work around several of them in order to get Typo 2.5 out the door.
Basic page cache usage
Enabling Rails’s page cache is amazingly simple–just add caches_page :actionname to the top of your controller class and the :actionname action will spit out page cache files automatically. A couple of small tweaks to Apache’s .htaccess file, and Apache will now serve cached files all on its own without involving Rails. If a client asks for http://blog.example.com/articles/2005/08/08/foo, Apache will first check for an articles/2005/08/08/foo.html file in Typo’s public directory. If that file exists, then it’s sent off to the client without touching Rails at all.
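As a rough illustration of that lookup, here is a sketch in plain Ruby. It is not what Rails or Apache actually run; cache_file_for and served_from_cache? are made-up names, and the mapping of / to index.html is an assumption based on the behavior described later in this post.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: map a request path to the static file that the
# web server would look for under the public directory.
def cache_file_for(public_dir, request_path)
  relative = request_path.sub(%r{\A/+}, "").chomp("/")
  relative = "index" if relative.empty? # assumption: "/" maps to index.html
  File.join(public_dir, "#{relative}.html")
end

# Hypothetical helper: true when a cached copy exists, i.e. the request
# could be answered without touching Rails at all.
def served_from_cache?(public_dir, request_path)
  File.file?(cache_file_for(public_dir, request_path))
end

Dir.mktmpdir do |public_dir|
  path = "/articles/2005/08/08/foo"
  puts served_from_cache?(public_dir, path) # nothing has been cached yet

  # Simulate Rails writing the page cache file after the first request.
  file = cache_file_for(public_dir, path)
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, "<html>cached article</html>")

  puts served_from_cache?(public_dir, path) # now the static file exists
end
```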
That part of caching is easy. It’s the other end that’s hard: sweeping the cache to remove stale cache entries. Rails provides a simple cache sweeper that can remove specified pages, but that’s not really good enough for us. With Typo, there are a number of events that end up touching a huge number of cached files. Adding a comment, for example, touches the cached article page, but it also changes the comment counter on the main index (if the article is still on the front page), the day, month, and year indexes, some number of category indexes, tag indexes, and potentially paginated versions of all of the above. The code to track these all down was trouble-prone and frequently missed one of the pages that needed to be changed; this led to stale caches. Even worse, some actions, like changing themes, need to invalidate all pages. Rails’s page cache doesn’t keep a list of cached pages, so there’s no clean way to sweep them all.
What we ended up doing was adding a page_caches table to the database and adding hooks to insert a new PageCache entry every time a page was cached. We also added a hook to remove entries from the page cache table whenever a page was manually swept, and then added a PageCache.sweep_all method to flush the entire page cache. For now, we’ve simply ripped out all of our old “smart” sweeping code and force a full sweep of the entire cache whenever anything substantial changes. Sooner or later we’ll start adding smart cache sweeping back in, but for now this works surprisingly well.
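The bookkeeping idea can be sketched without Rails at all. This is not Typo’s actual PageCache model: the class name PageCacheRegistry is invented here, and an in-memory Set stands in for the page_caches table.

```ruby
require "set"
require "tmpdir"
require "fileutils"

# Stand-in for the page_caches table: an in-memory set of cached file paths.
# (Typo keeps these entries in the database; this sketch does not.)
class PageCacheRegistry
  def initialize
    @paths = Set.new
  end

  # Hook for "a page was just cached": write the file and record it.
  def store(path, html)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, html)
    @paths << path
  end

  # Hook for "a page was manually swept": delete the file and forget it.
  def sweep(path)
    FileUtils.rm_f(path)
    @paths.delete(path)
  end

  # Flush every page we know about. Iterate over a copy, because sweep
  # removes entries from the set as it goes.
  def sweep_all
    @paths.to_a.each { |path| sweep(path) }
  end

  def size
    @paths.size
  end
end

Dir.mktmpdir do |dir|
  registry = PageCacheRegistry.new
  registry.store(File.join(dir, "index.html"), "front page")
  registry.store(File.join(dir, "articles/2005/08/08/foo.html"), "article")
  registry.sweep_all # e.g. after a theme change
  puts registry.size
end
```

Because every cached file is recorded when it is written, a full sweep no longer has to guess which files exist on disk.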
Query Parameters and Aliasing
Another shortcoming of Rails’s page cache implementation shows up when you start using query strings. Asking for http://blog.example.com/articles?page=2 ends up handing the ?page=2 parameter to the static .html cache page if it exists instead of calling Rails to ask for page 2. Even worse–if this cached page doesn’t exist, then Rails will generate it and store it for future access, even though it’s the second page of the index, not the first.
Finally, and worst of all, in Typo http://blog.example.com/articles is actually equivalent to http://blog.example.com/, because the article index view is the default index page. This means that the cached page for http://blog.example.com/articles?page=2 is actually /index.html, so anyone visiting page 2 of the article index screws up the front page of the blog. There’s no easy way around this with Rails 0.13.1; for now we’ve had to do work to keep ?page= from paginating anything. There’s one point where we could interrupt the page cache process from inside of Typo, but it doesn’t have any way to see the @request object or any of the query strings.
Long-term, we’re going to need to patch Rails to add a cachable property to @request that gets set to false when there’s a query string present, and also tweak Apache’s rewrite rules to skip static files if a query string is present. That assumes that Apache is even able to do that–every time I read the mod_rewrite documentation I end up with a headache. Since Typo officially supports lighttpd as well as Apache, we’ll need to get both of them to do the right thing, which is far from trivial.
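To make the problem and the proposed fix concrete, here is a small sketch in plain Ruby. Nothing in it is a Rails API: cache_file_ignoring_query mimics the behavior described above (the cache file name comes from the path alone), and cachable? is a hypothetical stand-in for the property we’d like to add to @request.

```ruby
require "uri"

# Mimics the problem: the cache file name is derived from the path only,
# so the query string is silently dropped.
def cache_file_ignoring_query(url)
  path = URI.parse(url).path.chomp("/")
  path = "/index" if path.empty? # assumption: the root maps to /index.html
  "#{path}.html"
end

# Hypothetical guard: only requests without a query string are cachable.
def cachable?(url)
  query = URI.parse(url).query
  query.nil? || query.empty?
end

page_one = "http://blog.example.com/articles"
page_two = "http://blog.example.com/articles?page=2"

# Both URLs collide on the same cache file...
puts cache_file_ignoring_query(page_one) == cache_file_ignoring_query(page_two)

# ...so the guard has to refuse to cache the second one.
puts cachable?(page_one)
puts cachable?(page_two)
```

The web server needs the mirror image of this check: skip the static file whenever the request carries a query string.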
Non 7-bit ASCII URLs and Caching
Finally, Rails screws up cached filenames when the URL has non-ASCII characters. So any URL with accented characters or any non-ASCII script is totally uncachable. At least with Apache and Webrick, Rails sees non-ASCII characters in the URL encoded using the usual %XX URL-encoding scheme. Unfortunately, both servers actually look for unencoded filenames. So Rails writes out the cache file for public/fo%C3%B6.html (assuming UTF-8 encoding), but Apache actually looks for the unencoded filename, in which each %XX escape is a raw byte (<C3> is a byte with the value of C3 in hex). This is actually not all that hard to fix–just add a URI::Util.decode to the right place inside of Rails–but it’s not clear what the security implications of this are.
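Here is a sketch of what that decoding step might look like in plain Ruby. It is not the Rails patch itself; decoded_cache_name is a made-up name, and the two sanity checks are my own guesses at the kind of precaution the security question calls for, not a complete answer to it.

```ruby
# Hypothetical helper: turn a percent-encoded request path into the name
# that the web server will actually look for on disk.
def decoded_cache_name(encoded_path)
  # Work on a binary copy so each %XX escape becomes exactly one raw byte,
  # then reinterpret the result as UTF-8.
  decoded = encoded_path.b.gsub(/%(\h\h)/) { $1.hex.chr }
  decoded.force_encoding(Encoding::UTF_8)

  # Assumed precautions: refuse byte sequences that aren't valid UTF-8,
  # embedded NUL bytes, and ".." segments that could escape public/.
  raise ArgumentError, "invalid UTF-8 in path" unless decoded.valid_encoding?
  if decoded.include?("\0") || decoded.split("/").include?("..")
    raise ArgumentError, "suspicious path"
  end

  decoded
end

name = decoded_cache_name("fo%C3%B6")
puts name.bytes.map { |b| format("%02X", b) }.join(" ") # 66 6F C3 B6
```

The decoded name ends in the raw bytes C3 and B6 instead of the literal text %C3%B6, which is the form the web server looks for.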
Given all of these problems, I’ve been tempted to try using Rails’s action cache instead of the page cache–the action cache doesn’t let Apache serve the cached files directly, so Typo would have a brief chance to block the cache from handling specific files, and we could approach sweeping from the opposite direction. It’s not clear how big of a speedup the action cache would actually give us, though, compared to the massive win that we get from the page cache. We’d really like to keep using the page cache and fix all of its bugs so it’s usable by other Rails users.
Now that Typo 2.5.0 is out, it’s time to get back to hacking new features into Typo. I’m starting with tags. It’s amazing how easy Typo is to work with–I’m only a few hours into this, and I can almost see the end already. Here’s a screenshot of the tag sidebar:
It’s not complete, but it seems to work well enough for me. I’ve added a new tagging infrastructure, and I’ve spent some time learning how Rails’s test system works. At the moment, I’m populating the tags via the keywords field–that lets me edit them directly from Ecto. I’m going to extend Typo’s built-in admin interface to show a keywords field for now, but not add a method for directly manipulating tags. Later, I’ll come back and add del.icio.us-like autocompletion and a way to rename and merge tags.
I haven’t bothered merging my tags into Technorati or anything yet. So far, they’re just an alternative to categories; we can look at interactions with other sites once tags get merged into the Typo trunk.
I’ve never been happy with my web stats software for this blog. I’ve been limping along with Awstats for a while, because it’s better than nothing, but only slightly. It’s never been able to tell me what I really want to know–things like “where are the visitors to this specific page coming from” or “what pages are especially popular today”. I’ve poked around at other stats packages, but none of the free ones seem any better (more secure, perhaps, but not much more useful). I’ve always had a nagging sense that I was missing something, but I assumed that if I waited long enough, a generic package would show up that was good enough and would do what I wanted.
Today, I finally figured out what I was missing–I want a report that shows the most popular categories on my blog. I want to know that I’ve had 900 Asterisk hits this month but only 252 PDA hits. Since there’s no way that a generic stats package can know that this page belongs to my “Ruby” category, it’s pretty clear that I’m going to need a stats package that knows about my blog software. And since no stats packages know anything about Typo, I’m going to have to write one.
I’m going to play with this a bit in my spare time over the next week or so.
For now, what I’d really like to see from people is a list of questions that they’d like their blog stats software to be able to answer. I’ll start the list here by giving some of the basics and repeating a couple from above:
- How many hits am I getting per day?
- Where are the visitors coming from?
- What search terms are people using?
- What categories generate the most hits?
- Which pages are getting a larger-than-normal number of hits today?
Feel free to leave more in the comments and I’ll add them to my list.
I upgraded this site to Typo revision 351 last night. It was a bigger pain than I anticipated–there were conflicts with some of my local patches that took a while to find and fix. Now that it’s working, I’m really impressed–Typo now caches the HTML that it generates and gets Apache to serve it up directly without involving Typo at all whenever possible. This gives a huge speed increase, but it’s taken Tobias several revisions to get it all working correctly.
Unfortunately, I found another static caching bug once I put it up on my test site: caching only works when the Typo site lives in the root of a website. I have mine in a subdirectory (http://scottstuff.net/blog/), and that did entertaining things to the caching system. It would work right for the first hit, but once the cache was generated, Apache would throw 400 errors every time the page was hit. That kinda sucked.
Fortunately, the fix wasn’t too hard, and now everything seems to be working perfectly. As usual, let me know if you see any bugs.
I was looking over yesterday’s sidebar work and noticed two shortcomings:
- While it’s really easy to add new sidebar plugins (just drop the files into place and it’ll pick them up and make them available for use), that only works if sidebar items don’t need any configuration options. If the sidebar needs to ask the user for configuration, then you need to patch app/views/admin/general/index.rhtml, which can be kind of non-trivial.
- This whole model limits users to only displaying each sidebar type once. Take a look at poocs.net and notice that he has two Tada lists. That’s not really possible with my new sidebar model, short of cloning the tada plugin to create a tada2 plugin, which is ugly.
So, I’m reworking things. I’m allowing users to add individual sidebar items multiple times, and I’m moving the sidebar config data into the sidebar itself, out of Typo’s main configuration system. I’m currently able to create multiple Flickr entries in the database using the new admin UI, but there’s still quite a bit of work left to do.
When this is all done, we should have a really deeply cool drag-and-drop sidebar configuration system that is better than the current system in every way, except perhaps for a bit of complexity. Hopefully it’ll be acceptable to the Typo maintainers.
Update: A screenshot is available.
I spent half the night working on a monster patch for Typo that totally reworks the way Typo’s sidebar works. Before, all of the items that you see on the right side of this blog were statically coded into the app/views/layout/article.rhtml file. To change the URL for your Flickr images, you had to edit the source by hand. Adding new items involved editing three or four files.
That’s all gone now.
I moved all of the sidebar items into a new component/sidebars/plugins directory, and added a sidebar controller that drives them. It automatically loads all of the sidebars in the plugin directory, so you can install new sidebars by simply dropping the files into place. No more editing source files and then fighting to keep the changes merged. I moved all of the configuration data for each plugin into Typo’s configuration database, so you can edit everything via the admin interface.
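The drop-a-file-in-place idea boils down to scanning a directory at load time. A minimal sketch in plain Ruby follows. It is not the code from my patch: discover_sidebars is a made-up name and the *_controller.rb naming convention is assumed purely for illustration.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list the sidebar plugins found in a plugin
# directory, assuming one "<name>_controller.rb" file per plugin.
def discover_sidebars(plugin_dir)
  Dir.glob(File.join(plugin_dir, "*_controller.rb")).sort.map do |file|
    File.basename(file, "_controller.rb")
  end
end

Dir.mktmpdir do |plugin_dir|
  # "Installing" a sidebar is nothing more than dropping a file into place.
  FileUtils.touch(File.join(plugin_dir, "flickr_controller.rb"))
  FileUtils.touch(File.join(plugin_dir, "tada_controller.rb"))
  FileUtils.touch(File.join(plugin_dir, "README")) # ignored: wrong name

  p discover_sidebars(plugin_dir)
end
```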
I posted my patch to Typo’s bug-tracking database, but it’s really young code, mostly written in the middle of the night, so I wouldn’t recommend using it quite yet. Once I’ve had a chance to clean it up a bit I’ll give it a bit of testing here and then submit a newer patch to trac and hopefully it’ll get merged.
I’m basically finished with my last block of changes to Typo, and most of them have been merged upstream. At this point, I have most of the features that I care about, but there are still a few things left to do:
- Implement the rest of the Movable Type API, including the DB fields behind all of the useful stuff. Basically, everything that’s available in common blog editors should be available in Typo.
- Optionally link article keywords to Technorati tags.
- Look into turning filters and sidebar items into plugins, where all that’s needed to add them is to drop their files into a directory and then enabling/disabling/reordering them from the admin pages. (sidebars are done: #157)
- Add a Flickr text-formatting plugin that will let me say something like [[flickr:scottlaird/24727421 "Some guy on a bicycle"]] and have that turned into a clickable image (with size tags) and a caption. I’m not sure what the right syntax will be for this–I know what Markdown uses, so I can avoid running into it, but I’m not all that familiar with Textile. It might be worth looking at Unicode-only brackets, like «» or 「」. Of course, adding untypable brackets sort of cuts down on the utility of shortcuts like this.
- Add an Amazon text-formatting plugin that allows amazon:<ASIN> URLs and transforms them into links, optionally with affiliate ID attached. This is a lot less complex than the Flickr filter, because it can just search for
- Add a better way of mapping stacks of text-formatting filters to names. The Admin UI and the MT API both want symbolic names for filters; right now, this is hard-coded. It’d be nice to make this dynamically managed, but it looks like a total pain.
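For the Flickr shortcut in the list above, recognizing the proposed syntax is the easy half. Here is a sketch in plain Ruby, assuming the double-bracket form shown in that item is the one that gets used. FLICKR_TAG and parse_flickr_tags are invented names, and turning the result into a clickable image is left out, since that needs a lookup against Flickr.

```ruby
# Matches [[flickr:<user>/<photo id> "optional caption"]].
# The syntax is only a proposal, so treat this pattern as provisional.
FLICKR_TAG = %r{\[\[flickr:([^/\s\]]+)/(\d+)(?:\s+"([^"]*)")?\]\]}

# Hypothetical helper: pull every Flickr shortcut out of a block of text.
def parse_flickr_tags(text)
  text.scan(FLICKR_TAG).map do |user, photo_id, caption|
    { user: user, photo_id: photo_id, caption: caption }
  end
end

sample = 'Out riding: [[flickr:scottlaird/24727421 "Some guy on a bicycle"]]'
p parse_flickr_tags(sample)
```

A real filter would substitute generated HTML for each match instead of just collecting the pieces.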
These used to be up above, but they’re done now.
- Link to author’s email address if provided. (done, #156)
- Add GeoURL support, with a tag in the headers for it and a config option for providing your latitude and longitude. (done, #154)
- Add Flickr config parameters so adding Flickr doesn’t require editing the source. (done, #155)
- Fix the Flickr sidebar so it doesn’t do weird things with portrait images–as it is, the border around the images fits landscape images perfectly but leaves a big gap with portrait images. Alternately, just use the square layout that flickr likes. (done, I’m using square images)
Update: Pretty much everything here is complete as of August 26th. See the Typo Wishlist wiki page for details on where we’re going from here.
I’ve updated this blog again, using my latest Typo patches. It’s currently running Rails 0.13 (less than 2 days old), Typo SVN r280 (the most recent), and every change that I’ve submitted to the Typo Trac system. This includes:
- Per-article RSS feeds, so you can track comments and article changes via RSS.
- Fixed ‘comment’ link on the main index page; it now points to the #comment anchor instead of the head of the article.
- Configurable number of articles included in index pages and RSS feeds.
- Improved comments. This includes comment threading, per-comment subject lines, and optional email addresses.
- Better text filtering, so I can use SmartyPants-style quote fixing and use different filtering styles for comments and articles.