Content Caching in Rails
Scott Laird
Introduction
- Rails has supported caching since at least 0.9.4
- Only works in production mode, not development mode
- Faster performance, but more complexity
- Hidden problems
Three different caches
- Page cache
- Action cache
- Fragment cache
Page Cache
- Easiest to use
- Fastest
- Hardest to get right
- Worst scaling properties
- Makes easy tasks easy and difficult tasks impossible.
Page Cache 2
- Pages created by Rails as normal
- Written out to public directory
- Cached pages served directly by the web server without invoking Rails
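The cycle above can be modelled in plain Ruby. This is a simplified sketch, not Rails's actual code: `cache_file_for`, `serve`, and the render block are hypothetical names, and a temporary directory stands in for the public directory.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: map a request path to a file under public/.
def cache_file_for(public_dir, path)
  name = (path == "/") ? "index" : path.sub(%r{\A/}, "")
  File.join(public_dir, name + ".html")
end

def serve(public_dir, path)
  file = cache_file_for(public_dir, path)
  # Cache hit: the web server returns the static file; Rails never runs.
  return [:webserver, File.read(file)] if File.file?(file)

  # Cache miss: Rails renders the page, then writes it under public/.
  html = yield(path)
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  [:rails, html]
end

Dir.mktmpdir do |public_dir|
  renders = 0
  2.times do
    source, _html = serve(public_dir, "/articles/view/5") do |path|
      renders += 1
      "<h1>#{path}</h1>"
    end
    puts "served by #{source}"
  end
  puts "renders: #{renders}"
end
```

The first request is rendered and written to disk; the second is answered from the file without calling the render block.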
Page Cache Example
class ArticleController < ApplicationController
  cache_sweeper :article_sweeper
  caches_page :index, :permalink, :category, :view_page

  def index
    ...
  end

  def permalink
    ...
  end
  ...
end
Page Cache Sweeper
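# Named ArticleSweeper so that cache_sweeper :article_sweeper can find it.
class ArticleSweeper < ActionController::Caching::Sweeper
  observe Article, Comment

  # Runs after any observed record is saved.
  def after_save(record)
    expire_page(:controller => 'article', :action => 'index')
    expire_page(:controller => 'article', :action => 'view_page',
                :id => record.id)
    # Expire the index page of every category the record belongs to.
    record.categories.each do |category|
      expire_page(:controller => 'article', :action => 'category',
                  :id => category)
    end
  end
end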
Page Cache .htaccess
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ dispatch.fcgi [QSA,L]
Page Cache 3
- That's basically it.
- Rails will create pages as needed
- Web server will serve pages
- Everyone is happy, right?
Page Cache Problems
- Query strings
- Authenticated pages
- Multiple paths
- Complex sweeping requirements
Page Cache: Query strings
- Page caching doesn't consider the effects of ?foo=bar in the URL.
- Pages with query strings will be written into the cache, where they'll be served to users who didn't pass in the query string
- Once a page is in the cache, it will ignore query strings, because static HTML doesn't care.
- Difficult to fix, because Rails's page cache code can't see the request object. Even if you configure Apache to handle query strings correctly, other web servers will still be broken.
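A small illustration of the collision, assuming the cache file is chosen from the path alone. `cache_file_for_url` is a hypothetical helper and example.com is a placeholder host; this is not the Rails implementation.

```ruby
require "uri"

# Hypothetical helper: the query string plays no part in the filename.
def cache_file_for_url(url)
  path = URI.parse(url).path
  path = "/index" if path.empty? || path == "/"
  "public#{path}.html"
end

plain = cache_file_for_url("http://example.com/articles/list")
paged = cache_file_for_url("http://example.com/articles/list?page=2")

puts plain
puts paged
puts plain == paged   # both requests share a single cache file
```

Whichever request arrives first decides what every later visitor sees, with or without the query string.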
Page Cache: Authentication
- Don't use the page cache with authenticated pages
- Apache will happily serve authenticated content to whoever comes along
- Unless you implement authentication in both Rails and Apache
- That violates DRY (Don't Repeat Yourself)
Page Cache: Multiple paths
- Rails generates cache filenames using url_for, not the actual path used to reach the page.
- If there is more than one route that can reach a specific action, then you're going to have problems.
- Queries using the first path will work fine.
- Queries using the second path will never get a cache hit, and will then overwrite the first path's cache file.
- This is rarely what you'd expect to have happen.
- Pagination can trigger this.
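A toy model of the failure, assuming two routes (`/articles` and `/articles/page/1`) reach the same action. `ToyPageCache` and `canonical_path` are hypothetical; `canonical_path` stands in for url_for always returning the first route.

```ruby
class ToyPageCache
  attr_reader :writes

  def initialize
    @disk = {}              # filename => contents, standing in for public/
    @writes = Hash.new(0)   # how often each file was (re)written
  end

  # Stand-in for url_for: always returns the first route for the action.
  def canonical_path(_requested_path)
    "/articles"
  end

  def request(path)
    # The web server looks for a file matching the path actually requested.
    return :hit if @disk.key?("public#{path}.html")

    # Rails writes the cache file under the canonical path instead.
    file = "public#{canonical_path(path)}.html"
    @disk[file] = "rendered"
    @writes[file] += 1
    :miss
  end
end

cache = ToyPageCache.new
p cache.request("/articles")          # first path: miss, page gets cached
p cache.request("/articles")          # first path: hit
p cache.request("/articles/page/1")   # second path: miss
p cache.request("/articles/page/1")   # second path: miss again
p cache.writes                        # public/articles.html was written 3 times
```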
Page Cache: Complex sweeping
- The page cache sweeper is easy if you only need to expire a few pages per model change
- It gets complex when you have lots of pages that include content from a given model
- There's no way to say "sweep all pages"
Page Cache: Complex sweeping with Typo
When an Article changed, we needed to invalidate:
- The main index page
- The category indexes for each category
- The tag index for each tag
- The daily index page
- The monthly index
- The yearly index
- The article page itself
Page Cache: Sweeping 3
- Plus paginated pages.
- Posting comments is just about as bad
- We kept missing pages.
- Bug reports with stale pages that never got rebuilt.
- Solution: add a PageCache model and wrap the page cache so that it calls PageCache.create(url, file) for every newly cached page.
- Then we implemented PageCache.sweep_all
- Not ideal from a performance standpoint, but it always works.
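A plain-Ruby sketch of the idea. The version described above is a Rails model; this stand-in keeps its records in memory so that it runs on its own. Only `PageCache.create(url, file)` and `PageCache.sweep_all` come from the slides; `count`, `cache_page`, and the example URLs are assumptions for illustration.

```ruby
require "tmpdir"
require "fileutils"

class PageCache
  Entry = Struct.new(:url, :file)
  @entries = []

  class << self
    # Record one cached page.
    def create(url, file)
      @entries << Entry.new(url, file)
    end

    def count
      @entries.size
    end

    # Delete every cached file we know about, then forget the records.
    def sweep_all
      @entries.each { |entry| FileUtils.rm_f(entry.file) }
      @entries.clear
    end
  end
end

# Hypothetical wrapper: write the page, then register it with PageCache.
def cache_page(public_dir, url, html)
  file = File.join(public_dir, url.sub(%r{\A/}, "") + ".html")
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  PageCache.create(url, file)
  file
end

Dir.mktmpdir do |dir|
  cache_page(dir, "/articles/2005/08", "<p>monthly index</p>")
  cache_page(dir, "/articles/tag/rails", "<p>tag index</p>")
  PageCache.sweep_all
  puts Dir.glob(File.join(dir, "**", "*.html")).empty?
end
```

Because every cached file is registered when it is written, the sweep cannot miss a page, at the cost of rebuilding pages that had not changed.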
Page Cache: Scaling
- We don't like to talk about scaling in Rails, but...
- The page cache is hard to use if you start to "scale out" into a cluster.
- Cross-server cache invalidation hurts.
- So don't do it.
Page Cache: Summary
- It's fast
- It's easy to start using
- It gets complex quickly.
- There are many problems that are very difficult to work around.
- It limits how high you can scale.
- Just say no.
Action Cache
- Similar to page cache, but runs entirely inside of Rails
- Slower, because Rails is always involved
- Much less complex on the web server side; no mod_rewrite tricks
- Uses the fragment cache internally.
Action Cache: example
class ArticleController < ApplicationController
  cache_sweeper :article_sweeper
  caches_action :index, :permalink, :category, :view_page

  def index
    ...
  end

  def permalink
    ...
  end
  ...
end
Action Cache part 2
- Expire with expire_action.
- You can use before_filter, so authentication should work.
- Slower than page caching, but usually fast enough.
- Can be used in a "scale out" environment
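Why authentication works here can be shown with a toy model in which the filter runs before the cache lookup. `ToyActionCache` and its methods are hypothetical and only imitate the ordering; this is not how Rails implements filters.

```ruby
class ToyActionCache
  attr_reader :renders

  def initialize
    @store = {}
    @renders = 0
  end

  def handle(path, logged_in:)
    # The filter runs first, so anonymous users never reach the cache.
    return [401, "login required"] unless logged_in

    # Authenticated users get the cached body, rendered at most once.
    body = (@store[path] ||= render(path))
    [200, body]
  end

  private

  def render(path)
    @renders += 1
    "page for #{path}"
  end
end

cache = ToyActionCache.new
p cache.handle("/admin", logged_in: false)
p cache.handle("/admin", logged_in: true)
p cache.handle("/admin", logged_in: true)
p cache.renders
```

With the page cache, the equivalent check would never run, because the web server answers before Rails is involved.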
Action Cache: problems
- Invalidating the cache is still hard.
- Query strings are still broken.
- It's not that hard to work around either problem.
Action Cache: invalidation
- The action cache uses the fragment cache back-end.
- You can expire fragment cache items by regex.
- expire_fragment(%r{articles/.*})
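Regex expiry can be pictured with a toy in-memory store. `FragmentStore` is a hypothetical class written for this example, not the Rails fragment cache.

```ruby
class FragmentStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end

  # Accepts either an exact key or a Regexp matched against every key.
  def expire(key_or_pattern)
    if key_or_pattern.is_a?(Regexp)
      @data.delete_if { |key, _value| key =~ key_or_pattern }
    else
      @data.delete(key_or_pattern)
    end
  end
end

store = FragmentStore.new
store.write("articles/index", "<ul>...</ul>")
store.write("articles/view/5", "<h1>...</h1>")
store.write("sidebar/links", "<div>...</div>")

store.expire(%r{articles/.*})
p store.read("articles/index")   # nil, expired by the pattern
p store.read("sidebar/links")    # still cached
```

One pattern clears every cached action under articles/, so the sweeper no longer has to list each page.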
Action Cache: query strings
- Still broken in 0.13.1.
- Easier to work around than the page cache.
- Look at the source in Rails; it should be just a ~5 line fix.
- You just need to add query params to the fragment cache key and it should Just Work.
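One way such a key might be built, as a sketch only. `action_cache_key` is a hypothetical helper, not part of Rails; sorting the parameters is a design choice made here so that the same parameters in a different order share one entry.

```ruby
require "uri"

# Hypothetical key builder: path plus the sorted, encoded query parameters.
def action_cache_key(path, query_params)
  return path if query_params.empty?

  sorted = query_params.sort_by { |name, _value| name.to_s }
  "#{path}?#{URI.encode_www_form(sorted)}"
end

puts action_cache_key("/articles/list", {})
puts action_cache_key("/articles/list", { "tag" => "rails", "page" => "2" })
puts action_cache_key("/articles/list", { "page" => "2", "tag" => "rails" })
```

Requests without parameters keep the plain path as their key, so existing entries are unaffected, while each distinct set of parameters gets its own entry.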
Fragment Cache
- The do-it-yourself solution
- You have to call it yourself, because it works on short bits of data, not whole pages.
- read_fragment(key) and write_fragment(key,value)
- Expire with expire_fragment(key) (or regex).
- No way to list entries.
Fragment Cache: example
From the Sparklines code in Typo:
fragment_cache = read_fragment(fragmentname)
unless fragment_cache
  # Cache miss: render the image and store it for next time.
  fragment_cache = Sparklines.plot(ary, params)
  write_fragment(fragmentname, fragment_cache)
end
send_data(fragment_cache,
          :disposition => 'inline',
          :type => 'image/png')
Fragment Cache: backends
- Multiple back ends
- Only two are really useful.
- FileStore writes cached items to files.
- MemCachedStore uses memcached. Better for large sites; scales out well enough for LiveJournal.
Fragment Cache: summary
- Basically just a place to stick data.
- Great for places where you re-use the same fragment code over and over again, like sidebars in Typo
- Most flexible, but the most work.
Recommendations
- Use the page cache only for content that is almost completely static: autogenerated files such as images or CSS that almost never change and never take parameters.
- Use the action cache for other pages that are hit frequently and are cache-friendly.
- Use the fragment cache for commonly used pieces of data that don't easily fit into the other caches.