I’ve finally had it with my downstairs TiVo. As I mentioned before, it’s been crashing with increasing regularity. I’ve been avoiding dealing with it for a few reasons; mostly, I’ve been too busy with other things, but I also hate debugging problems with black boxes, and my TiVo is definitely a black box, even if it does run Linux.

Unfortunately, my inaction hasn’t made the problem go away. If anything, it’s happening even more often now than it was a month or two ago. But, in the meantime, I’ve been able to watch a couple of crashes and gleaned a bit of information:

  1. It doesn’t just stop–when playing back recordings, it skips a couple times first, with long pauses, then a few frames, then another long pause. This repeats for a couple cycles, and then it finally crashes.
  2. When it’s working right, the TiVo’s front-panel LEDs blink brightly when I press buttons on the remote. When it’s dead, the front-panel LEDs still blink, but only slightly. It looks like there are two LEDs that blink, one driven by hardware and the other by software. When it’s dead, the software one doesn’t respond. But, when the box has just crashed, the software LED still works. It takes a while for it to die.

If I saw the same symptoms on a server, I’d be thinking “bad hard drive”–we’re seeing processes lock up in the ‘D’ state here, followed by a congestive collapse of the whole system, as every system process that touches the disk (or talks to a process that talks to the disk, like syslog) slowly crawls to a halt. Bad motherboards or power supplies don’t usually leave parts of the OS working right for a few minutes.
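On a Linux server where you can actually get a shell, checking for stuck processes is easy to script. Here's a minimal sketch, assuming a standard `/proc` layout; the function names `parse_state` and `find_blocked` are made up for illustration, and this isn't something I ran on the TiVo itself.

```python
import os

def parse_state(stat_line):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split around the first " (" and the last ") "
    # instead of splitting the whole line on whitespace.
    pid_part, _, rest = stat_line.partition(" (")
    comm, _, after = rest.rpartition(") ")
    return int(pid_part), comm, after.split()[0]

def find_blocked(proc_root="/proc"):
    """List (pid, command) pairs for processes in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or the line was unparseable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)
```

Calling `find_blocked()` every few seconds and watching the list grow is a reasonable hint that something underneath (often a disk) has stopped answering. A single process showing up briefly in 'D' is normal; a pile of them that never leaves is not.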

So, I pulled the drive out of the box and I’m currently cloning it onto a new, larger drive. There are decent tools and directions online for this, so it’s not a big pain rounding up all of the pieces. Plus, when it’s done, I should have twice as much disk space.

As a bonus, the original drive started spitting out unrecoverable drive errors around 2% of the way into the copy, which at least confirms the diagnosis. Hopefully that was in the data area of the drive, not in the middle of important program code. Hmm–I wonder if it’s in the swap area? That’d really kill the box, but it’d leave it working fine after a reboot, until it got busy enough to swap. Yeah, that’s probably what happened.
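For the curious, the core idea behind copying a drive that throws read errors can be sketched in a few lines. This is not the tool I used, just an illustration: `rescue_copy` is a hypothetical helper that copies block by block, writes zeros where a block can't be read, and keeps going, similar in spirit to `dd` with `conv=noerror,sync`.

```python
def rescue_copy(src, dst, total_bytes, block_size=512):
    """Copy total_bytes from src to dst, zero-filling unreadable blocks.

    src and dst are seekable binary file objects (hypothetical stand-ins
    for the old and new drives). Returns the byte offsets of the blocks
    that could not be read.
    """
    bad_offsets = []
    for offset in range(0, total_bytes, block_size):
        want = min(block_size, total_bytes - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            # Unrecoverable read error: note where it happened and move on.
            data = b""
            bad_offsets.append(offset)
        # Pad failed or short reads with zeros so every later block still
        # lands at the same offset on the destination.
        dst.seek(offset)
        dst.write(data.ljust(want, b"\0"))
    return bad_offsets
```

The returned offsets are the useful part: comparing them against the partition layout would show whether the damage sits in a data area or somewhere more important. For a real device you'd want to open both ends in unbuffered binary mode and use a block size that matches the drive's sector size.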

Update (Apr 20, 2004): It seems to have worked. The TiVo has made it two days without crashing; that hasn’t happened in weeks. It seems slightly faster, and it now holds 146 hours, up from 60 hours.