There’s been a fair bit of discussion online about “Horus”, Newisys’s glue chip for building 8–32 processor Opteron systems. The Horus architecture glues 4-CPU clumps (“quads”) into a larger system. This lets Newisys get past the Opteron’s 8-processor scaling limit and build really big boxes.

Fortunately, one of Horus’s designers has posted some details to comp.arch. Here’s a quick summary:

  • Each Horus chip has 7 links: one to each of its 4 local CPUs and 3 to other Horus chips.
  • The inter-Horus link is a proprietary version of HyperTransport, modified to work better over cables.
  • While the design can scale to 32 processors, going past 16 costs extra latency: Horus only has 3 inter-Horus links, so at most 4 quads can all be directly connected to one another. The sweet spot is 8–16 CPUs.
  • Their reference design uses 2 HT links per quad for I/O. That leaves too few links to connect every pair of CPUs in a quad directly, so some intra-quad inter-processor traffic has to go through an intermediary.
  • They designed for a NUMA factor of 3: remote memory costs 3x what local memory costs.
  • The Horus architecture can support up to 64 MB of “remote cache.” It’s still unclear if that’s 64 MB total spread across 8 Horus chips, 64 MB per chip, or 64 MB on a dedicated “Horus Cache Chip” that replaces a CPU quad in the design.
  • The inter-Horus links can be reconfigured and reset on the fly; this will allow for hot-swapping and partitioning.
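
The numbers in that list hang together if you do the arithmetic. Here’s a rough back-of-the-envelope sketch in Python. The link counts and the NUMA factor come from the summary above; the figure of 3 HT links per Opteron, the assumption that the 2 I/O links come out of the CPUs’ own link budget, and all of the function names are my own and not something the comp.arch post spells out.

```python
# Back-of-the-envelope arithmetic for a Horus quad.
# From the summary: 4 CPUs per quad, 3 inter-Horus links per chip,
# 2 HT links per quad used for I/O, NUMA factor of 3.
# Assumption (not from the post): each Opteron has 3 HT links.

CPUS_PER_QUAD = 4
HT_LINKS_PER_CPU = 3      # assumption
HORUS_REMOTE_LINKS = 3
IO_LINKS_PER_QUAD = 2


def intra_quad_links():
    """CPU-to-CPU links left in a quad once Horus and I/O take theirs."""
    link_ends = CPUS_PER_QUAD * HT_LINKS_PER_CPU  # 12 link ends in the quad
    link_ends -= CPUS_PER_QUAD                    # one per CPU goes to Horus
    link_ends -= IO_LINKS_PER_QUAD                # two go to I/O
    return link_ends // 2                         # a CPU-CPU link uses two ends


def links_for_full_mesh(nodes):
    """Links needed to connect every pair of nodes directly."""
    return nodes * (nodes - 1) // 2


def max_directly_connected_cpus():
    """Largest system where every Horus reaches every other in one hop."""
    quads = HORUS_REMOTE_LINKS + 1  # this chip plus one peer per remote link
    return quads * CPUS_PER_QUAD


def average_latency(local_fraction, numa_factor=3.0):
    """Mean memory latency, as a multiple of local latency."""
    return local_fraction + (1.0 - local_fraction) * numa_factor


if __name__ == "__main__":
    print("intra-quad links available:", intra_quad_links())
    print("links needed for a full mesh:", links_for_full_mesh(CPUS_PER_QUAD))
    print("CPUs with one-hop Horus links:", max_directly_connected_cpus())
    print("latency at 50% local accesses:", average_latency(0.5))
```

Under those assumptions a quad has 3 CPU-to-CPU links where a full mesh would need 6, which is why some intra-quad traffic needs an intermediary, and 4 fully connected quads gives the 16-CPU mark past which latency goes up. The latency function is only a simple weighted average; it ignores the remote cache and any extra hops.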

We can expect to see Horus systems show up late next year. Since AMD’s dual-core Opterons are due in about the same timeframe, we should see some fascinatingly huge PCs by Christmas 2004. The basic design should be similar to Sun’s Enterprise Server [3456]xxx systems: a bunch of plug-in slots that each take 4 CPUs and a bunch of RAM. Since Horus clearly wants I/O to be local to each quad, it’s unclear exactly how networking and disk I/O will work. Will there be FC and GigE controllers on each quad, with the system routing them through to the backplane? Will each card have I/O on the front panel, even though that makes swapping CPU cards a royal pain? Will the manufacturers include an “I/O node” on the motherboard with its own Horus and a bunch of HT-to-PCI-X bridge chips? Hopefully, we’ll see a few of each design and the market can sort it out.