Intel moving towards on-chip memory controllers (and the end of dual-CPU systems)

Posted by Scott Laird Thu, 16 Jun 2005 17:24:53 GMT

The Inquirer reports that Intel’s Tukwila chip is going to have an on-board memory controller, just like all of AMD’s newer chips. Tukwila is a multi-core Itanium, and is due sometime in 2007; the Inquirer suggests that Xeons will probably get on-board memory controllers in the same basic timeframe, simply because this will let Intel use the same controller chips for both Xeon and Itanium systems.

Assuming that the rumor is true (and considering how well AMD’s on-board controller works, I’d be surprised if it’s not), Intel will probably end up putting 4-6 FB-DIMM channels per CPU; since each channel’s good for around 10 GB/sec, a dual-chip system could potentially have 120 GB/sec in memory bandwidth. Even better, it’d be possible to build a high-capacity server with 48 DIMM sockets spread over the 12 channels; with 4 GB DIMMs, that’s 192 GB in a relatively simple box.
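The arithmetic behind those figures is easy to check. Here is a quick sketch in Python, assuming the top of the 4-6 channel range and 4 DIMM sockets per channel (which is just 48 sockets divided by 12 channels). The function and parameter names are made up for illustration.

```python
def memory_totals(cpus=2, channels_per_cpu=6, gb_per_sec_per_channel=10,
                  dimms_per_channel=4, gb_per_dimm=4):
    """Aggregate bandwidth and capacity for a multi-socket FB-DIMM box."""
    channels = cpus * channels_per_cpu
    sockets = channels * dimms_per_channel
    return {
        "channels": channels,
        "bandwidth_gb_per_sec": channels * gb_per_sec_per_channel,
        "dimm_sockets": sockets,
        "capacity_gb": sockets * gb_per_dimm,
    }


if __name__ == "__main__":
    # Optimistic case from the post: 6 channels per CPU.
    print(memory_totals())
    # Pessimistic case: 4 channels per CPU.
    print(memory_totals(channels_per_cpu=4))
```

With only 4 channels per CPU, the same box drops to 8 channels, 80 GB/sec, and 128 GB, which is still a lot of memory for a two-socket system.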

This assumes that multi-CPU systems remain common; given the way that multi-core systems are progressing, I’m not sure that there will really be a market for commodity multiple-CPU-chip systems after 2007 or so–if you can get 8 cores on a single chip, why would you pay the complexity cost of adding more chips, except for really high-end stuff? Even today, compare the cost and performance of an Athlon 64 X2 vs a system with 2 single-core Opteron 2xx chips–the Opteron system will have a bit more memory bandwidth, but they’ll have similar performance on a lot of workloads, and an Athlon 64 X2 with a cheap motherboard will be cheaper than most dual-CPU Opteron motherboards, never mind the CPUs.

Dual-CPU systems have been the bread and butter of the PC server world for the last 5-7 years, but I doubt that they have more than another two years to go before they fade into the sunset. Personally, I’d much rather manage a handful of single-chip, 8-core, clustered, virtualized systems (where virtual environments can migrate between physical systems under explicit admin control) than a smaller number of 2-4 CPU, 16-32 core systems.

Posted in  | Tags , , , , ,  | no comments

What will Apple call their Intel systems?

Posted by Scott Laird Wed, 08 Jun 2005 16:34:00 GMT

I was just wondering–what will Apple call their professional computing line once they switch to Intel? The “Power” in PowerMac and PowerBook originally referred to the PowerPC chip inside. Will they keep the “Power” and define it to mean “powerful,” or will they spit out a new prefix?

The iBook/iMac/iPod/iSight/iWhatever product lineup doesn’t have this problem, of course.


Apple's move to x86: will it run on non-Apple PCs

Posted by Scott Laird Tue, 07 Jun 2005 01:00:51 GMT

After reading the transcripts of this morning’s WWDC keynote, I’m now feeling pretty good about Apple’s path. They’re going to have sales problems in mid 2006, but for now, there’s no reason for Mac users to avoid buying new systems. I’m still planning on going PowerBook shopping in a few months, and I suspect that most existing Mac users won’t see a problem with buying PPC-based Macs, at least until the x86 ones start appearing on the horizon. I can see waiting 2-3 months for new systems, but waiting for 18 months because something better is coming is just dumb.

The big question on a lot of minds (mine and my co-workers’, at least) is whether Apple will sell OS X for non-Apple x86 systems. After today’s announcements, it’s clear that they’ll be technically able to do it in mid 2006. The only real difference between OS X for x86 Macs and OS X for x86 PCs is driver support (er, and bootloader, BIOS, installer, ACPI, and disk partitioning), and I’d bet any amount of money that Apple has people in-house expanding their driver support to allow OS X to run on generic PCs.

But, just because they can sell OS X for PCs doesn’t mean that they will. Sometime in early 2006, Apple will have to make a decision–are they going to try to take on Microsoft on their home turf, or are they going to stick to their usual niche and try to sell more Mac hardware? With most companies, it’d be a purely financial decision, but with Apple it’ll probably be more of a “where does Steve want to go today” sort of thing. And there’s only one person who can answer that.

Even if Apple decides against selling OS X for generic PCs, though, it isn’t going to stop people from doing it themselves. Unless Apple goes to great lengths, we’re going to see people taking bits of Darwin and grafting them onto the OS X for x86 install DVD and building their own OS X for PC systems. It’ll be just like XPostFacto all over again. It’ll be uglier than OS X on x86 Macs, and the bootloader will be downright strange, but it’ll happen. Or, alternately, someone could graft most of the bootloader and some of the hardware emulation into something like Xen; when running on a PC with either of the new virtualization technologies, that’d be a great way to get both OS X and Linux running on the same physical hardware.

It’s going to happen. Personally, I think it’ll take about two months after production x86 Macs start shipping before you see packaged instructions and tools for building an OS X PC. One of my co-workers thinks that the clock is going to start running as soon as Apple’s P4 development systems start shipping, but I doubt that we’ll see leaks this early–Apple sued people who leaked early Tiger images, and it’d be foolhardy to assume that Apple will ship complete x86 development systems without embedding serial numbers and identifying information all over the place.

Either way, though, there will be small numbers of non-Apple OS X PCs by late 2006. I wonder how Apple will respond.

Update: The rumors start.


More Apple/Intel

Posted by Scott Laird Sun, 05 Jun 2005 15:02:32 GMT

So, I was convinced that CNet was being played on the Apple moving to Intel CPUs rumor. Then I read Scoble’s take on the topic, and I’m not so sure anymore. He claims that he’s personally confirmed it with people he knows, and that Apple’s going to announce a big move to x86 on Monday. John Gruber seems to be in a similar boat–it doesn’t make any sense, but both the WSJ and CNet are reporting it as fact, not as rumor, and it’s really unlikely that they’re both wrong.

But really, given the information that we have, it just doesn’t make sense. Apple may not be happy with IBM’s ability to speed up the G5 and build a laptop model, but dumping the PPC and moving to x86 seems like gross overkill.

Gruber speculates that Apple and Intel may be working on their own PPC chip, but I can’t really believe that–even if Apple does have the legal right to do that, I can’t see Intel going down that road. It’s too much work for too little profit.

So, like Gruber, I have to conclude that we’re only seeing part of the picture. He doesn’t seem to have a good theory on what’s happening, but here’s mine: Apple has decided that:

  • Tiger is basically as good now as Longhorn will be in late 2006.
  • 10.5 will be better than Tiger.
  • Windows is more vulnerable now than ever before–the burden of viruses and spyware is a crushing load for small companies.
  • Migrating to Longhorn will be a fairly traumatic event for smaller companies.
  • Most users only really need a limited amount of software: email, web browser, Office.
  • A lot of users have a very positive impression of Apple, thanks to the iPod.
  • Most users would really like to have a less-complex, easier to maintain, more reliable alternative to Windows.

Given that, what if Apple has decided that it’s time to bet the house on killing Windows’s monopoly on the PC desktop? Through some combination of Apple-branded X86 hardware, OS X-for-x86 (probably just x86-64), and maybe a Windows emulation environment. With Intel’s soon-to-be-shipping virtualization technology, it wouldn’t be that hard for Apple to get Windows to run underneath “OS X86”.

So, basically Apple will present Windows users with an option–run our software, and you’ll get our wonderful OS, no viruses, no spyware, and the ability to still run Windows if you really need to and you’re willing to pay for the extra license.

If this is the plan, then I wouldn’t even be stunned by Apple selling OS X for non-Apple-branded hardware. There’d be fairly limited hardware support (probably just brand-new x86-64 chips and video cards that Apple already supports, at least for now), but that probably won’t matter all that much, because you’ll be able to get a decent PC with the right specs for under $1k, and the same hardware will be usable under Windows as well.

Apple-branded X86 hardware would then be marketed as better-integrated, better-designed, better-built, and better-supported. They’d remain the BMW of computers. Plus, it’d come with an OS X license, which would make it somewhat price-competitive with buying a Dell (with Windows) plus OS X. A similar strategy seems to work for Sony, and their systems are legendary for dying after a year or so; Apple could probably make it work. Their profit margins would be lower, but they’d save a lot of hardware R&D money–they could probably get out of the ASIC business, for instance. That would let them concentrate more resources on the software side of things. If they could capture 20+% of the market, their revenue and profits would be substantially better than they are now.

So, is all this really going to happen? I have no clue. It’s an interesting theory, and it makes at least as much sense as anything else that I’ve read so far. It’s hugely out of line with what Apple has spent the last decade doing, but it’s aggressive, and the timing is right.

Frankly, I’d probably be more inclined to believe it if we hadn’t seen similar rumors for years. So what’s different now? Why is it a good idea now when it wasn’t before? Two reasons: the iPod and spyware. Users have seen what Apple can do, and they’re buying iPods in droves, and at the same time they’re cursing their Windows systems and their amazing ability to collect crap off the net and install it behind the user’s back. Both of these are relatively new occurrences, and they both play in Apple’s favor.

So, we’ll see tomorrow.

Update: Om Malik seems to believe something similar.


Apple to Intel, again

Posted by Scott Laird Sat, 04 Jun 2005 14:08:12 GMT

CNet is claiming, again, that Apple is about to dump IBM’s PowerPC chips and move to using Intel’s chips across their entire product lineup. They claim that Steve is going to announce this on Monday during the WWDC keynote.

The same basic rumor pops up every couple years. It usually seems to start with some stock analyst who believes that Apple is dying and the only way they can compete is to become just another PC company, but that doesn’t make a whole lot of sense this time around–it’s hard to claim that Apple is dying while surrounded by a sea of people in white headphones.

Gruber wrote about this a couple weeks ago, when the WSJ printed a rumor that Apple was going to start using Intel’s chips. It’s hard to argue with his basic point: even if Apple wanted to dump PPC chips and switch, it would take at least 18 months to get enough ISV support to allow them to launch products. And, during that 18-month window, no one would buy PPC-based Macs. Apple would be committing marketing suicide.

Personally, it’s unclear to me exactly why Apple would want to switch in the first place. The usual story is that Apple is unhappy with IBM’s progress on ramping the 970’s speed. Steve famously announced that the G5 would hit 3 GHz by this time last year, yet it’s still stuck at 2.7 GHz. That’s annoying and embarrassing, but it matches what the rest of the industry has seen. In the same two-year window, Intel has ramped the P4 from 3.06 GHz to 3.8 GHz (a 24% increase), while AMD’s top clock speed has gone from the XP 3000+ (at 2.167 GHz) to their current top speed of 2.6 GHz (a 20% speed boost). So, IBM’s jump from 2.0 GHz to 2.7 GHz (a 35% increase), while less than promised, is still better than the rest of the industry. As yesterday’s Mac article from Anandtech shows, the G5 isn’t exactly out of the performance game–it’s a bit slower than AMD’s fastest Opteron, but it’s competitive with a 3.8 GHz Xeon. It’s certainly not the laughingstock that the G4 was before the G5’s introduction.
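For anyone who wants to check the percentages, here’s a small Python sketch using only the clock speeds quoted above. The dictionary labels and the function name are just for illustration.

```python
def pct_increase(old_ghz, new_ghz):
    """Percentage clock-speed increase from old_ghz to new_ghz."""
    return (new_ghz - old_ghz) / old_ghz * 100


# (start GHz, end GHz) over the same two-year window, as quoted in the post.
RAMPS = {
    "Intel P4": (3.06, 3.8),
    "AMD (XP 3000+ to current top)": (2.167, 2.6),
    "IBM G5": (2.0, 2.7),
}

if __name__ == "__main__":
    for name, (old, new) in RAMPS.items():
        print(f"{name}: {old} GHz to {new} GHz, +{pct_increase(old, new):.0f}%")
```

By this measure the G5’s ramp works out to roughly 35%, against roughly 24% for Intel and 20% for AMD.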

Which, unfortunately, brings us to the one place where the G4 is still used: laptops. It’s been two years since the G5’s introduction, and people have been clamoring for G5 PowerBooks the whole time. Yesterday’s Gizmodo rumor aside, no one really expects a PowerBook G5 this year, and that really has to be bothering Apple. But, they can’t really change CPU families just to get faster laptops, can they?

The thing that bugs me is that CNet seems *so* sure of this–they have dates, they have names, they have a timeline for the switch. They clearly have what they believe to be a solid source. Apple is famous for their corporate secrecy. Is there any chance that they “leaked” a few Intel stories internally, just to see which ones ended up in the press?

I guess we’ll know on Monday.

Update:

Ars Technica has a nice bit on this rumor, including a good quote from Nathan Brookwood of Insight 64:

“If they actually do that, I will be surprised, amazed and concerned. I don’t know that Apple’s market share can survive another architecture shift. Every time they do this, they lose more customers” and more software partners, he said.

Ars Technica also pointed out an entry from the blog of Pavel Machek (a long-time Linux hacker), where he claims that Apple offered him a job writing BIOS and ACPI code. ACPI is Intel’s power management spec, and there’s *no* reason for Apple to use it when they control the entire hardware and software stack–it’s way too complex to bother with, *unless* you need to integrate tons of generic hardware. Of course, this could just be an attempt to keep Darwin current on PC hardware. Or, I guess there’s a slim chance that Apple will start selling OS X for PCs without dumping their own hardware lineup. It seems a wee bit unlikely, though.

Update 2: I’ve changed my mind.


Intel I/O Acceleration Technology update

Posted by Scott Laird Wed, 02 Mar 2005 21:16:21 GMT

As mentioned earlier, Intel has been making noises about improving network I/O on PC servers. Today, at IDF, they released a few details on their plans. Apparently the presentation itself was good, but their web documentation is slim on details. Lennert Buytenhek summarized the important details, centering on the threading improvements:

[…] Rather than providing multiple hardware contexts in a processor like Hyper-Threading (HT) Technology from Intel, a single hardware context contains the network stack with multiple software-controlled threads. When a packet thread triggers a memory event a scheduler within the network stack selects an alternate packet thread and loads the CPU execution pipeline. Processing continues in the shadow of a memory access. […] Stall conditions, triggered by requests to slow memory devices, are nearly eliminated.

This isn’t exactly like the IXP2800, but there are some distinct similarities. In essence, it looks like Intel wants to provide the OS with the ability to task-switch on cache misses. I’m not sure that current OSes can switch threads much faster than the CPU can handle a cache miss, so this will be interesting to follow. I suspect that you could switch fast enough if you don’t touch the TLB or most of the CPU mode bits.

Intel also points out that with 10 GbE, just mitigating the effect of cache misses by processing multiple packets in parallel isn’t enough–packets actually arrive faster than the computer can fetch data from main memory. With 64 byte packets at 10 Gbps, a new packet arrives every 51.2 ns, which isn’t even long enough for a single main-memory access. According to Intel, normal packet processing requires 5 main memory reads. Intel’s fix for this is to add the ability to DMA directly into the CPU’s cache, and then add support for offloading memory copies onto the memory controller itself.
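The 51.2 ns figure falls straight out of the packet size and link speed. Here’s a rough Python sketch of the budget; the 100 ns DRAM latency is purely my assumption for illustration (Intel didn’t give a number), and the function names are made up. The optional overhead parameter covers Ethernet’s preamble and inter-frame gap, which the 51.2 ns figure ignores.

```python
# Assumption for illustration only: a ballpark main-memory access time.
ASSUMED_DRAM_LATENCY_NS = 100


def packet_interval_ns(packet_bytes, link_gbps, overhead_bytes=0):
    """Time between back-to-back packets in ns (bits / Gbps gives ns)."""
    return (packet_bytes + overhead_bytes) * 8 / link_gbps


def memory_time_ns(reads_per_packet, dram_latency_ns=ASSUMED_DRAM_LATENCY_NS):
    """Time spent on serialized main-memory reads for one packet."""
    return reads_per_packet * dram_latency_ns


if __name__ == "__main__":
    budget = packet_interval_ns(64, 10)
    needed = memory_time_ns(5)
    print(f"budget per packet: {budget:.1f} ns")
    print(f"5 serialized reads: {needed} ns, {needed / budget:.1f}x over budget")
```

Counting the 20 bytes of preamble and inter-frame gap (overhead_bytes=20) stretches the interval to 67.2 ns, which doesn’t change the conclusion: five serialized reads per packet can’t keep up unless the data is already in cache.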

While Intel is aiming at improving network performance, I suspect that other types of processing may see big improvements from the planned changes. Video compression, for instance, can have horrible cache performance; I saw a study a while back that showed P4s running an MPEG-2 codec were averaging one instruction every 5 cycles during part of the processing, or way under 10% of what the CPU is capable of. A video codec that could compress several macroblocks at once, switching between them on cache misses, could easily see big speed boosts.


Linux, Intel, and TCP offloading

Posted by Scott Laird Wed, 23 Feb 2005 03:08:59 GMT

There’s an interesting thread going on right now on the Linux netdev mailing list, speculating about the network accelerator technology that Intel’s been talking about recently. No one’s quite sure what Intel is planning on adding, but for the past several years “network accelerator” has usually meant TCP offload engines (ToE), and Linux’s core networking guys are almost famously anti-ToE. Even though no one really knows what Intel’s up to, there’s a feeling that it’s not just ToE this time.

Several people have pointed out other technologies that can make a huge difference without requiring the sorts of compromises that ToE needs to work. For instance, this post by Lennert Buytenhek suggests that PCI and memory system latency is a big problem, but fixing it can have huge payoffs:

The reason a 1.4GHz IXP2800 processes 15Mpps while a high-end PC hardly does 1Mpps is exactly because the PC spends all of its cycles stalling on memory and PCI reads (i.e. ‘latency’), and the IXP2800 has various ways of mitigating this cost that the PC doesn’t have. First of all, the IXP has 16 cores which are 8-way ‘hyperthreaded’ each (128 threads total.)

I haven’t paid much attention to Intel’s IXP network processor family in the past, and that may be a mistake–from the description here, the IXP2800 sounds like a cross between Tera’s multithreaded CPU and IBM’s new Cell processor. Tera’s CPU, which was designed to support tons of threads, automatically switches between threads whenever one thread blocks due to I/O or memory access. The goal with Tera was to be able to remain efficient while the gap between CPU and memory speeds continued to grow. The IXP2800 isn’t as ambitious as the Tera, but the fundamental concept looks similar–support lots of threads in hardware, and switch when latency gets in the way. The IXP2800’s threaded CPUs aren’t full-blown processors, though–like the Cell, the IXP2800 contains one main CPU and a cluster of smaller domain-specific processors that are specialized for one specific task.
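To see why switching threads on a stall helps so much, here’s a toy model in Python. It’s my own simplification, not anything from Intel or Tera: each thread alternates between a fixed burst of compute and a fixed memory stall, switching is free, and the cycle counts are invented for illustration.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles a core does useful work in a toy model where
    each thread computes for compute_cycles, then stalls for stall_cycles,
    and the core switches to another ready thread at zero cost."""
    if threads < 1:
        raise ValueError("need at least one thread")
    busy = threads * compute_cycles
    period = compute_cycles + stall_cycles
    return min(1.0, busy / period)


if __name__ == "__main__":
    # Hypothetical workload: 10 cycles of work per 90-cycle memory stall.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: {core_utilization(n, 10, 90):.0%} busy")
```

In this model a single thread keeps the core only 10% busy, 8 threads get it to 80%, and anything past 10 threads hides the stall completely. Real hardware is messier, but that’s the basic reason 8 threads per core makes sense for packet processing.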

It’s unlikely that Intel will roll something like this into their Xeon CPUs anytime soon, though. It’s certainly not a quick fix–it’d require major changes in any OS that wanted to make use of it, and would probably take 3-6 years before it was really fully utilized.

Massively-multithreaded CPUs aren’t the only approach that has paid off for dedicated network processors, though. Some of Freescale and Broadcom’s chips know how to pre-populate the CPU’s cache with headers from recently-received packets. This drastically cuts latency, but it seems to require that the CPU and network interface be very tightly coupled. Reducing the overhead needed to talk to the NIC can help, too–apparently some of Intel’s 865 and 875 motherboards use a version of their GigE chip that is connected directly to the north bridge, bypassing the PCI bus entirely, and some benchmarks show substantial improvements.

Reading the thread suggests that most of the effort going into Linux network optimization in the next few years will be happening on the receive end of things. Over the past several years, most higher-end NICs have added limited support for checksum generation and TCP segmentation offloading (TSO), where the CPU can hand the NIC a block of data and a TCP header template, and then have the NIC produce a stream of TCP packets without requiring the CPU to touch the data at all. Relatively little has happened on the receive side, but this seems to be changing. For example, Neterion’s newest card can separate headers from data, and is nearly able to re-assemble TCP streams on its own, sort of the inverse of transmit-time TSO. It’s not clear how many streams the card can handle at a time, though–even my little web server at home is currently maintaining 384 simultaneous TCP connections, and a busy system could easily have tens or hundreds of thousands of open streams. Odds are, throwing 100,000 streams at the card would run it out of RAM and completely negate any benefit that receive offloading would have. Unless it’s bright enough to be able to handle the 1,000 or so fastest streams and then let the main CPU handle the 99,000 that are dribbling data at 28k modem speeds.
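The transmit side of this is simple enough to sketch. Here’s roughly what a TSO-capable NIC does with the block of data it’s handed, written as plain Python rather than anything resembling real driver or firmware code. The Segment class, the function name, and the 1460-byte MSS default are my own choices for illustration, and the “header template” is reduced to a starting sequence number.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes  # at most MSS bytes of data


def segment(data: bytes, start_seq: int, mss: int = 1460):
    """Split one large send into MSS-sized TCP segments, advancing the
    sequence number by the payload length (mod 2**32) for each one."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [
        Segment(seq=(start_seq + offset) % 2**32,
                payload=data[offset:offset + mss])
        for offset in range(0, len(data), mss)
    ]


if __name__ == "__main__":
    for seg in segment(b"x" * 4000, start_seq=1000):
        print(seg.seq, len(seg.payload))
```

A real NIC also has to fix up the IP length, the flags, and the checksums on every packet, but the point is that none of it requires the host CPU to touch the payload. Receive-side reassembly is the same loop run backwards, plus the much harder problem of tracking per-stream state.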

This is a fascinating topic, and I can’t wait to see how this will turn out.


No P4 4.0 from Intel; bad times ahead?

Posted by Scott Laird Thu, 14 Oct 2004 18:12:14 GMT

According to CNET, Intel has dropped their plans for a 4 GHz Pentium 4 chip, replacing it with a 3.8 GHz chip with a 2 MB cache. Intel is spinning this as a deliberate move to distance themselves from the “more MHz is better” mindset:

Behind the shift is Intel President Paul Otellini, who wants the company to move away from focusing on increases in chip speed, measured in megahertz, as the primary way to increase performance. Intel has talked about such a shift for years, but remained fond of the clock-speed approach until recently. Speeches by executives about moving away from megahertz were often closely followed by announcements of faster chips.

Of course, the spin is wearing a bit thin on this–if Intel could release a 4 GHz P4, then they’d jump at the opportunity. It’s certainly cheaper to produce P4s with 1 MB of cache than with 2 MB; replacing their entire 1 MB line with 2 MB models will lower their profit (assuming that the replacement chips sell for more or less the same price).

On a similar note, Om Malik points out that Intel’s latest quarterly earnings were quite a bit worse than Intel’s been spinning.


WiMAX in the news

Posted by Scott Laird Wed, 25 Aug 2004 20:26:54 GMT

There are two local WiMAX stories going around right now. First, Intel has apparently invested an unspecified amount of cash in Speakeasy. Presumably Intel is working on WiMAX hardware and they want Speakeasy to deploy it.

Second, Adaptix has come out of stealth. They’re based a mile or two from my house, and they’re apparently doing software-defined WiMAX. Pity they’re a hardware company, not an ISP, or I’d be all over them looking for a test connection.

WiMAX, also known as 802.16, is one of two contenders for licensed fixed-wireless service. The other contender is 802.20, which Nextel is testing in North Carolina. As I understand it, 802.20 is designed to handle Doppler correction from the beginning, while 802.16 is planning on retrofitting it at some point in the future. That means that 802.20 is more suitable for cellphone-like use, although it lags about a year behind 802.16 in the market.
