Verizon DSL speed upgrade
I was complaining to someone yesterday about my DSL bill, since I’m paying $80/month for 1.5/384 service. Out of curiosity, I took a look at Verizon’s business DSL website and noticed that they don’t sell 1.5/384 in my area anymore; the default is now 3.0/768 for the same price as before. So I called them up this morning and they’re doubling my line speed for free. It’ll take a day or two to take effect, and at that point I’ll either be:
- Happy with my new, fast service.
- Yelling at Verizon for completely screwing up my upgrade and knocking me off the net.
Anyone want to take bets on which one it’ll be?
Update: Well, I got my answer. I found this in the logs this morning:
Jul 27 00:22:35 guam kernel: wanpipe1: ADSL starting training 0x2
...
Jul 27 00:22:35 guam kernel: wanpipe1: GP_LINK_DOWN, Training State
Jul 27 00:22:51 guam kernel: wanpipe1: Cell Delination successful
Jul 27 00:22:51 guam kernel: wanpipe1: GP_LINK_UP, State Trained
Jul 27 00:22:51 guam kernel: wanpipe1: ADSL Link connected (Down 3360 kbps, Up 704 kbps)
Jul 27 00:22:59 guam kernel: wanpipe1: Link connected!
There was also a “congratulations on your upgrade” email in my inbox. So it looks like the speed upgrade went through without a hitch. I have to say, this is the first really nice experience I’ve had with Verizon DSL in years. I’m feeling a lot better about them than I used to–six months ago I was getting 768/128, now I’m getting 3.0/768 for the same price. I’m not feeling ripped off anymore.
A quick download test verifies that the speed really is faster–I’m getting 298 KB/sec from ftp-mirror.internap.com. Smokeping shows that my link’s lower-bound latency dropped overnight, and this morning’s latency graph is much less noisy than it was yesterday. Both of those are good for VoIP.
Update: I did a quick comparison with my old speeds. I was getting 1792/442, now I’m getting 3360/704. That’s an 88% increase in downloads and a 59% increase in uploads. Not quite the 2x boost that you’d expect, but still quite nice.
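The arithmetic behind those percentages is easy to check. Here is a minimal Python sketch that uses the sync rates quoted above; the function name is just something I made up for illustration:

```python
def percent_increase(old_kbps: float, new_kbps: float) -> float:
    """Return the relative increase from old to new, as a percentage."""
    return (new_kbps - old_kbps) / old_kbps * 100.0


# Sync rates quoted in the post, in kbps.
down = percent_increase(1792, 3360)
up = percent_increase(442, 704)

print(f"download: +{down:.1f}%  upload: +{up:.1f}%")
```

Rounded to whole numbers, this should land on the 88% and 59% figures above.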
Apache ToS marking?
I’ve spent a fair bit of effort getting QoS on my home DSL link working right, so VoIP isn’t overwhelmed by downloads or by people hitting my web server. At this point, I’m down to one remaining problem–when Google and friends fire up their web crawlers and find a new directory full of JPEGs, they can slow other HTTP traffic to a crawl.
If I could tell Apache (2.0) to change the IP ToS flags associated with HTTP web crawler traffic, then my network’s QoS config would do the right thing and send user-driven HTTP traffic ahead of web crawler traffic. Unfortunately, I don’t see any obvious way to do this. I’d rather filter based on HTTP User-Agent, not network block, and that means either using a really smart packet filter or having my web server do the work on its own. And, as far as I can see, Apache 2 doesn’t have a ToS-setting module available. Dean Gaudet wrote mod_iptos for Apache 1.3, but it hasn’t been ported to Apache 2, and I’m not very eager to do it myself.
Does anyone have any suggestions?
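To make the idea concrete, here is a rough Python sketch of what I'd want the web server to do: pick a ToS value from the User-Agent header and set it on the connection's socket. This is not mod_iptos and not an Apache module. The crawler substrings and the choice of ToS values are assumptions for illustration, and `socket.IP_TOS` is only available on platforms that expose it (Linux does).

```python
import socket

# Hypothetical policy: these substrings and ToS choices are illustrative only.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")
TOS_BULK = 0x08    # the "maximize throughput" ToS bit
TOS_NORMAL = 0x00  # default service


def tos_for_user_agent(user_agent: str) -> int:
    """Return the ToS byte to use for a request with this User-Agent."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return TOS_BULK
    return TOS_NORMAL


def mark_connection(conn: socket.socket, user_agent: str) -> int:
    """Set the IP ToS byte on an IPv4 socket and return the value used."""
    tos = tos_for_user_agent(user_agent)
    conn.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return tos
```

With something like this in place, responses to crawlers would leave the box marked as bulk traffic, and the existing QoS rules could sort them behind everything else.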
Cisco buys Sipura
This isn’t exactly new news, but Cisco bought Sipura yesterday. Sipura makes a number of VoIP products, including the SPA-841 phone that I’ve been using for the past few weeks. They’re generally considered to have the best SIP implementation of any of the cheap vendors, and they make good, solid products for low prices. It’s a nice combination. Cisco has been licensing Sipura’s technology and using it in Linksys’s cheap VoIP hardware for around nine months now. Linksys has had to jump through a number of hoops to keep Sipura happy recently; apparently Sipura didn’t like customers buying the unlocked Linksys PAP2-NA instead of the more expensive Sipura SPA-2000. Now that Cisco owns both companies, I suspect that they’ll work out their differences.
Hopefully Cisco won’t gut Sipura to keep them from competing with Cisco’s more expensive products. The jury is still out on Cisco’s Linksys acquisition–they haven’t released many exciting new products since Cisco bought them, but they haven’t killed off any of their interesting product lines or tried to stop the flood of alternate Linux firmware distributions for the WRT54G family either.
One thing that’s interesting about this acquisition is that Sipura was formed by a bunch of ex-Cisco people. After Cisco bought Komodo in 2000, a bunch of the Komodo people left Cisco to go form Sipura. Now they’re back at Cisco again. This seems to be how Cisco does R&D these days–it spins employees off to work on their own products and then acquires them if they accomplish anything interesting. I’m not convinced that it’s a bad way to deal with R&D risk in a huge company–it shields Cisco from the cost of failure and promotes risk-taking by R&D engineers, but it doesn’t do anything to help unify Cisco’s massively fractured product lineup.
Cisco getting ready to push IOS XR on more routers
It looks like Cisco is finally starting to push their new, modular IOS code down from their uber-expensive CRS-1 router into their merely amazingly expensive routers. Network World is reporting that they’re almost ready to release a version of IOS XR for Cisco 12000-series routers. So now you’ll be able to run a semi-modern operating system that implements things like memory protection between processes on routers that cost under $500k. When they get under $100k, this might start to be interesting.
Cisco’s also going to release a line of “shared port adapters” that can be used in routers from the 7300, 7600, 12000, and CRS-1 product families. You need a SPA Interface Processor card for your router type, and then you plug SPAs into the SIP card. According to their picture, the SIP for the 7600 family can hold 4 SPAs, which means that the SPAs themselves must be fairly small–almost certainly smaller than the PAs that 7200/7400/7500 routers use. There are a pile of different SPAs on their list, from 8x cT1 to 10x GigE to 1x 10GigE to OC-192.
Of course, in typical Cisco fashion, the “shared port adapters” aren’t really all that shared. There are 3 different SPA carrier cards for the 7600 series; one model is only good with the VPN SPA, and the other two overlap a bit–both are good with 2x or 4x OC-3 SPAs, but one supports 1x OC-12 while the other supports T1 and T3 SPAs. None of the three models support the GigE SPAs. The 12000-series has 2 different SIP models; one is good for T1/T3 use, while the other one is good for GigE and OC-192s. The CRS-1 SIP is even more fun–it supports POS OC-3s, POS OC-192, and GigE. No OC-12 or OC-48 support, apparently.
So, even though the cards are called “shared port adapters,” there are some real limits on which chassis will work with which cards. The DS3 SPA will apparently only work on the 7600 and 12000, and not on the CRS-1 or 7304. I suspect that a lot of this is mostly a driver support problem, but it shows how screwed up and fractured Cisco’s product lineup is.
Intel I/O Acceleration Technology update
As mentioned earlier, Intel has been making noises about improving network I/O on PC servers. Today, at IDF, they released a few details on their plans. Apparently the presentation itself was good, but their web documentation is slim on details. Lennert Buytenhek summarized the important details, centering on the threading improvements:
[…] Rather than providing multiple hardware contexts in a processor like Hyper-Threading (HT) Technology from Intel, a single hardware context contains the network stack with multiple software-controlled threads. When a packet thread triggers a memory event a scheduler within the network stack selects an alternate packet thread and loads the CPU execution pipeline. Processing continues in the shadow of a memory access. […] Stall conditions, triggered by requests to slow memory devices, are nearly eliminated.
This isn’t exactly like the IXP2800, but there are some distinct similarities. In essence, it looks like Intel wants to provide the OS with the ability to task-switch on cache misses. I’m not sure that current OSes can switch threads much faster than the CPU can handle a cache miss, so this will be interesting to follow. I suspect that you could switch fast enough if you don’t touch the TLB or most of the CPU mode bits.
Intel also points out that with 10 GbE, just mitigating the effect of cache misses by processing multiple packets in parallel isn’t enough, because packets actually arrive faster than the computer can fetch data from main memory. With 64-byte packets at 10 Gbps, a new packet arrives every 51.2 ns, which isn’t even long enough for a single main-memory access. According to Intel, normal packet processing requires 5 main memory reads. Intel’s fix for this is to add the ability to DMA directly into the CPU’s cache, and then add support for offloading memory copies onto the memory controller itself.
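The 51.2 ns figure is just packet size divided by line rate. Here is a small Python sketch of that calculation. The optional overhead parameter is my own addition, not something from Intel's material: it accounts for Ethernet's 8-byte preamble and 12-byte inter-frame gap, which the bare figure ignores.

```python
def packet_interval_ns(frame_bytes: int, link_bps: float,
                       overhead_bytes: int = 0) -> float:
    """Time between back-to-back packet arrivals on a saturated link, in ns."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return bits_on_wire / link_bps * 1e9


TEN_GBPS = 10e9

# The figure quoted above: 64-byte packets, no framing overhead counted.
bare = packet_interval_ns(64, TEN_GBPS)

# Assumption (not from the post): add 20 bytes per frame for the Ethernet
# preamble and inter-frame gap, which stretches the interval slightly.
on_wire = packet_interval_ns(64, TEN_GBPS, overhead_bytes=20)

print(f"{bare:.1f} ns per packet, {on_wire:.1f} ns with framing overhead")
```

Either way, the time budget per packet is a few tens of nanoseconds, which is the point Intel is making about main-memory reads.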
While Intel is aiming at improving network performance, I suspect that other types of processing may see big improvements from the planned changes. Video compression, for instance, can have horrible cache performance; I saw a study a while back that showed P4s running an MPEG-2 codec were averaging one instruction every 5 cycles during part of the processing, or way under 10% of what the CPU is capable of. A video codec that could compress several macroblocks at once, switching between them on cache misses, could easily see big speed boosts.
Linux, Intel, and TCP offloading
There’s an interesting thread going on right now on the Linux netdev mailing list, speculating about the network accelerator technology that Intel’s been talking about recently. No one’s quite sure what Intel is planning on adding, but for the past several years “network accelerator” has usually meant TCP offload engines (ToE), and Linux’s core networking guys are almost famously anti-ToE. Even though no one really knows what Intel’s up to, there’s a feeling that it’s not just ToE this time.
Several people have pointed out other technologies that can make a huge difference without requiring the sorts of compromises that ToE needs to work. For instance, this post by Lennert Buytenhek suggests that PCI and memory system latency is a big problem, but fixing it can have huge payoffs:
The reason a 1.4GHz IXP2800 processes 15Mpps while a high-end PC hardly does 1Mpps is exactly because the PC spends all of its cycles stalling on memory and PCI reads (i.e. ‘latency’), and the IXP2800 has various ways of mitigating this cost that the PC doesn’t have. First of all, the IXP has 16 cores which are 8-way ‘hyperthreaded’ each (128 threads total.)
I haven’t paid much attention to Intel’s IXP network processor family in the past, and that may be a mistake–from the description here, the IXP2800 sounds like a cross between Tera’s multithreaded CPU and IBM’s new Cell processor. Tera’s CPU, which was designed to support tons of threads, automatically switches between threads whenever one thread blocks due to I/O or memory access. The goal with Tera was to be able to remain efficient while the gap between CPU and memory speeds continued to grow. The IXP2800 isn’t as ambitious as the Tera, but the fundamental concept looks similar–support lots of threads in hardware, and switch when latency gets in the way. The IXP2800’s threaded CPUs aren’t full-blown processors, though–like the Cell, the IXP2800 contains one main CPU and a cluster of smaller domain-specific processors that are specialized for one specific task.
It’s unlikely that Intel will roll something like this into their Xeon CPUs anytime soon, though. It’s certainly not a quick fix–it’d require major changes in any OS that wanted to make use of it, and would probably take 3-6 years before it was really fully utilized.
Massively-multithreaded CPUs aren’t the only approach that has paid off for dedicated network processors, though. Some of Freescale and Broadcom’s chips know how to pre-populate the CPU’s cache with headers from recently-received packets. This drastically cuts latency, but it seems to require that the CPU and network interface be very tightly coupled. Reducing the overhead needed to talk to the NIC can help, too–apparently some of Intel’s 865 and 875 motherboards use a version of their GigE chip that is connected directly to the north bridge, bypassing the PCI bus entirely, and some benchmarks show substantial improvements.
Reading the thread suggests that most of the effort going into Linux network optimization in the next few years will be happening on the receive end of things. Over the past several years, most higher-end NICs have added limited support for checksum generation and TCP segmentation offloading (TSO), where the CPU can hand the NIC a block of data and a TCP header template, and then have the NIC produce a stream of TCP packets without requiring the CPU to touch the data at all. Relatively little has happened on the receive side, but this seems to be changing. For example, Neterion’s newest card can separate headers from data, and is nearly able to re-assemble TCP streams on its own, sort of the inverse of transmit-time TSO. It’s not clear how many streams the card can handle at a time, though–even my little web server at home is currently maintaining 384 simultaneous TCP connections, and a busy system could easily have tens or hundreds of thousands of open streams. Odds are, throwing 100,000 streams at the card would run it out of RAM and completely negate any benefit that receive offloading would have. Unless it’s bright enough to be able to handle the 1,000 or so fastest streams and then let the main CPU handle the 99,000 that are dribbling data at 28k modem speeds.
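That last idea, offloading only the fastest streams, is easy to sketch in software even if nobody knows whether the card does anything like it. The Python below is purely hypothetical: the function name, the per-stream rate table, and the slot count are all invented for illustration and say nothing about how Neterion's hardware actually works.

```python
import heapq
from typing import Dict, Set


def pick_offloaded(stream_rates: Dict[str, float], slots: int) -> Set[str]:
    """Return the ids of the `slots` fastest streams; the rest stay on the CPU."""
    return set(heapq.nlargest(slots, stream_rates, key=stream_rates.get))


# Hypothetical recent throughput per stream, in bytes/sec.
rates = {"a": 3_500.0, "b": 9_000_000.0, "c": 2_800.0, "d": 4_000_000.0}

offloaded = pick_offloaded(rates, slots=2)
on_cpu = set(rates) - offloaded
print(sorted(offloaded), sorted(on_cpu))
```

A real implementation would also need to re-evaluate the set as stream rates change, which is where it would get interesting.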
This is a fascinating topic, and I can’t wait to see how this will turn out.
Sangoma S518 PCI ADSL Modem Review
As regular readers know, I recently turned up a new DSL circuit at home, replacing an older, slower line that Verizon had refused to upgrade for months. As part of the upgrade process, I needed to buy a new DSL modem. Instead of using an external DSL modem (DSL-Ethernet bridge would probably be more accurate, but “modem” seems to have stuck), I decided to buy a Sangoma S518 PCI ADSL modem. I had two main reasons for preferring this internal modem to a generic external model:
- Better control over upstream buffering, for better VoIP QoS.
- Better visibility into the modem’s state, so I can syslog minor outages and notice things like speed changes.
I chose the Sangoma model instead of a cheap, generic card because the manufacturer strongly supports its use with Linux, and a number of people on the Asterisk-Users mailing list have recommended it. I paid $115 plus shipping from BSD Mall.
Packaging and physical installation
The card arrived via UPS about a week after I ordered it from BSD Mall. Historically, Sangoma has mostly made cards for T1s and leased lines; the S518 is their lowest-end product, but it uses the same configuration tools and drivers as their more expensive cards. Since this is Sangoma’s only product line, and Linux is a big part of their market (and has been for over 10 years), the drivers are much more mature and stable than you’d really expect for a DSL card.
The packaging for the card is fairly generic–it looks like Sangoma uses the same box for all of their products. Inside, there was an RJ11 cord, the PCI card (wrapped in an anti-static bag and bubble wrap), a manual, and a CD with drivers.
Drivers
I downloaded their most recent drivers from their web site a few days before the card arrived and pre-installed them, so I’d be ready to install the card as soon as it arrived. After untarring the drivers, all you have to do to install them is run ./Setup install and their setup script takes care of everything else. Rather impressively, it located the source for the 2.6 kernel that I’m running, patched it (after asking my permission), built new kernel modules for everything needed, installed them, and then compiled and installed the user tools for configuring their interfaces. It also installed a startup script into /etc/init.d/wanrouter and created all of the right links in /etc/rc*.d to make sure that it starts on boot. The fact that it all worked correctly on my Debian unstable system was rather impressive, and a sign that Sangoma’s been doing this for a while.
Configuration
Configuring the card was also relatively easy. The setup program installed a tool called wancfg that provides a curses-based UI for setting up their network interface cards. It took me a couple minutes to tell it that I was about to install an S518, guess which ADSL encapsulation I’d need, and tell it to assign the new interface the name dsl0 and a dummy IP address.
Initial Use
Once the drivers were installed and configured, I shut the system down, installed the card in a spare PCI slot, and rebooted. The system came back up, loaded the S518 drivers successfully, set up the ADSL interface using the specs that I’d provided, and started training. After about 30 seconds, it told me that training had failed, and it couldn’t find a signal on the line. Since the line wasn’t supposed to be live for two more days, this didn’t seem like a problem. I left the interface installed in the box, perpetually attempting to re-train, and went to bed.
The problem
At 8:19 the next morning, Verizon finished configuring their end of the DSL line. The S518 immediately trained on the line, syslogging:
Feb 9 08:19:22 guam kernel: wanpipe1: ADSL Link connected (Down 1792 kbps, Up 448 kbps)
Feb 9 08:19:30 guam kernel: wanpipe1: Link connected!
About two seconds after logging the last line, the system locked up. I rebooted the box, only to discover that the system could no longer see the S518 card. Even lspci failed to detect anything–the card was locked up so hard that PCI bus probing no longer worked. I had to power the system down before I could access the card. I tried rolling the driver back to the previous stable version, without any luck, and rolled it forward to a newer beta, but that didn’t work, either. Everything was stable until it trained, so I unplugged the DSL line from the S518 and left for work.
During the day, I contacted Sangoma’s tech support department and asked what was wrong. They emailed me back within 15 minutes and asked a few questions – “is this a 64-bit system in 64-bit mode?” for example. They suggested several things that I’d already tried–rolling forward or back to new releases. They looked through my lspci output and noticed the Digium cards that I use with Asterisk and suggested removing them, just for testing. Their support engineer admitted he was grasping at straws–they had the same card with the same drivers and kernel working in the labs. They suggested re-compiling the driver, targeting it for a generic 386 kernel, instead of the Athlon-optimized version that it built by default.
That night, I tried the re-compiled drivers: no luck. I pulled both Digium cards: no luck. It still crashed immediately after training.
Finally, I moved the S518 to a new PCI slot, and discovered that that worked perfectly. So, either the S518 driver had a hard time sharing an IRQ with my Ethernet card, or I have a bad PCI slot in my system. Since it’s a cheap motherboard, and it’s almost 5 years old, I’m going to go with ‘bad slot’. I re-installed the Digium cards, cleaned everything up, and it all continued working perfectly.
I was very pleased with Sangoma’s support–they seemed competent, they responded quickly to my request for help, they asked sensible questions, made decent suggestions, and didn’t disappear once they ran out of easy fixes. Frankly, this is why I paid more than I had to for their card–good support. In the future, when people come to me for recommendations on T1 or ADSL cards, I’m going to recommend Sangoma without any reservations at all.
ADSL and IP configuration
By this point, Verizon had emailed me my new static IP address, so I configured dsl0 for the right IP and tried pinging my gateway. No luck–I couldn’t ARP the gateway, so ping wasn’t working. I fired up tcpdump, and saw that there was traffic on the link, but it didn’t look right–the source MAC address was 00:00:00:ff:ff:ff and the Ethernet frame type was ffff. So, most likely, the DSL framing option that I’d picked when I’d configured the link was wrong. I fired wancfg back up and looked over my options:
- Bridged Eth LLC over ATM (PPPoE)
- Bridged Eth VC over ATM
- Classical IP LLC over ATM
- PPP (LLC) over ATM
- PPP (VC) over ATM (PPPoA)
I was pretty sure that I was being fed bridged Ethernet, although Verizon hadn’t actually told me what they were using, so I’d picked ‘Bridged Eth VC over ATM’. I looked at the configuration screen for a minute, deciding which one to try next, when I noticed the next line down: ATM_AUTOCFG-> NO. I set that to yes, ran /etc/init.d/wanrouter restart, and watched syslog. Within 30 seconds, it reported that the link was up, that there was traffic on VCI 35, VPI 0 (which was the default in wancfg), but that it wasn’t framed right. The kernel driver said that it was expecting Bridged Eth VC over ATM, but it was seeing Bridged Eth LLC instead. So I ran wancfg, turned off autoconfig, and changed the encapsulation to ‘Bridged Eth LLC over ATM (PPPoE)’, saved, and re-ran /etc/init.d/wanrouter restart. As soon as it came back up, I was able to ping the gateway; everything was working. For what it’s worth, the PPPoE in the encapsulation name is a complete misnomer in this case–there’s no PPP involved anywhere in this system.
If I’d spent a bit of time with Google, I probably wouldn’t have had to fiddle with encapsulation settings, but it was nice to see that it could auto-detect it for me.
Once I was confident that the link was up, I changed IP addresses in DNS, edited my system startup scripts to use the new IP address and device dsl0 instead of eth1, and rebooted. I noticed that the Sangoma /etc/init.d/wanrouter was set to run at step 20 in /etc/rc3.d, while my IP configuration script ran way earlier. This was causing problems: the WAN link wasn’t up before system services like NTP and Apache started, so their DNS lookups failed. So I deleted the calls to start wanrouter out of /etc/rc*.d, and then called it by hand right before configuring my IP addresses and firewall. One more quick reboot, and everything seemed to be working fine. I was able to download a kernel image over the DSL line at nearly 150 KB/sec, which was around twice as fast as before.
QoS
My main reason for ditching my old DSL line and modem and installing a new DSL line with the S518 was lower latency and jitter for VoIP. With normal external Ethernet-to-DSL modems, the modem has a buffer that it uses to hold outgoing packets. If you’ve ever tried uploading a large file over DSL, only to discover that it takes 5 or 6 seconds for ping packets to cross the link, you’ve discovered the joys of big buffers in DSL modems. While a 5 second delay is bad for SSH, it’s horrible for VoIP.
The usual way around this is to install a rate limiter on your router’s outbound Ethernet interface, like Wondershaper on Linux. This works by limiting how fast your router’s Ethernet interface can transmit data. If you slow the router down so it’s just a bit slower than the DSL line, then the router can prioritize packets and let VoIP packets go to the head of the line, without letting the DSL modem receive enough traffic to fill up its buffers. This works, but it’s always a spotty thing–for best results, you need to set Wondershaper to be a bit slower than your DSL line, so you lose some performance there. In addition, Wondershaper’s default settings don’t really pay full attention to the ToS headers on IP packets, so you have to spend some time tweaking its idea of high-priority and low-priority traffic. Finally, it’s really hard to make changes–Linux’s QoS tools are powerful, but they’re complex and hard to understand.
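To see why modem buffers hurt so much, it helps to work out how long a full buffer takes to drain. The Python sketch below does that arithmetic. The 256 KiB buffer size is a made-up number for illustration (I don't know what any particular modem uses), and the 448 kbps uplink is the sync rate from my logs.

```python
def queue_delay_seconds(buffer_bytes: float, uplink_kbps: float) -> float:
    """Time for a full outbound buffer to drain at the uplink rate."""
    return buffer_bytes * 8 / (uplink_kbps * 1000)


def buffer_bytes_for_delay(delay_seconds: float, uplink_kbps: float) -> float:
    """Amount of queued data that would produce the given delay."""
    return delay_seconds * uplink_kbps * 1000 / 8


UPLINK_KBPS = 448                  # uplink sync rate from the logs
HYPOTHETICAL_BUFFER = 256 * 1024   # assumed modem buffer size, in bytes

delay = queue_delay_seconds(HYPOTHETICAL_BUFFER, UPLINK_KBPS)
needed = buffer_bytes_for_delay(5.0, UPLINK_KBPS)

print(f"a full buffer adds about {delay:.1f} s of delay")
print(f"a 5 s delay implies about {needed / 1024:.0f} KiB queued")
```

Any VoIP packet that lands behind that much queued upload data has to wait its turn, which is exactly the problem.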
The first thing that I did when I brought the S518 up was to turn off Wondershaper completely and see how well the kernel’s default QoS scheme (pfifo_fast) worked. By default, Linux prioritizes packets based on the ToS field on the IP header, and most tools actually seem to set the header to reasonable values. Or, at least Asterisk and BitTorrent both use reasonable settings. Since the S518 doesn’t have a buffer built into it, the kernel’s native queueing works perfectly, and I’m seeing nearly perfect Asterisk VoIP performance, even without a complex set of shaping tools.
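For anyone who hasn't looked at pfifo_fast, the basic idea is three queues ("bands") served in strict priority order, with the band chosen from the packet's ToS field. Here is a toy Python model of that behavior. It is a simplification: the real kernel maps priorities to bands through a 16-entry priomap, and the two-bit test below is my own shortcut, not the kernel's table.

```python
from collections import deque

IPTOS_LOWDELAY = 0x10    # "minimize delay" ToS bit
IPTOS_THROUGHPUT = 0x08  # "maximize throughput" ToS bit


def band_for_tos(tos: int) -> int:
    """Simplified ToS-to-band mapping (the kernel uses a priomap table)."""
    if tos & IPTOS_LOWDELAY:
        return 0  # interactive traffic, e.g. VoIP
    if tos & IPTOS_THROUGHPUT:
        return 2  # bulk traffic
    return 1      # best effort


class ThreeBandQueue:
    """Strict-priority queue: band 0 always drains before 1, and 1 before 2."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, packet, tos: int) -> None:
        self.bands[band_for_tos(tos)].append(packet)

    def dequeue(self):
        for band in self.bands:
            if band:
                return band.popleft()
        return None  # nothing queued


q = ThreeBandQueue()
q.enqueue("bulk upload", IPTOS_THROUGHPUT)
q.enqueue("web request", 0x00)
q.enqueue("voip frame", IPTOS_LOWDELAY)
order = [q.dequeue(), q.dequeue(), q.dequeue()]
print(order)
```

The VoIP frame comes out first even though it was queued last. That only helps if the queue that matters is the one in the kernel, which is the whole point of not having a second buffer in the modem.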
Since this was my primary goal when I bought the S518, I’m quite pleased with the card.
Conclusion
I’d strongly recommend this card to anyone with a need for decent QoS over ADSL, as long as they have the technical skills needed to get it to work. As mentioned, the drivers did a good job of installing themselves, and Sangoma’s tech support is good, but it still took some understanding to get the system working correctly. Sangoma supports most of the *BSDs as well as Linux and Windows, so the only people left out in the cold are the ones trying to use OS X as a router.
In addition, based on what I’ve seen of Sangoma’s drivers and toolset, as well as their tech support, I’d recommend that people in the market for 1-4 channel T1 cards for data or voice check their offerings out as well. Sangoma supports Asterisk directly on their T1 cards, and while they’re slightly more expensive than Digium’s cards, they probably come with better support. Given what I’ve seen of their tools, setting up data T1s with Sangoma’s drivers looks like child’s play.
If Sangoma’s looking for suggestions, I’d love to see a model of the S518 that can act as an FXO card with Asterisk while still acting as an ADSL card. One card could handle voice and DSL at the same time. The market for this isn’t huge today, but it’s a relatively simple change that could have huge benefits as Asterisk grows.
Success: My new DSL line is up and running
I can’t believe it. After months and months of trying to upgrade my DSL, everything is finally up and running on my new DSL line. The line wasn’t supposed to go live until Friday evening, but Verizon sent me mail today with my IP address and claimed that it’s up and running. And, indeed, it is.
Unfortunately, I wasn’t quite as prepared as I thought I was; I’d forgotten to change the DNS TTL on scottstuff.net, so people may have a hard time getting through to this site for a couple days. Oops. Other than that, though, everything seems to be up and running perfectly. The sample size is still kind of small, but smokeping suggests that my average ping time has improved drastically–I was seeing numbers from 25 to 800 ms before. Right now, the line looks flat at 30 ms. That’ll probably break a bit once traffic picks back up on the website, but I should be able to keep it under 100 ms, easy.
I’ll post more details later tonight or tomorrow, along with a review of the Sangoma S518 PCI ADSL card. I have to put the kids to bed first, though.
DSL progress
My DSL modem showed up yesterday, so I dropped it into my gateway box and fired it up. It immediately reported that it was unable to train; there was nothing to talk to on the other end of the phone line yet. Since my official install day is still a couple days out, that didn’t surprise me. Then this morning, I saw this in the logs:
Feb 9 08:19:22 guam kernel: wanpipe1: ADSL Link connected (Down 1792 kbps, Up 448 kbps)
Feb 9 08:19:30 guam kernel: wanpipe1: Link connected!
Feb 9 08:41:03 guam kernel: klogd 1.4.1#11, log source = /proc/kmsg started.
The gap between the second and third lines is the problem–the box went down, hard, right after the DSL line came up. On the other hand, it looks like I’m provisioned above 1.5/384 on the ATM side. Assuming a 20% cell tax, this gives me a usable connection of around 1430 kbps down and 360 kbps up, which isn’t too bad. Now I just have to keep the thing from crashing. I’m rolling my ADSL drivers back from the beta version that I’d started with to the most recent release; hopefully that’ll be good enough to fix my problem.
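The cell tax estimate is simple to reproduce. In the Python sketch below, the 20% figure is the rough assumption from this post, not a measured value. For reference, ATM's 5-byte header on each 53-byte cell accounts for a bit over 9% by itself, and the rest of the estimate is headroom for padding and encapsulation.

```python
def usable_kbps(sync_kbps: float, cell_tax: float = 0.20) -> float:
    """Estimate usable throughput after subtracting ATM overhead."""
    return sync_kbps * (1.0 - cell_tax)


# Sync rates from the log lines above, in kbps.
down = usable_kbps(1792)
up = usable_kbps(448)

# Fixed part of the overhead: 5 header bytes in every 53-byte ATM cell.
header_share = 5 / 53

print(f"usable: about {down:.0f} kbps down, {up:.0f} kbps up")
print(f"ATM cell headers alone: {header_share:.1%}")
```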
DSL: 6 days and counting
I have a date now for my new DSL line: February 11th. My new DSL PCI card is in UPS’s hands, too. I’m starting to believe that my months-long DSL upgrade quest is nearly complete.