Posted by Scott Laird
Tue, 30 Aug 2005 00:39:00 GMT
Oh, great–first the government mucks with DST, and now leap seconds are back for the first time in 7 years. I was starting to think that we were done with them for good.
My timezone is going to have an extra second added at 3:59:60 PM on December 31st, 2005. Fun; I wonder how many of the devices that I deal with will do the right thing with the extra second. Odds are most of them will just end up an extra second off. I assume that NTP has a way of dealing with this, although it might just be outside of the protocol’s scope–leap seconds really just change the seconds-since-some-epoch to human-visible-date mapping. (Update: it’s complicated)
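To make the seconds-since-some-epoch point concrete, here’s a minimal Python sketch. POSIX time assumes every day is exactly 86,400 seconds long, so the leap second doesn’t get a number of its own. Python’s calendar.timegm does that POSIX-style arithmetic without validating its input, and datetime refuses a seconds field of 60. LEAP_DAYS is a hypothetical one-entry table holding only this leap second, the one that moves TAI-UTC from 32 to 33 seconds.

```python
import calendar
from datetime import date, datetime

# Hypothetical one-entry leap-second table: UTC days that end with 23:59:60.
LEAP_DAYS = {date(2005, 12, 31)}


def posix_seconds(y, mo, d, h, mi, s):
    """Seconds since 1970-01-01 UTC, POSIX-style: every day has exactly 86,400 s."""
    return calendar.timegm((y, mo, d, h, mi, s, 0, 0, 0))


def si_seconds_in_day(day):
    """Real (SI) seconds in a UTC day, according to the table above."""
    return 86_400 + (1 if day in LEAP_DAYS else 0)


# The leap second and the following midnight collapse onto the same timestamp.
leap = posix_seconds(2005, 12, 31, 23, 59, 60)
midnight = posix_seconds(2006, 1, 1, 0, 0, 0)

# POSIX arithmetic says the last day of 2005 lasted 86,400 s; it actually lasted 86,401.
posix_day = midnight - posix_seconds(2005, 12, 31, 0, 0, 0)
real_day = si_seconds_in_day(date(2005, 12, 31))

# Python's datetime has no way to represent 23:59:60 at all.
try:
    datetime(2005, 12, 31, 23, 59, 60)
    datetime_accepts_leap = True
except ValueError:
    datetime_accepts_leap = False

print(leap, midnight, posix_day, real_day, datetime_accepts_leap)
```

The extra second exists in the real world but not in the seconds count, which is why leap seconds end up being a problem for the count-to-date mapping rather than for the count itself.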
Since leap seconds aren’t new, and I don’t really care about sub-second timing precision on any of my devices, I doubt I’ll even notice the change, although undoubtedly there are devices on the market that will have problems; I wouldn’t be surprised if there’s a cheap GPS receiver somewhere with leap-second issues.
This reminds me of two of the pedantic sysadmin interview questions that I’ve never really had the guts to ask a real candidate: “exactly how many hours are in a day?” and “how many seconds are there in a minute?” Strictly speaking, the answers are “23, 24, or 25, depending on DST transitions” and “59, 60, or 61, depending on leap seconds.” The 23/24/25 thing actually bites new sysadmins: never schedule something that needs to happen exactly once per week between 1:00 and 3:00 local time on a Sunday morning. In the US, the spring transition skips 2:00 to 2:59 entirely, so a 2:30 job won’t happen at all that week, and the fall transition repeats 1:00 to 1:59, so a 1:30 job will happen twice.
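Here’s a minimal Python sketch of the 23/24/25 effect. To stay self-contained, it hard-codes the pre-2007 US rule for Pacific time (DST from 2:00 AM on the first Sunday in April until 2:00 AM on the last Sunday in October) instead of reading the system time zone database; the helper names are illustrative. It steps through real hours in UTC and counts which local wall-clock times actually occur on a given day.

```python
from datetime import date, datetime, timedelta

PST = timedelta(hours=-8)  # standard time offset from UTC
PDT = timedelta(hours=-7)  # daylight time offset from UTC


def first_sunday(year, month):
    d = date(year, month, 1)
    return d + timedelta(days=(6 - d.weekday()) % 7)  # weekday(): Monday=0, Sunday=6


def last_sunday(year, month):
    nxt = date(year + month // 12, month % 12 + 1, 1)
    d = nxt - timedelta(days=1)  # last day of the month
    return d - timedelta(days=(d.weekday() - 6) % 7)


def to_local(utc):
    """Convert a naive UTC datetime to US Pacific wall-clock time (pre-2007 rule)."""
    midnight = datetime.min.time()
    # 2:00 AM PST is 10:00 UTC; 2:00 AM PDT is 09:00 UTC.
    dst_start = datetime.combine(first_sunday(utc.year, 4), midnight) + timedelta(hours=10)
    dst_end = datetime.combine(last_sunday(utc.year, 10), midnight) + timedelta(hours=9)
    return utc + (PDT if dst_start <= utc < dst_end else PST)


def _real_instants(day, minute=0):
    """Hourly UTC instants (offset by `minute`) in a 72-hour window around `day`."""
    start = datetime.combine(day, datetime.min.time()) - timedelta(days=1)
    return (start + timedelta(hours=h, minutes=minute) for h in range(72))


def hours_in_local_day(day):
    return sum(1 for t in _real_instants(day) if to_local(t).date() == day)


def runs_at(day, hour, minute=30):
    """How many times a job scheduled at local hour:minute fires on `day`."""
    hits = 0
    for t in _real_instants(day, minute):
        local = to_local(t)
        if local.date() == day and (local.hour, local.minute) == (hour, minute):
            hits += 1
    return hits


for d in (date(2005, 4, 3), date(2005, 7, 3), date(2005, 10, 30)):
    print(d, hours_in_local_day(d), runs_at(d, 1), runs_at(d, 2))
```

On the spring-forward Sunday a 2:30 job never fires, and on the fall-back Sunday a 1:30 job fires twice. Scheduling such jobs in UTC, or outside the 1:00 to 3:00 window, sidesteps both cases.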
Tags leapsecond, ntp, sysadmin, time, timekeeping | 1 comment
Posted by Scott Laird
Wed, 20 Jul 2005 08:23:36 GMT
There’s a surprisingly small amount of documentation out there on tuning Apache for optimum Rails performance. Almost everyone mentions the first step (use FastCGI, not regular CGI), but that’s such a huge performance boost that it’s really obvious–waiting 2-3 seconds per hit for Rails to start up is an indicator that you’re doing something wrong.
Once you get past that, there’s not a lot of documentation. There are examples here and there, but no one seems to explain what the settings mean or why they should be used.
Ever since I switched to Typo, I’ve been seeing occasional HTTP 500 errors from Apache, suggesting that Apache was unable to talk to Typo. Looking in the logs shows that Apache was usually in the middle of restarting a FastCGI instance whenever the errors occurred. Digging through the mod_fastcgi documentation shows that FastCGI can work in three different modes with Apache:
- Static. FastCGI servers are started when Apache is reloaded and remain running.
- Dynamic. FastCGI server processes are started whenever a FastCGI URL is hit. Excess processes are killed off when there’s no traffic.
- External. Your FastCGI app is started and managed outside of Apache, and Apache talks to it over a socket (TCP or Unix-domain).
Dynamic mode is the default, but it’s not a good fit for Rails because of Rails’s slow startup time. Switching to static mode really helps. To do that, I added this line to /etc/apache2/apache2.conf on my Debian server:
FastCgiServer /var/web/typo/public/dispatch.fcgi -idle-timeout 120 \
-initial-env RAILS_ENV=production -processes 2
Notice that I had to list the full path to Rails’s dispatch.fcgi file; on some systems you may be able to get away with only listing public/dispatch.fcgi, but that will almost certainly not work if you’re using virtual hosting.
By default, FastCGI assumes that your server will respond to queries within 30 seconds. I added the -idle-timeout 120 parameter just so I can deal with really slow responses better. Typo’s article admin page currently tries to list all 466 articles on one page, and that can take over 30 seconds to process.
The -processes parameter tells Apache how many FastCGI processes should run for this application. For 95% of users, 1 or 2 will be best. If you get a lot of traffic, then raising this to 3-4x the number of CPUs in your system might get you slightly better performance.
Finally, the -initial-env bit makes sure that Rails runs in production mode, talking to my production DB and not returning error backtraces to the user.
Posted in Typo, Ruby, Web stuff | Tags apache, fastcgi, rails, rubyonrails, sysadmin, tuning | 40 comments
Posted by Scott Laird
Wed, 29 Jun 2005 18:12:17 GMT
I just locked myself out of a remote server for the first time in years. I’m usually better than that, but I finally screwed up and typed something that ended up requiring local intervention.
I’m going to blame the GNU tools for this: programs built on GNU getopt almost universally let you put command-line flags anywhere on the command line. So ls -l foo and ls foo -l are equivalent. Frequently, if I need to add a new flag to an existing command line, I’ll just tack it on at the end rather than using the arrow keys to go back a word or two.
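Python’s standard getopt module happens to expose both behaviors, which makes for a quick illustration: getopt.getopt stops at the first non-option argument (traditional POSIX behavior), while getopt.gnu_getopt permutes the arguments GNU-style, so a trailing flag still counts as a flag.

```python
import getopt

argv = ["foo", "-l"]  # "ls foo -l", minus the program name

# POSIX-style scanning stops at "foo", so "-l" is left behind as a plain argument.
posix_opts, posix_args = getopt.getopt(argv, "l")

# GNU-style scanning keeps looking past "foo" and still finds "-l"
# (unless POSIXLY_CORRECT is set in the environment).
gnu_opts, gnu_args = getopt.gnu_getopt(argv, "l")

print(posix_opts, posix_args)
print(gnu_opts, gnu_args)
```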
Unfortunately, sometimes the order matters. For instance, kill -1 1234 and kill 1234 -1 do very different things. The first one sends SIGHUP (signal 1) to process 1234. The second one sends the default signal, SIGTERM, to process 1234 and to PID -1, which means every process you’re allowed to signal; as root, that’s every other process on the system.
Oops.
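A hypothetical sketch of why argument order matters for kill: the signal number is only recognized as the first argument, so anything after a PID is itself treated as a PID. parse_kill is an illustrative simplification (it ignores -s NAME, -TERM, and the other forms real implementations accept), and it only parses; it never sends a signal.

```python
import signal


def parse_kill(argv):
    """Hypothetical, simplified kill(1) argument parser.

    Only a leading "-N" selects signal N; every remaining argument is
    treated as a PID. With no signal given, the default is SIGTERM.
    """
    args = list(argv)
    sig = int(signal.SIGTERM)
    if args and args[0].startswith("-") and args[0][1:].isdigit():
        sig = int(args.pop(0)[1:])
    return sig, [int(a) for a in args]


def describe(pid):
    # POSIX kill(): a pid of -1 targets every process the caller may signal.
    return "every process you can signal" if pid == -1 else f"process {pid}"


for argv in (["-1", "1234"], ["1234", "-1"]):
    sig, pids = parse_kill(argv)
    targets = ", ".join(describe(p) for p in pids)
    print(f"kill {' '.join(argv)}: signal {sig} to {targets}")
```

Signal 1 is SIGHUP and signal 15 is SIGTERM on Linux and most Unix systems.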
Posted in Computer System Administration | Tags oops, sysadmin | no comments
Posted by Scott Laird
Thu, 28 Apr 2005 18:20:56 GMT
Phil Windley says that Visa and Mastercard are starting to crack down on small merchants, requiring them to meet some sort of minimum information security standards or lose the ability to accept Visa or Mastercard purchases online. This is clearly a good thing.
He lists 12 basic requirements:
- Install and maintain a working firewall to protect data
- Keep security patches up-to-date
- Protect stored data
- Encrypt data sent across public networks
- Use and regularly update anti-virus software
- Restrict access by “need to know”
- Assign unique ID to each person with computer access
- Don’t use vendor-supplied defaults for passwords and security parameters
- Track all access to data by unique ID
- Regularly test security systems and processes
- Implement and maintain an information security policy
- Restrict physical access to data
The actual questionnaire from Visa goes into a lot more detail (“Do changes to the firewall need authorization and are the changes logged?”). A quick skim of the questionnaire shows a bit of Windows bias (you can’t pass unless you have virus scanners on all your servers–that’s kind of weird in a Unix environment), but it looks like a great step forward. It’s nice to see someone in a position of influence raising the security baseline.
Posted in Computer Security | Tags cryptography, mastercard, security, standards, sysadmin, visa | 1 comment
Posted by Scott Laird
Sun, 20 Mar 2005 13:24:46 GMT
Playing with my new PC:
# apt-get install xemacs21
...
The following NEW packages will be installed:
xemacs21 xemacs21-basesupport xemacs21-bin xemacs21-mule
xemacs21-mulesupport xemacs21-support
0 upgraded, 6 newly installed, 0 to remove and 21 not upgraded
Need to get 33.6MB of archives.
After unpacking 106MB of additional disk space will be used
Choke, choke. xemacs needs 106 MB of disk space?
Posted in Linux | Tags debian, sysadmin, xemacs | 1 comment
Posted by Scott Laird
Mon, 14 Mar 2005 19:17:29 GMT
Debian is easily my favorite Linux distribution. It has its issues (horrific installer, tends to value ideology over technology, glacial release schedule), but its core is fantastic, and I’ve grown used to all of its quirks over the years. I think I installed my first Debian box in 1996 starting with either the buzz or rex release (Debian names all of its releases after characters from Toy Story) and I’ve been running at least one Debian system ever since.
One thing about Debian is that it has historically tried to support every platform under the sun. At last count, there are 11 supported Debian platforms, from PCs to PDA-like devices to IBM mainframes. According to The Register, this is going to change, with Debian dropping 7 of their 11 platforms, largely in an attempt to speed up the Debian release process. There are examples where security bugfixes have been delayed for months because the fix won’t build properly on an uncommon platform. In other cases, it appears that there just isn’t enough CPU power to keep up with the build load on older platforms, so the build just keeps falling further and further behind.
The plan isn’t really to totally discontinue support for these 7 architectures, but rather to move them to a new Debian “second-class citizen” support system and not include them in new releases.
The platforms that will continue to be supported are:
- i386
- amd64 (Athlon64/Opteron/new Intel 64-bit x86)
- powerpc
- ia64 (Itanium)
Posted in Linux | Tags debian, platforms, sysadmin | no comments
Posted by Scott Laird
Tue, 08 Mar 2005 00:36:14 GMT
I’ve been watching Xen for a while now, and I’m nearly ready to take the jump and do some testing with it. I’m thinking about ordering a cheap Athlon 64 box for home to use as a testbed for the lightweight server concept that I’ve been kicking around for years. In the 18 months that have passed since I last talked about it, virtualization on the PC has advanced by leaps and bounds; at the time, I was looking at UML, which wasn’t really fast or stable enough. Xen looks to be both fast and stable, and it has a clear migration path onto the hardware virtualization support coming in the next generation of PC processors. That makes it nearly ideal for my purposes.
Posted in Linux, Xen, Computer System Administration, LWVS | Tags sysadmin, xen | no comments
Posted by Scott Laird
Wed, 02 Mar 2005 21:16:21 GMT
As mentioned earlier, Intel has been making noises about improving network I/O on PC servers. Today, at IDF, they released a few details on their plans. Apparently the presentation itself was good, but their web documentation is thin. Lennert Buytenhek summarized the key points, centering on the threading improvements:
[…] Rather than providing multiple hardware contexts in a
processor like Hyper-Threading (HT) Technology from Intel, a
single hardware context contains the network stack with
multiple software-controlled threads. When a packet
thread triggers a memory event a scheduler within the network
stack selects an alternate packet thread and loads the CPU
execution pipeline. Processing continues in the shadow of a
memory access. […] Stall conditions, triggered by requests
to slow memory devices, are nearly eliminated.
This isn’t exactly like the IXP2800, but there are some distinct similarities. In essence, it looks like Intel wants to provide the OS with the ability to task-switch on cache misses. I’m not sure that current OSes can switch threads much faster than the CPU can handle a cache miss, so this will be interesting to follow. I suspect that you could switch fast enough if you don’t touch the TLB or most of the CPU mode bits.
Intel also points out that with 10 GbE, just mitigating the effect of cache misses by processing multiple packets in parallel isn’t enough: packets actually arrive faster than the computer can fetch data from main memory. With 64-byte packets at 10 Gbps, a new packet arrives every 51.2 ns, which isn’t even long enough for a single main-memory access. According to Intel, normal packet processing requires 5 main-memory reads. Intel’s fix for this is to add the ability to DMA directly into the CPU’s cache, and then add support for offloading memory copies onto the memory controller itself.
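The 51.2 ns figure is straightforward arithmetic: a 64-byte packet is 512 bits, and at 10 Gbps (10 bits per nanosecond) it takes 51.2 ns to arrive. The sketch below also shows, via an optional overhead_bytes parameter, what happens when Ethernet’s per-frame preamble/start delimiter (8 bytes) and minimum inter-frame gap (12 bytes) are counted, which the simple figure leaves out. The per-read budget just divides the packet time by Intel’s 5 reads, assuming they happen one after another.

```python
def interarrival_ns(frame_bytes, rate_gbps=10.0, overhead_bytes=0):
    """Time between back-to-back frames; 1 Gbps is 1 bit per nanosecond."""
    return (frame_bytes + overhead_bytes) * 8 / rate_gbps


def packets_per_second(frame_bytes, rate_gbps=10.0, overhead_bytes=0):
    return 1e9 / interarrival_ns(frame_bytes, rate_gbps, overhead_bytes)


bare = interarrival_ns(64)                        # frame bits only
on_wire = interarrival_ns(64, overhead_bytes=20)  # + preamble/SFD (8) + inter-frame gap (12)
budget_per_read = bare / 5                        # 5 serialized main-memory reads per packet

print(f"{bare:.1f} ns per packet, {on_wire:.1f} ns on the wire, "
      f"{packets_per_second(64, overhead_bytes=20) / 1e6:.2f} Mpps, "
      f"{budget_per_read:.2f} ns per read")
```

Either way, each of the 5 reads would have to complete in roughly 10 to 13 ns to keep up, which is why hiding or avoiding memory latency matters more than raw CPU speed here.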
While Intel is aiming at improving network performance, I suspect that other types of processing may see big improvements from the planned changes. Video compression, for instance, can have horrible cache performance; I saw a study a while back that showed P4s running an MPEG-2 codec were averaging one instruction every 5 cycles during part of the processing, or way under 10% of what the CPU is capable of. A video codec that could compress several macroblocks at once, switching between them on cache misses, could easily see big speed boosts.
Posted in Computer Networking | Tags hardware, intel, ioacceleration, sysadmin, threading | no comments
Posted by Scott Laird
Wed, 23 Feb 2005 03:08:59 GMT
There’s an interesting thread going on right now on the Linux netdev mailing list, speculating about the network accelerator technology that Intel’s been talking about recently. No one’s quite sure what Intel is planning on adding, but for the past several years “network accelerator” has usually meant TCP offload engines (ToE), and Linux’s core networking guys are almost famously anti-ToE. Even though no one really knows what Intel’s up to, there’s a feeling that it’s not just ToE this time.
Several people have pointed out other technologies that can make a huge difference without requiring the sorts of compromises that ToE needs to work. For instance, this post by Lennert Buytenhek suggests that PCI and memory system latency is a big problem, but fixing it can have huge payoffs:
The reason a 1.4GHz IXP2800 processes 15Mpps while a high-end PC hardly
does 1Mpps is exactly because the PC spends all of its cycles stalling on
memory and PCI reads (i.e. ‘latency’), and the IXP2800 has various ways
of mitigating this cost that the PC doesn’t have. First of all, the IXP
has 16 cores which are 8-way ‘hyperthreaded’ each (128 threads total.)
I haven’t paid much attention to Intel’s IXP network processor family in the past, and that may be a mistake: from the description here, the IXP2800 sounds like a cross between Tera’s multithreaded CPU and IBM’s new Cell processor. Tera’s CPU, which was designed to support tons of threads, automatically switches between threads whenever one thread blocks on I/O or memory access. The goal with Tera was to remain efficient while the gap between CPU and memory speeds continued to grow. The IXP2800 isn’t as ambitious as the Tera, but the fundamental concept looks similar: support lots of threads in hardware, and switch when latency gets in the way. The IXP2800’s threaded processors aren’t full-blown CPUs, though; like the Cell, the IXP2800 contains one main CPU and a cluster of smaller, domain-specific processors.
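A toy model shows why lots of hardware threads pay off. Assume, hypothetically, that each thread computes for 10 cycles and then stalls for 70 cycles waiting on memory, and that switching threads is free, which real hardware (and certainly a real OS) can’t promise. The cycle counts are illustrative, not IXP2800 or Tera specifications. A single thread keeps the core busy 10 out of every 80 cycles; in this model, eight threads are enough to hide the stall completely.

```python
def simulate(compute, stall, threads, cycles):
    """Idealized hardware-threaded core: each thread computes for `compute` cycles,
    then waits `stall` cycles on memory. The core switches instantly to any thread
    whose data has arrived and idles only when every thread is waiting.
    Returns the fraction of cycles spent doing useful work."""
    ready_at = [0] * threads  # cycle at which each thread can run again
    busy = t = 0
    while t < cycles:
        runnable = [i for i in range(threads) if ready_at[i] <= t]
        if not runnable:
            t = min(ready_at)  # every thread is stalled: idle until one returns
            continue
        i = runnable[0]
        run = min(compute, cycles - t)
        busy += run
        t += run
        ready_at[i] = t + stall  # this thread now misses in cache and waits
    return busy / cycles


def predicted(compute, stall, threads):
    """Closed form for the same model: N threads cover N*C of every C+M cycles."""
    return min(1.0, threads * compute / (compute + stall))


for n in (1, 2, 4, 8):
    print(n, simulate(10, 70, n, 8000), predicted(10, 70, n))
```

Adding a nonzero switch cost to this model would eat directly into the useful fraction, which is the catch with doing the same trick in OS software rather than in hardware.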
It’s unlikely that Intel will roll something like this into their Xeon CPUs anytime soon, though. It’s certainly not a quick fix–it’d require major changes in any OS that wanted to make use of it, and would probably take 3-6 years before it was really fully utilized.
Massively multithreaded CPUs aren’t the only approach that has paid off for dedicated network processors, though. Some of Freescale’s and Broadcom’s chips can pre-populate the CPU’s cache with headers from recently received packets. This drastically cuts latency, but it seems to require that the CPU and network interface be very tightly coupled. Reducing the overhead needed to talk to the NIC can help, too; apparently some of Intel’s 865 and 875 motherboards use a version of their GigE chip that is connected directly to the north bridge, bypassing the PCI bus entirely, and some benchmarks show substantial improvements.
Reading the thread suggests that most of the effort going into Linux network optimization in the next few years will happen on the receive side. Over the past several years, most higher-end NICs have added limited support for checksum generation and TCP segmentation offload (TSO), where the CPU can hand the NIC a block of data and a TCP header template, and the NIC then produces a stream of TCP packets without the CPU touching the data at all. Relatively little has happened on the receive side, but this seems to be changing. For example, Neterion’s newest card can separate headers from data, and is nearly able to reassemble TCP streams on its own, sort of the inverse of transmit-time TSO. It’s not clear how many streams the card can handle at a time, though: even my little web server at home is currently maintaining 384 simultaneous TCP connections, and a busy system could easily have tens or hundreds of thousands of open streams. Odds are, throwing 100,000 streams at the card would run it out of RAM and completely negate any benefit that receive offloading would have, unless it’s smart enough to handle the 1,000 or so fastest streams and let the main CPU handle the 99,000 that are dribbling data at 28k modem speeds.
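Here’s a simplified Python sketch of what TSO does, plus the receive-side inverse. A segment is reduced to a flow key, a sequence number, and a payload; real header fields (IP ID, checksums, flags, and so on) are left out, and the Segment type, function names, and flow key are hypothetical, not any NIC’s or driver’s interface.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple     # hypothetical flow key, e.g. (src, sport, dst, dport)
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def tso_segment(flow, start_seq, data, mss):
    """Transmit side: split one large block into MSS-sized segments,
    copying the 'template' (here just the flow key) and advancing seq."""
    return [Segment(flow, start_seq + off, data[off:off + mss])
            for off in range(0, len(data), mss)]


def coalesce(segments):
    """Receive side: merge in-order, contiguous segments of the same flow,
    so the stack sees fewer, bigger packets."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last and last.flow == seg.flow and last.seq + len(last.payload) == seg.seq:
            merged[-1] = Segment(last.flow, last.seq, last.payload + seg.payload)
        else:
            merged.append(seg)
    return merged


flow = ("10.0.0.1", 80, "10.0.0.2", 40000)
segs = tso_segment(flow, 1000, b"x" * 4000, 1460)
print([(s.seq, len(s.payload)) for s in segs])
print([(s.seq, len(s.payload)) for s in coalesce(segs)])
```

This version only merges a segment into the one immediately before it, so interleaved flows defeat it; hardware that coalesces many flows at once needs per-flow state, which is exactly where the memory limits discussed above come in.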
This is a fascinating topic, and I can’t wait to see how this will turn out.
Posted in Linux, Computer Networking | Tags intel, linux, networking, sysadmin, toe | no comments
Posted by Scott Laird
Mon, 05 Apr 2004 23:18:35 GMT
topix.net has a very interesting article on Google, claiming that the single biggest thing Google has going for them right now is their ability to manage a 100,000-node distributed computing system, and then use the system to deploy new services that are nearly unthinkable using traditional mechanisms. In essence, they have the RAM and CPU power to keep the entire net cached in RAM and perform queries against it, even though individual nodes are dying and being rebuilt all the time.
Sounds like a fun job.
Posted in Computer System Administration | Tags google, sysadmin | no comments
Posted by Scott Laird
Wed, 10 Sep 2003 18:25:58 GMT
I try to be pragmatic, but sometimes I just can’t help it and try to pick up lost causes. I think the world would be a better place if computers were easier to maintain. Fortunately, I’m a server person, and the server side of things is actually a lot easier than the desktop side, at least for now.
Warning: most of my experience is with Linux and Solaris boxes in ISP-like settings, although I’ve done a fair bit of time in small non-computer-related businesses and software houses. I have no idea how much of this applies to Windows.
I’ve been thinking about server management for years. Sometimes, I’ve been paid for it, sometimes (like now), I’m paid for other things. I still can’t stop thinking about it, though. There has to be a better way to manage servers than what we’re doing now. As I mentioned yesterday, I think I might have a solution for at least a few common cases.
Traditionally, there are two models for server deployment: the heavyweight model (deploy a small number of servers and run lots of services on each) and the lightweight model (deploy a lot of servers, and run a small number of services on each). One of the problems is that, at least for small services, the heavyweight model seems cheaper. Why buy 10 servers that are going to sit 95% idle when you could buy 2 servers and have them be 75% idle? Or even one server that’ll only be 50% full?
What happens pretty much every time is that a couple of the services start conflicting with each other somehow: one needs Perl 5.6 for something, while another needs 5.8. Or they need two different versions of the JVM. Or one needs a critical security upgrade that ends up breaking another service. So you keep tweaking things, and you (barely) keep everything running, largely by avoiding changes. Except when you avoid making small changes, you inevitably miss little security fixes and bug fixes, and you drift further and further from the mainline of whatever OS you’re running. Eventually, you reach the “server event horizon,” where things have grown so complex and unmanageable that the only thing you can do is buy 2-3 new computers to replace your one big system, and then slowly migrate services off of the old box onto the new ones. Except there are implicit assumptions lurking everywhere (that DNS and DHCP are on the same server, or that Apache and MySQL are on the same box), and it takes forever to untangle them. Even once that’s done, you’ll find out that people have hard-coded server names into applications deployed all over the company, and you’ll end up spending 3 months untangling the one heavyweight server that seemed like such a good way to save money at the time.
Conventional wisdom says that the way out of this problem is virtualization. Instead of buying 10 small computers, you buy one or two really big computers, partition them in software, and then install the software that would have gone onto the little computers onto the partitions of the big ones. Lots of vendors love this model; IBM’s whole Linux-on-mainframes push is based on it. As I see it, though, there are a couple of problems with it. First, you’ll end up paying a ton of money for virtualization hardware or software: VMware wants at least $2,500 per server for their PC-based virtualization code, and pretty much everything else is more expensive. Second, you’re still left with a bunch of small general-purpose servers that you need to manage individually, even if they happen to physically reside within a single box. There are also reliability issues, but I’m going to ignore them for now; in my experience, even cheap PCs running Linux rarely crash, and when they do it’s usually a bad power supply, a bad hard drive, or bad RAM. Spending more money on hardware gets you multiple power supplies, better RAID, and more redundant memory. It also gets you buggy virtualization software, but we’ll come back to that, too.
Fortunately, the open-source world is making progress. User-mode Linux (UML) is making a lot of headway. It’s included in Linux 2.6, although it still needs a few small patches for optimum operation. It seems to have a 30% speed hit in a lot of cases; sometimes that’s a problem, sometimes it isn’t. Using it, you can build a big Linux host server, and then run a bunch of little virtualized servers on it for free. Sounds nice? Sort of: you still have to administer a ton of little general-purpose boxes, but at least you’ve mostly solved the dependency problem that killed us a few paragraphs ago.
The nice thing about Linux is that it’s so flexible. Unlike every other OS that I’m aware of, there’s no one environment that is definitively Linux. Instead, we have a herd of Linux distributions, ranging from Red Hat to Debian to Gentoo to “Linux From Scratch” all the way down to the mini Linux distributions that wireless access point vendors call “firmware.” There’s no real reason for a special-purpose DNS server to run a full Linux distribution, except that it’s usually less work that way. However, once we have a UML-based virtualization scheme in place, it can actually be easier to use specialized distributions than general-purpose ones. The hard parts of a distribution are generally the installer, the hardware handling, and the update code. With a virtualized server, none of that applies. There is no hardware, really (it all pretends to be the same), the installer is really just a script that copies a disk image into place on the host system, and the update system is even easier: just save the data and completely discard the old OS image. In an ideal world, the OS image would be completely read-only, with configuration settings and data kept outside of the server in a standardized format. Then software upgrades are truly trivial: kill off the old server VM and start up a new VM using the old data.
This won’t work for everything, of course. It’d be horrible for big database servers, or frankly big servers of any type. In my experience, though, there are a lot more small servers than there are big servers.
The other nice thing about this scheme is that the virtual server images are simple to build and easy to trade. They’re not utterly trivial, but it’s easier to build a server image than to actually write the server software or maintain a full-sized OS distribution. Given a standardized interface between the host OS and the server image (things like IP address, DNS server, hostname, logging, and all of the other little details needed to make a server run), there’s no real reason that you can’t swap between server images from different “vendors”, grabbing whatever best serves your needs.
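A minimal Python sketch of the data model described above, with every name hypothetical: the server image is immutable, the host supplies the standardized settings (IP address, DNS server, hostname, logging), and configuration plus data live outside the image. An upgrade, or a switch to another vendor’s image, then just pairs a new image with the old host settings and data.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HostInterface:
    """Settings the host hands to every image (hypothetical standard interface)."""
    ip_address: str
    dns_server: str
    hostname: str
    log_target: str


@dataclass(frozen=True)
class ServerImage:
    """Read-only OS + service image; could come from any 'vendor'."""
    service: str
    vendor: str
    version: str


@dataclass(frozen=True)
class VirtualServer:
    image: ServerImage
    host: HostInterface
    config_dir: str  # kept outside the image
    data_dir: str    # kept outside the image


def upgrade(vm, new_image):
    """Discard the old image and boot a new one against the same config and data."""
    if new_image.service != vm.image.service:
        raise ValueError(f"{new_image.service} image cannot replace {vm.image.service}")
    return replace(vm, image=new_image)


dns = VirtualServer(
    image=ServerImage("dns", "vendor-a", "1.0"),
    host=HostInterface("192.0.2.53", "192.0.2.1", "ns1", "syslog://192.0.2.10"),
    config_dir="/srv/ns1/config",
    data_dir="/srv/ns1/data",
)
dns2 = upgrade(dns, ServerImage("dns", "vendor-b", "2.3"))
print(dns2)
```

Because nothing that matters lives inside the image, the old VM can simply be thrown away; rolling back is the same operation with the previous image.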
I’m starting to build a framework for this, more details as I have time to write them down.
Posted in Linux, LWVS | Tags management, server, servermanagement, sysadmin | 1 comment
Posted by Scott Laird
Wed, 10 Sep 2003 01:14:42 GMT
Once a sysadmin, always a sysadmin. I’m not really sure why (although I suspect the folks on alt.sysadmin.recovery would call it a character defect), but even though I’m mostly a programmer this year, visions of well-managed servers are still dancing through my head. Maybe they’ll stop someday.
Anyway, I’ve been stewing over some interesting ideas on server management recently, and I think we’ve been doing it all wrong. We’ve been trying to build strong, resilient, flexible servers that we can maintain for years, adding and removing services as needed. This is an outgrowth of the way programmers are trained to think, with maintainability prized as one of the highest virtues of software. The XP people have a slightly different take on things, but underneath it all they still prize maintainable systems. On c2.com’s wiki, the discussion of Christopher Alexander’s A Pattern Language is informative, as it shows parallels between software design and building design, concentrating on features in buildings that give them a long, useful life.
I’m not sure that we really want any of that for servers, though, or at least not for standard, cookie-cutter services like DNS. We should be aiming for disposable servers, where the only things that we care about are the configuration state, data, performance, and security. The actual system files, and even the OS, should be irrelevant, and ignored if possible.
I’m working on a demonstration, along with a design for a larger-scale system that has some bizarrely nice properties. Frankly, I don’t see why it wouldn’t work, and it’ll make a huge difference in the amount of work needed to implement small services in networks.
Posted in Linux, Computer System Administration, LWVS | Tags sysadmin | no comments