My favorite hack

Something reminded me recently of project that I worked on around 10 years ago. I was a simple, clean solution to a problem that had remained unsolved for over a year. It was also one of the most horrible hacks that I’ve ever put into production.

At the time, I was working for a startup that was suffering from explosive growth. We’d began rolling out new sales and engineering offices all over the US faster than we could build the infrastructure that we needed to manage the offices. We built each office with two small Linux servers; one acted as a file server (with Samba) and the other did email for the office. The problem that we were facing is that each office was completely separate from the others–we had technical and political issues that made rolling out a VPN very difficult, so there was no easy way for salespeople to share documents between offices other than email. Which they did with great abandon, and it was killing our servers in larger offices.

The Great And Glorious VPN Project had tried to fix the company-wide filesharing problem repeatedly without success. There were too many requirements, too little money, and the technology wasn’t mature enough. I assume that they managed to deploy something eventually, but not for years after I left the place.

So, we needed a solution for distributed file sharing. It needed to be relatively quick, compatible with Linux and Windows computers, encrypted when passing over the Internet at large, able to deal with 100+ ms latencies between servers, and ideally not require a full mesh, because we were adding 1-2 new offices per month and having to reconfigure all of them every time would have killed us.

We played around with distributed filesystems like Coda and AFS. None of them fit our needs. We considered some sort of read-only replication with rsync copying each office’s content to other locations, but it was too messy to explain to people. We considered deploying Novell Netware (into a Linux shop) just for this, but it wouldn’t have helped enough. We would have rolled out Windows servers, but Microsoft didn’t have anything useful at the time. We looked at other commercial solutions, but nothing that we found was more than a partial solution.

Until one day, when I was sitting in a meeting about some other topic and the solution just popped into my head fully formed. It was really, really evil, and it would all Just Work. Either that, or it wouldn’t work at all, and it’d only take a few minutes to determine the answer. I think I excused myself (“I think I just solved the file sharing problem”) and walked out in the middle of the meeting to test it.

Remember, each office had 2 servers: one acting as a file server, running Samba, the other a mail server, running sendmail and an IMAP server. Here’s all it took: I configured Samba on each mail server, and had it join the same workgroup (I don’t think we were using domains yet) as the existing file server. So each Windows computer now saw two Samba file servers instead of one. The Samba file server software actually consists of two different programs: nmbd, which handles naming and network browsing, and smbd, which actually serves files. I fired up nmbd on each mail server, but instead of running smbd, I ran stunnel and had it encrypt any traffic that was headed to smbd and ship it off to a single file server in our headquarters, where it was decrypted and handed off to a real smbd process.

So, any Windows computer would browse the network and see two servers. One named something like “nfs.nyc” and the other named something like “global-nfs”. When they tried to connect to “global-nfs,” their packets were shipped off to a completely different computer in a different city. And everything just worked. It was dead simple. We’d been looking and spending hundreds of thousands of dollars to solve this problem and having to install new hardware in every office; in the end we spent a couple thousand on a single new file server and rolled it all out overnight with a minor software change.

The ugly part is that Samba shares state between nmbd and smbd, and none of that was actually being shared here. You aren’t supposed to run the two processes separately. It gets confused. Except it doesn’t actually hurt anything in this case. None of the problems actually impact anything that we cared about.

While I was at it, I used the same mechanism to standardize printing across the company, but that’s a story for another day.