Time to retrain SpamProbe

Posted by Scott Laird Wed, 20 Jul 2005 01:36:07 GMT

I’ve been using SpamProbe for almost two years, and it’s done a great job of filtering my spam. Unfortunately, it’s become a resource pig in the process. My spam database has grown to over 500 MB, and iostat -x suggested that SpamProbe was keeping my disk busy almost 80% of the time for minutes at a stretch. It wasn’t uncommon for messages to sit in the queue for up to 10 minutes, delayed by spam checking.

I finally decided that this was too much, so I’m re-training SpamProbe using its new hash database format. Instead of saving the text from each Bayes entry, it simply saves a 32-bit hash of that text. It costs a little bit of accuracy, but it’s supposed to be a huge speed win. Unfortunately, this will require over an hour of CPU and disk time to reprocess thousands of messages, but it should be worth it.
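To make the trade-off concrete, here is a minimal sketch of the idea of keying counts by a 32-bit hash rather than by the term text. This is not SpamProbe's actual code or on-disk format: CRC-32 is only a stand-in hash, and the names token_key and train are made up for illustration.

```python
import zlib


def token_key(token: str) -> int:
    """Map a token to a 32-bit unsigned key (CRC-32 is a stand-in hash here)."""
    return zlib.crc32(token.encode("utf-8")) & 0xFFFFFFFF


def train(counts: dict, tokens, is_spam: bool) -> None:
    """Update (good, spam) counts, stored under the hash instead of the text."""
    for tok in tokens:
        key = token_key(tok)
        good, spam = counts.get(key, (0, 0))
        counts[key] = (good, spam + 1) if is_spam else (good + 1, spam)


counts = {}
train(counts, ["cheap", "pills", "now"], is_spam=True)
train(counts, ["meeting", "now"], is_spam=False)

# Each entry is now a fixed-size integer key rather than a variable-length
# string. Two different terms that happen to share a hash would also share
# one pair of counts, which is where the small accuracy cost comes from.
print(counts[token_key("now")])
```

The original term can't be recovered from its key, so a database like this can be queried and updated but not browsed by word.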


CommuniGate Pro does VoIP

Posted by Scott Laird Sat, 07 Aug 2004 01:38:05 GMT

I’m planning on doing more research on this in a while, but I should mention it now: the latest release of Stalker Software’s CommuniGate Pro email software includes basic VoIP support.

CommuniGate Pro (CGP) is kind of fascinating to me. At its heart, it’s just commercial email software. It does SMTP, POP, IMAP, LDAP, and web mail, all of which you can do for free with open-source software. However, if you’re a small business or ISP, and email means anything to you, and you aren’t tied to Exchange, you owe it to yourself to take a serious look at CGP. It’s fast, it’s reliable, it’s completely standards-based, it’s trivial to configure, and it’s cheap. It starts at $500 for 50 users, and the per-user price drops off quickly. For $2,000, you can get a 1,000-user license. Now ask yourself, how long would it take to set up a 1,000-user POP/Webmail/SMTP mail server? How much support time will it take?

I’m starting to sound like an ad. I’ll try to stop.

They also do clustered mail servers, but for those, their otherwise reasonable prices suddenly jump well into the six-figure range. This isn’t the way to go if you’re looking for SPOF-free corporate email for cheap.

Their more recent releases have added some Exchange-like functionality–they support MAPI- and web-based calendaring with an Outlook plugin (for an additional cost), and they’ll provide spam and virus filtering for a price.

The thing that’s always fascinated me about these guys is that they seem to be a dinky, 5-10 person outfit, but they’re able to keep adding features faster than anyone else on the market, and do it without turning their software into a complete pig. At Internap, we were amazed to discover that their basic server with SMTP, HTTP, POP, IMAP, LDAP, SSL for everything, decent logging, a web UI for configuration and for email, and a mailing list manager all fit into under 2 MB of RAM. Once it got running, with hundreds of busy users, it grew to need 15 MB or so, but that was about it. I think we only managed to crash it once or twice in two years, and that was under a murderous load: I think I was averaging over 2,000 email messages/day for part of that, and I was rarely the busiest user. We had way more problems getting Linux to keep up with the server’s I/O load, but that’s a whole different issue, since we were saturating 2 external RAID arrays for almost the entire day, every day.

Anyway, the latest release (4.2) adds SIP and RADIUS to their list of supported protocols. It isn’t really intended for serious PBX-replacing VoIP, but rather for IM and voice messaging. Since Windows XP includes SIP IM software, this seems like a useful addition to CGP. It’ll do VoIP as well, but it’s based on email addresses, not phone numbers, so it’ll be hard to get SIP phones to interoperate with it (although not impossible–most of them will let you dial names, but it’s hard to enter them with a phone keypad).
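As a rough illustration of why email-style addressing is awkward for phones, here is a small sketch that turns an email address into a SIP URI of the usual sip:user@domain form and checks whether the user part could be entered on a numeric keypad. The assumption that a CGP account maps one-to-one onto a SIP URI this way is mine, and the function names and sample addresses are hypothetical.

```python
def sip_uri_from_email(address: str) -> str:
    """Build a sip:user@domain URI from an email-style address."""
    user, sep, domain = address.strip().rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not an email-style address: {address!r}")
    return f"sip:{user}@{domain.lower()}"


def keypad_dialable(uri: str) -> bool:
    """True if the user part uses only characters found on a phone keypad."""
    if uri.startswith("sip:"):
        uri = uri[len("sip:"):]
    user = uri.partition("@")[0]
    return bool(user) and all(ch in "0123456789*#" for ch in user)


name_uri = sip_uri_from_email("scott@example.com")
number_uri = sip_uri_from_email("2065550100@example.com")

print(name_uri, keypad_dialable(name_uri))
print(number_uri, keypad_dialable(number_uri))
```

A name-based URI is perfectly valid SIP. It just can't be typed directly on a 12-key pad, which is the interoperability snag described above.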

Personally, I’m going to keep my eye on them over the next year or two–they aren’t very far from turning CGP into a cheap all-in-one solution for small-business communications. All they need is a dialing plan, voicemail, support for external SIP-to-PSTN devices, and maybe faxing.

One quick disclaimer: it’s been a couple of years since I last used their software. I’m not a sysadmin at my present job, and I have nothing to do with our email environment. I’m also not willing to pay $500 for my home email server, although I was tempted back in the .com days.


Zoe

Posted by Scott Laird Thu, 06 May 2004 17:59:00 GMT

I’ve looked at Zoe once or twice in the past, but it never quite grabbed enough of my interest for me to bother installing it. If you aren’t familiar with Zoe, it’s a Java-based email search proxy thing that its authors have never really been able to explain on their website. Yesterday I was searching for more information on Near-Time Flow, and came across a blog entry by Tom Malaher titled “Google your Email”:

Who needs GMail? You’ve got your own CPU and Disk space, use it. ZOE lets you read and search your email (with Lucene), without supplying helpful related advertising. Not to mention that it also has a very cool non-linear email access metaphor. Forget Inbox/Sent Mail/…customFolders.. you just browse.

Ah, finally, someone explains the point of Zoe. It’s basically a personal email search engine. Once I got that, I grabbed a copy and tried it out. It’s trivial to install: just extract the files from the archive and double-click on Zoe.jar. Zoe runs its own web server on port 10080, and automatically fires up your favorite browser when it starts. The web interface is intuitive and reasonably attractive, and it’s easy to add new POP or IMAP accounts and have Zoe import mail from them. While it’s possible to use Zoe as a web-based mail reader, it’s not really very good at that. It doesn’t do folders at all, and I can’t figure out how to get it to do threads. That’s not really a problem, though, because it’s not supposed to be used for normal mail reading: it’s a search engine, not a mail reader.
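If the browser doesn’t pop up, or Zoe is running on another machine, a quick way to see whether anything is answering on that port is a plain TCP connect. This is a generic sketch using only Python’s standard library, not anything Zoe ships. The default host and the one-second timeout are assumptions, and it only shows that something is listening, not that it is Zoe.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 10080,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "up" if is_listening() else "not reachable"
    print(f"port 10080 on localhost is {state}")
```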

I probably have around 100,000 messages sitting in assorted IMAP mailboxes in various places, and Zoe is the first program that I’ve found that is actually usable for searching them. OS X’s Mail program isn’t very good at searching huge volumes of mail, particularly when most of it lives on IMAP servers.

The big problem with Zoe is its resource needs–it’s written in Java, and wants at least 70 MB of RAM when it’s running on my laptop, plus a few hundred MB of disk space. I just don’t have enough free RAM on my laptop to add another 70+ MB program, so I’m going to try running it on one of my Linux servers at home and see how that goes.

A couple of points about Zoe: while its UI is predictable and easy to use, its documentation is nearly nonexistent. As with Asterisk, you’re stuck using Google to search mailing lists and third-party wikis to find details. Zoe also really needs a more detailed configuration interface. As it is, a lot of less-common features need to be controlled by editing Java property files.

It’s an easy install, though, and it’s very usable right out of the box, so I’d recommend installing it and checking it out.
