Posted by Scott Laird
Fri, 21 Jul 2006 17:03:12 GMT
I just committed the last big chunk of code for Typo 4.0.0: improved spam filtering. We now know about Akismet, and will use it for spam filtering if you enter an Akismet key on Typo’s settings page. Incoming comments (and trackbacks) that fail the spam check aren’t published unless the blog owner approves them via the handy new ‘feedback’ page.
I really want to get Typo 4.0.0 out before Monday, so please pound on this and let me know how it goes.
Tags spam, typo | 2 comments
Posted by Scott Laird
Wed, 20 Jul 2005 01:36:07 GMT
I’ve been using SpamProbe for almost two years, and it’s done a great job of filtering my spam. Unfortunately, it’s become a resource pig in the process. My spam database has grown to over 500 MB, and iostat -x suggests that SpamProbe was keeping my disk busy almost 80% of the time for minutes at a stretch. It wasn’t uncommon for messages to sit in the queue for up to 10 minutes, delayed by spam checking.
I finally decided that this is too much, so I’m re-training SpamProbe using its new hash database format. Instead of saving the text from each Bayes entry, it simply saves a 32-bit hash of the spam text. It costs a little bit of accuracy, but it’s supposed to be a huge speed win. Unfortunately this will require over an hour of CPU and disk time to reprocess thousands of messages, but it should be worth it.
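To make the trade-off concrete, here is a minimal Python sketch of a token table keyed by 32-bit hashes rather than by the token text itself. It is not SpamProbe's code or on-disk format; the `HashedTokenTable` class, the `token_hash` helper, and the choice of CRC32 as the hash are all assumptions for illustration.

```python
import zlib
from collections import defaultdict


def token_hash(token: str) -> int:
    """Map a token to a 32-bit integer (CRC32 is an assumed stand-in hash)."""
    return zlib.crc32(token.encode("utf-8")) & 0xFFFFFFFF


class HashedTokenTable:
    """Spam/ham counts keyed by 32-bit token hashes instead of token text."""

    def __init__(self):
        # hash -> [spam_count, ham_count]
        self.counts = defaultdict(lambda: [0, 0])

    def train(self, tokens, is_spam: bool) -> None:
        """Record one sighting of each token as spam or ham."""
        for tok in tokens:
            self.counts[token_hash(tok)][0 if is_spam else 1] += 1

    def spam_ratio(self, token: str) -> float:
        """Fraction of sightings that were spam; 0.5 for unseen tokens."""
        spam, ham = self.counts.get(token_hash(token), (0, 0))
        total = spam + ham
        return spam / total if total else 0.5
```

Each entry is a fixed-size integer key, so the table stays small no matter how long the tokens are. The accuracy cost comes from collisions: two different tokens that hash to the same value share one set of counts.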
Posted in Spam | Tags email, spam, spamprobe | no comments
Posted by Scott Laird
Mon, 14 Mar 2005 21:49:50 GMT
A few minutes ago, someone dropped a comment onto a recent post:
Sony/Apple Merger. Hollywood is buzzing about it today.
www.29hdnetwork.com
Anyone hear anything about it?
Since the comment was actually more or less on-topic, I didn’t immediately delete it as spam. I posted my usual reply to Apple mega-merger rumors–Not Likely. Then I went and read the article at the site listed, and didn’t see anything particularly interesting. It’s just idle speculation.
A few minutes after that I was reading a similar post at MacSlash and noticed a very similar comment:
Apple/Sony Merger
Hollywood is buzzing about a pending merger with Sony and Apple today.
http://www.29hdnetwork.com
So, is this just semi-targeted blog spam, or is it some sort of weird astroturf campaign for 29hdnetwork.com?
Posted in Blog stuff | Tags 29hdnetwork, apple, rumors, sony, spam | no comments
Posted by Scott Laird
Sat, 12 Feb 2005 15:58:27 GMT
I received my first three pieces of trackback spam overnight. My blog receives tons of comment spam, most of which is blocked by MT-Blacklist, but I haven’t seen any trackback spam before, even though I’ve been expecting it for over a year. Since trackback is designed to be automated, it seems like it’d be easier to abuse than comments. Fortunately, a few relatively simple steps should help stop trackback spam. A quick verification step would probably stop most of it–if you hand me a trackback URL, and I go fetch it, is there a link to my page anywhere on the trackback page?
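A rough Python sketch of that verification step, using only the standard library. The function names (`page_links_to`, `verify_trackback`), the 10-second timeout, and the 200 KB read limit are assumptions for illustration, not part of Movable Type or the TrackBack spec.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def page_links_to(html: str, my_url: str) -> bool:
    """True if any link in the page points at my_url or something under it."""
    parser = LinkCollector()
    parser.feed(html)
    return any(href.startswith(my_url) for href in parser.hrefs)


def verify_trackback(trackback_url: str, my_url: str, timeout: float = 10.0) -> bool:
    """Fetch the page that claims to link here and check that it really does."""
    try:
        with urlopen(trackback_url, timeout=timeout) as resp:
            # Cap the read so a huge page can't tie up the server.
            html = resp.read(200_000).decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # Unreachable or malformed URL: treat as unverified.
        return False
    return page_links_to(html, my_url)
```

A trackback that fails the check could be held for review rather than dropped, since a legitimate page might link through a redirect or a slightly different URL.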
One quick-and-easy suggestion for stopping comment spam–add </a> to your blacklist. MT 2.6 allows HTML in comments, even though it’s not supposed to, and comment spammers always try to stick links in there. Since my comment page isn’t supposed to allow HTML anyway, this does a great job of blocking spam.
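The check itself amounts to a case-insensitive substring match. Here is a minimal Python sketch of the idea; the `BLACKLIST` list and the `is_blacklisted` helper are hypothetical names, and this is not how MT-Blacklist is implemented.

```python
# Assumed blacklist with the single entry suggested above.
BLACKLIST = ["</a>"]


def is_blacklisted(comment_text: str, blacklist=BLACKLIST) -> bool:
    """True if the comment contains any blacklisted string, ignoring case."""
    lowered = comment_text.lower()
    return any(term.lower() in lowered for term in blacklist)
```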
Posted in Blog stuff | Tags blog, movabletype, spam, trackback | no comments
Posted by Scott Laird
Fri, 17 Dec 2004 18:52:52 GMT
I am swimming in spam. Everywhere I go, every direction I look, every medium I deal with, I am being spammed. Spam in my email box I can handle–my spam filter manages that well enough that I can ignore the problem. It’s all of the other spam that is driving me insane.
Let’s start with blog spam. I run Movable Type, and I have a PageRank of 6 or so, so like everyone else with a good PageRank, I’m being bombarded with blog comment spam. It’s not uncommon to wake up in the morning and find that 100 ads for Viagra or online poker or something less savory have managed to make it through my filters and pollute my blog. From looking at my logs, I’ve had over 7,000 comments posted on this blog, with only 150 or so being legitimate, and around 6,000 blocked by MT-Blacklist.
Then there’s phone spam–the Do Not Call list has actually worked pretty well for my home phone number, but I’ve been besieged by calls from (905)-482-1663 for the past couple weeks. I assume that they’re a telemarketer, but I’ve never been able to figure out what they want–even when I’ve picked the phone up on the first ring, they just hang up on me. Google suggests that that number has done work for Bank of America and the Kerry campaign and pissed off a number of other people; it’s not just me. After a week of this, I had Asterisk blacklist them, so I don’t have to listen to them hang up on me 2 or 3 times per day. Yesterday, they escalated–they called my cell phone 3 times last night. I sent a Do Not Call list complaint today, but I doubt it’ll take. I’d probably be better off using one of the other laws on the books regarding telemarketing calls to cell phones or percentages of hangups, but it’s probably not worth the hassle.
My work phone isn’t immune, either–I’ve been getting 2 or 3 calls per week from random business magazines, wanting to give me free subscriptions or renewals. Frankly, I receive so many magazines that I can’t keep track of which ones I’m already getting–95% of them go straight into the recycling bin without ever being opened. I really don’t want more–my mailbox is too full as it is. Last week, I got two calls from Information Week and had to hang up on them–they wouldn’t take “no” for an answer. The week before, it was a call, a fax, and two emails from Network World. This morning, it was eWeek.
Thinking about all of these–the blog spam, the telemarketer spam, and the magazine renewal spam–the common thread is that none of them are actually trying to sell me anything. The blog spam is trying to increase their own PageRank. The magazine spam is trying to increase their circulation size and advertising rates. The telemarketer might be trying to sell me something, but since they refuse to actually talk to me, I can’t really tell. Largely, they’re all bothering me because they can sell something that I have (eyeballs, highly ranked blog) to others, and they don’t care that they’re wasting my time and money in the process.
Posted in Blog stuff, Personal | Tags asterisk, blog, google, spam, telemarketer | 165 comments
Posted by Scott Laird
Tue, 30 Nov 2004 23:50:16 GMT
I don’t know if it’s just me or if everyone is seeing this, but the amount of blog comment spam that I receive has exploded lately.

So far this month, MT-Blacklist has blocked over 2,400 comment spam attempts. That doesn’t count the number that it missed–that has to be at least 200 more, including *12* so far while I’ve been writing this message. The latest couple batches don’t even seem to be obvious spam–they include a fake email address and some text, but no web pages, either in the URL field of the comment or in the body, and the text is generic. If I wasn’t receiving a few dozen per hour from different IP addresses with the same basic text, I’d assume it was just a deranged poster or two. As it is, I can only assume that it’s an attempt to pollute a Bayes table with bogus text, except MT-Blacklist doesn’t use Bayes–it’s just keyword matching.
At the present rate, I think I’m actually seeing more comment spam attempts than legitimate page views some days. I think I’m getting more blog spam than email spam, too, but it’s a close race.
I swear, I need to move off of MovableType 2 one of these days, but the last time I tried, I just couldn’t find anything that I was willing to spend the effort on. Drupal is nifty, but it’s not really what I’m looking for. MT 3.1 would probably work, but it’s not exactly what I want, either. I keep waiting for one of the Rails-based blog systems to become usable, but I don’t think we’re quite there yet.
Posted in Blog stuff | Tags blog, movabletype, spam | 1 comment
Posted by Scott Laird
Tue, 29 Jun 2004 20:03:27 GMT
Today’s entry in the continuing saga of bad technology reporting comes from CNET:
Cable giant Comcast on Thursday said the volume of spam originating from its network has dropped 35 percent since it blocked an e-mail loophole weeks ago.
The new data comes after Comcast, the nation’s largest broadband service, earlier this month began blocking a gateway that spammers commonly use to send mass volumes of unsolicited e-mail. Called “port 25,” the gateway lets PCs send and receive e-mail based on SMTP (Simple Mail Transfer Protocol), the most common technology for exchanging messages.
So port 25 is a gateway now, is it?
Posted in Computer Networking | Tags comcast, spam | 1 comment
Posted by Scott Laird
Thu, 24 Jun 2004 19:34:01 GMT
I’ve received a handful of email messages recently that aren’t exactly normal spam. This includes messages with no body and very few headers, and messages composed of slightly random text with no real attempt to sell anything. Here’s an example:
Hello, handsome!
No one ever lost his honor, except he who had it not.
Difficult times always create opportunities for you to experience more love in your life.A man who exposes himself when he is intoxicated, has not the art of getting drunk.
necessarian unseldom pronationalistic malthas scarfing
I tend to play mostly villains and twisted people. Unsavory guys. I think it’s my face, the way I look.
Ain’t no man can avoid being born average, but there ain’t no man got to be common.
I can only assume that both of these types of messages are an attempt to screw up Bayesian filtering tables by sneaking borderline words into your pool of non-spam (“ham”) email messages. The idea is that spam filters won’t find anything objectionable in the message, so it won’t mark it as spam, and users will just delete the message without using it to train their filters. I’m not convinced that it’ll work, but it’s a nice try.
Posted in Spam | Tags bayes, spam | no comments
Posted by Scott Laird
Tue, 09 Mar 2004 22:50:39 GMT
My spam is missing.
I used to receive over 100 spams per day, but that was mostly due to forwarding from a former employer. Once the forwarding stopped, I was still receiving around 20 spams per day. Recently, though, they’ve all but stopped. I only received 5 spam messages yesterday.
It’s not a filter issue–spam only rarely makes it through the filter gauntlet into my inbox. For some reason, spammers aren’t sending me as much spam as usual. My usual spam load isn’t very diverse; it’s possible that only one or two spammers make up the bulk of my spam. Maybe they’re on vacation. Maybe they removed me from their lists for some reason (ha, right). Maybe the CAN-SPAM law scared them straight (ha, right). Maybe whoever’s been paying them for spam stopped (ha).
Or maybe I’ll get 50 tomorrow, just to even out the average.
Posted in Spam | Tags spam | no comments
Posted by Scott Laird
Fri, 30 Jan 2004 02:41:25 GMT
They’re back again, 100+ blog spams for some casino. Rather than delete them automatically, I added the mt-blacklist plugin. It includes the ability to bulk-delete comments based on IP address.
Interestingly enough, last week’s anti-blog-spam measures didn’t really help–the spammer followed the comment form right to the new, renamed comment CGI. So, it looks like we’re headed for a real spam arms race. Bastards.
Posted in Blog stuff | Tags blog, spam | no comments
Posted by Scott Laird
Sat, 24 Jan 2004 03:22:58 GMT
Overnight, I was hit with 108 comment spams for Xenical from 66.36.249.149. Very irritating, especially since MT doesn’t have a good way to delete bulk spam. This spammer was kind of interesting–it looks like he was actually following the HTML from my archive pages, rather than blindly attacking /mt/mt-comments.cgi. That means that simply renaming the comment CGI probably wouldn’t have stopped this attack.
Here’s a chunk of the access log for those who are interested:
66.36.249.149 - - [23/Jan/2004:03:42:54 -0800] "GET /scott/archives/000001.html HTTP/1.0" 200 5368 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:02 -0800] "POST /mt/mt-comments.cgi HTTP/1.0" 200 59 "-" "http://@nonymouse.com/ (Unix)" 3 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:05 -0800] "GET /scott/archives/000002.html HTTP/1.0" 404 220 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:13 -0800] "GET /scott/archives/000003.html HTTP/1.0" 200 8678 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
66.36.249.149 - - [23/Jan/2004:03:43:17 -0800] "POST /mt/mt-comments.cgi HTTP/1.0" 200 59 "-" "http://@nonymouse.com/ (Unix)" 0 scottstuff.net
Google suggests that ‘@nonymouse.com’ is an anonymizer, so the spammer was actually abusing two services, not just mine. Which also means that the IP address given isn’t very useful.
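For anyone who wants to pull the same pattern out of their own logs, here is a small Python sketch that counts POSTs to the comment CGI. The regular expression is written against the log lines shown above and may need adjusting for other log formats; `LOG_RE` and `comment_posts_by_ip` are names made up for this example.

```python
import re
from collections import Counter

# Matches the leading fields of the access log lines shown above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def comment_posts_by_ip(lines, cgi_path="/mt/mt-comments.cgi", key="ip"):
    """Count POSTs to the comment CGI, grouped by IP (or another named field)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m["method"] == "POST" and m["path"] == cgi_path:
            counts[m[key]] += 1
    return counts
```

Since the requests came through an anonymizer, grouping by user agent (`key="agent"`) may be more telling than grouping by IP address.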
I’m not sure how best to handle this sort of thing in the future–I’ll try renaming mt-comments.cgi to something less obvious, and probably javascript-ify the comment link on my pages. That’s rude to the poor users without javascript enabled in their browser, but I don’t want to spend hours deleting spam again.
Longer-term, it’d be nice if MT added moderated comments, and a way to automatically change the open/moderated/closed status of entries after a set period of time. That way, new posts could have open comments, and then be auto-moderated after a week or two. That seems like a decent compromise to me, and it’s orthogonal to most of the other anti-blog-spam suggestions that I’ve seen.
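The age-based rule is simple enough to sketch in a few lines of Python. This is a hypothetical illustration, not an MT feature: the `comment_status` function, the 14-day default, and the optional `close_after` cutoff are all assumptions.

```python
from datetime import datetime, timedelta


def comment_status(posted_at: datetime, now: datetime,
                   moderate_after: timedelta = timedelta(days=14),
                   close_after: timedelta = None) -> str:
    """Return 'open', 'moderated', or 'closed' based on the entry's age."""
    age = now - posted_at
    if close_after is not None and age >= close_after:
        return "closed"
    if age >= moderate_after:
        return "moderated"
    return "open"
```

Computing the status from the post date at request time means no scheduled job has to go back and update old entries.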
Bastards.
Comments closed: bizarrely enough, this post gets more comment spam than any other page on my blog (and nearly more than all other pages combined), so I’ve closed comments.
Posted in Blog stuff | Tags blog, spam | 4 comments
Posted by Scott Laird
Fri, 23 Jan 2004 09:54:20 GMT
It’s been a weird day, traffic-wise: MSN has decided that my simple mention of Paris Hilton referrer spam is good enough to make their search engine love me–I’m number 18 on their list of sites when searching for “Paris Hilton Video.” I’ve had at least 55 different users today arrive from MSN’s search engine.
Update: Gack, Google’s at it now, too. I just got a hit from “paris hilton jpegs.” Except this one was from someone with too much time on their hands–I’m apparently on the 44th page of listings on Google.
Update 2: I’m now up to number 9 on MSN’s site. I’ll probably break 150 paris hilton hits today. It’s not a lot of traffic, but it’s just so bizarre that they’ve decided to send it my way.
Posted in Blog stuff | Tags blog, searchengineweirdness, spam | 3 comments
Posted by Scott Laird
Thu, 15 Jan 2004 01:08:46 GMT
It seems like there’s always something interesting lurking in web access logs, but actually finding the interesting bit is a pain in the neck. For instance, over the past day or so, I discovered that I was briefly the #1 Google listing for “andy serkis seattle” and that Lockergnome included me on their list of 2004 PDA predictions. I didn’t see that coming. Cool. My MPx200 notes have generated a bit more traffic than usual, too, and they’re only a day or so old. Searches for the Sony/Ericsson CAR-100 have finally slacked off; Google was sending me piles of CAR-100 traffic for a while.
I’m seeing a bit of referrer spam, too–mostly for paris-hilton-video.blogspot.com. Either that, or they’ve linked to me somewhere that I can’t see, and that link has generated a dozen hits over the last month, all from different IP addresses in different countries.
The thing is, I spotted all of these trends manually, by running tail -f on the log files, and then grepping for interesting strings. None of the web log analyzers seems quite appropriate for blog traffic. And, interestingly enough, searching for “web log analyzer” in Google hits way too many (web logs) on (analysis). If anyone has any suggestions or recommendations, feel free to leave a comment.
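As a stopgap, a few lines of Python can tally search phrases and referring hosts from a list of referrer URLs. This is only a sketch: it assumes the referrers have already been pulled out of the log, and that the search engine puts the query in a `q` parameter. The `search_terms` and `referer_hosts` helpers are hypothetical names.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def search_terms(referers, param="q"):
    """Count search phrases found in the given query parameter of each referrer."""
    counts = Counter()
    for ref in referers:
        query = parse_qs(urlsplit(ref).query)
        for term in query.get(param, []):
            counts[term.lower()] += 1
    return counts


def referer_hosts(referers):
    """Count referring hostnames; handy for spotting referrer spam."""
    counts = Counter()
    for ref in referers:
        host = urlsplit(ref).hostname
        if host:
            counts[host] += 1
    return counts
```

Calling `most_common(10)` on either result gives a quick top-ten list, which covers most of what the tail-and-grep routine was finding by hand.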
Posted in Blog stuff | Tags blog, log, mpx200, spam, statistics | no comments
Posted by Scott Laird
Sun, 04 Jan 2004 13:31:34 GMT
Ugh, I got hit with 3 comment spams from 66.80.241.23, all for Bulgaria-based pharmacy sites. I’ll move the comment CGI around tomorrow; that’s supposed to be an easy fix for 95% of the comment spammers.
Posted in Blog stuff | Tags blog, spam | no comments