2007-03-07

Spam Reduction (and Postscript!)

www.ps2pdf.com Postscript to Portable Document format converter

Awesome. You're looking for some academic article and it's only available in Postscript format. Just visit this site and convert the file to an easily readable PDF.

In my case, I had a genius idea of computing an somewhat expensive operation (say, something that takes about a second to compute) for every piece of mail that is sent; the sender computes it, based on some hash of the sender, receiver, and date/time, and then sticks the result in an X-Header for the email message. The receiver does the same thing, and if the results match, then the email is worth looking at.

The idea is that it is so cheap to send a million emails that if we changed the system so that it took one second to process each email message, then a bulk emailing would take one million computer-seconds to send. So even if you put 1,000 computers to work on it, it would take 1,000 seconds (or 16 and 1/2 minutes) to send those million messages. Of course, it's easy to scale something like this; just adding a few bits to finding all the prime factors of the hash can have a pretty dramatic effect on how long it takes to compute the function. If we make it five seconds, or 10 seconds, then it would take 80 minutes, or 160 minutes to send all that junk email, even using 1,000 zombie computers to do the job.

The receiver doesn't care about the cost - even a busy human like me receives maybe a hundred real emails a day (on a bad day) and the extra time taken to compute the validation function is a very small price to pay for elimination or at least vastly reducing spam.

Real commercial senders, of course, would have to pay the "processing price" for their bulk mailings, which one would hope, would cause them to consider carefully the contents of their bulk e-mailings.

Well, just as I was congratulating myself on how great this idea was (I had even found a freely downloadable prime factoring program and was looking at how to integrate into the open source project Spay Bayes, which has all of the hooks for working within Outlook), I ran across an article in Communications of the ACM that referred to an article from 1992 which described basically the same system.

The article is Pricing Via Processing or Combating Junk Email by Cynthia Dwork and Moni Naor. The link is to a draft version which is posted for some reason at Microsoft Research. The only official versions of Combatting Junk I could find were in Postscript format. I downloaded GhostScript but it's a pain (command line driven) to use. The Unix versions claim to have a script, PS2PDF, but the version I grabbed from SourceForge.net was missing it (at least the Win32 version was missing it). A web search brought me to PS2PDF.com which solved my problem.

I still might undertake this project, even though it is not an original idea, because of the excellent framework provided by Spam Bayes for hooking into Outlook. (My approach is simpler and better - there is no need for a central authority.) A simple to use and install program would make a huge difference in how something like this could be adopted. Also, once the framework is in place, it would be easy to experiment with different costing functions.

Cool, huh?

(After reviewing more work by Naor I have to say I like my idea better. Naor's idea is based on making it difficult to sign a message but easy to verify the signature. She has another mechanism that depends on memory access. My scheme is simpler - I think it should be equally hard to sign and to verify the message. The idea is that receivers have tons of computer time whereas senders will be limited in direct proportion to the amount of spam they want to send.)

No comments:

Post a Comment