statistical-spam filtering

Darrel Lawrence darrel at coma.ucsd.edu
Fri Aug 16 11:42:43 PDT 2002


This is shamelessly poached from today's /., but I found it pretty
interesting.

http://www.paulgraham.com/spam.html

It describes algorithms for classifying mail as spam based
on a statistical analysis of words in the header and the body.
Down near the bottom, he suggests an add-on: when a message
contains a URL, a crawler fetches the linked page and analyses it
in the same way, deciding if it is a legit link or a spammy
page.  Very elegant.  It got me thinking, though, of an unintended
side-effect.  If this solution were adopted by very large numbers
of people (say, as a plugin for OE), then every time someone started
spamming, whatever website they were plugging would get DoS'd by
the analytical crawlers.  Couldn't it grow to the point where
spamming would immediately generate so many automated hits that it
would serve as its own deterrent?  Talk about being hoist by your own
petard :).
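
For anyone who wants to see the word-statistics idea in miniature,
here is a rough Python sketch.  It is only loosely modelled on the
essay linked above and is not the author's code: the tokenizer, the
clamping to [0.01, 0.99], the 0.4 score for unseen words, the
"15 most telling words" cutoff, and the function names are all
assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, digits, $, ' and - count as words.
TOKEN = re.compile(r"[A-Za-z0-9$'-]+")

def tokens(text):
    """Return the set of distinct lowercase tokens in a message."""
    return set(TOKEN.findall(text.lower()))

def train(spam_msgs, ham_msgs):
    """Estimate, per word, the probability that a message containing it is spam."""
    spam_counts = Counter(t for m in spam_msgs for t in tokens(m))
    ham_counts = Counter(t for m in ham_msgs for t in tokens(m))
    n_spam, n_ham = max(len(spam_msgs), 1), max(len(ham_msgs), 1)
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        b = spam_counts[word] / n_spam   # fraction of spam containing word
        g = ham_counts[word] / n_ham     # fraction of good mail containing word
        # Clamp so that no single word is ever treated as absolute proof.
        probs[word] = min(0.99, max(0.01, b / (b + g)))
    return probs

def spam_score(text, probs, unknown=0.4, top_n=15):
    """Combine the most telling word probabilities into one score in [0, 1]."""
    ps = [probs.get(t, unknown) for t in tokens(text)]
    # Keep the words whose probabilities lie farthest from the neutral 0.5.
    ps = sorted(ps, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam_like = ham_like = 1.0
    for p in ps:
        spam_like *= p
        ham_like *= 1.0 - p
    return spam_like / (spam_like + ham_like)

if __name__ == "__main__":
    probs = train(
        ["buy cheap pills now", "cheap pills click here"],
        ["meeting notes attached", "lunch at noon tomorrow"],
    )
    print(spam_score("cheap pills", probs))
    print(spam_score("meeting at noon", probs))
```

The crawler add-on would presumably run the same scoring over the
text of the fetched page instead of the message body.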

-Darrel

-- 
Diplomacy: The ability to tell someone to go to hell in such a way he looks 
forward to the trip.



More information about the KPLUG-List mailing list