Current oav website
416
mirrors/image_spam.html
Normal file
@ -0,0 +1,416 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
<title>Fighting image spam on our Debian spamfilter with FuzzyOcr and ImageInfo</title>
<style type="text/css">
.cl2 {white-space: nowrap;}
body {font-family:Arial, Helvetica, sans-serif, "MS sans serif";
font-size: 12px;
color:black;}
a { color: #ff0000}
a:hover {text-decoration: none; color: #000000}
pre {
font-family: monospace;
font-size:14px;}
td {
font-family:Arial, Helvetica, sans-serif, "MS sans serif";
font-size:14px;}
i {
font-size:16px;
font-family : "Times New Roman", Times, serif;}
code {
font-family: monospace;
font-size:14px;}
H3
{font-family:Arial, Helvetica, sans-serif, "MS sans serif"; font-size: 16px;
MARGIN-TOP: 3px;
MARGIN-BOTTOM: 3px;}
H4
{font-family:Arial, Helvetica, sans-serif, "MS sans serif"; font-size: 16px;
MARGIN-TOP: 3px;
MARGIN-BOTTOM: 3px;}
</style>
</head>
<body lang="en-US">
<h3>Fighting image spam on our Debian spamfilter with FuzzyOcr and ImageInfo plugins</h3>
<br>
Absolutely no warranty, see the disclaimer at
<a href="http://www200.pair.com/mecham/spam/" target="_new10">
http://www200.pair.com/mecham/spam/</a>
<br><br>
Thanks to Robert LeBlanc and this
<a href="https://secure.renaissoft.com/maia/wiki/FuzzyOCR23" target="_new20">
excellent guide</a>.
<br><br>
<table cellpadding="4" border="1">
<tr>
<td>
<i>Image spam is rather difficult to deal with. FuzzyOcr has helped me do so.
Of course, running an OCR scanner on images contained in messages will slow
down the spam scanning process (considerably in some cases). For me this is not a problem, but
be aware of it. I document my installation of FuzzyOcr here.
I will assume you have at least testing and stable sources listed
in /etc/apt/sources.list, that 'stable' has top priority in
<a href="http://jaqque.sbih.org/kplug/apt-pinning.html" target="_new091">/etc/apt/preferences</a>,
and that you are able to compile programs from source (but of course there is
no reason you could not install this stuff on an Etch or Sid system).
You need <b>SpamAssassin version 3.1.1 or greater</b> to use these plugins.
Consider installing SpamAssassin from
<a href="http://www.backports.org/dokuwiki/doku.php?id=instructions" target="_new109">sarge-backports</a>
if you are trying to keep your system 'stable'. You may be able to install
SpamAssassin from 'testing' without upgrading to the testing versions
of libc6, Perl or the kernel by
using the form 'apt-get install spamassassin/testing'. Using this form should
install dependencies from 'stable'. Simulate it first with 'apt-get -s install [...]'.
To prevent accidental upgrades when running 'apt-get upgrade',
you might also consider putting the package on hold if you use this method:<br>
<code>echo "spamassassin hold" | dpkg --set-selections</code>
<br><br>
One of the programs
we need is not available in the stable release, so we will install it from testing.
First test that installing libimage-exiftool-perl will not install libc6 or related
programs:</i><br>
<code>
apt-get update<br>
apt-get -s install libimage-exiftool-perl/testing
</code>
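<br><br>
<i>A minimal sketch for reading the simulation, assuming the usual 'apt-get -s' output in which every package
that would be installed or upgraded is reported on a line beginning with "Inst". The helper name
would_install is hypothetical and not part of apt:</i><br>

```shell
# Hypothetical helper: list the package names that a simulated
# apt-get run would install or upgrade (lines starting with "Inst").
would_install() {
    awk '$1 == "Inst" { print $2 }'
}

# Intended use (not run here, needs apt):
#   apt-get -s install libimage-exiftool-perl/testing | would_install
```

<i>If the list contains anything besides libimage-exiftool-perl, treat that as the warning sign described here.</i>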
<br><br>
<i>If libimage-exiftool-perl is the only thing that will be installed, then install it.
If it is not, and you are trying to keep your system stable, then contact me before you continue:</i>
<br>
<code>
apt-get install libimage-exiftool-perl/testing
<br><br>
</code>
<i>See what version you have:</i><br>
<code>dpkg -l libimage-exiftool-perl
</code><br><br>
<i>If your version is less than 6.36-1, patch ExifTool:</i><br>
<code>
cd /usr/share/perl5/Image/ExifTool/<br>
wget http://antispam.imp.ch/patches/patch-GIF-Colortable<br>
patch -b GIF.pm < patch-GIF-Colortable
<br><br>
</code>
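<i>The "less than 6.36-1" check can be scripted. This is only a sketch: it assumes GNU sort with the -V option,
which approximates but does not exactly reproduce Debian's version ordering. The function names are hypothetical.
On a Debian system, 'dpkg --compare-versions' is the authoritative comparison:</i><br>

```shell
# Hypothetical helper: succeed if version $1 sorts strictly before $2.
# Assumes GNU sort -V; this only approximates Debian's ordering rules.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeed if the given ExifTool package version predates 6.36-1,
# i.e. if the GIF.pm patch is still needed.
needs_exiftool_patch() {
    version_lt "$1" "6.36-1"
}
```

<i>The installed version could be passed in with something like
<code>needs_exiftool_patch "$(dpkg-query -W -f='${Version}' libimage-exiftool-perl)"</code>.</i>
<br><br>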
<i>To prevent future accidental upgrades during 'apt-get upgrade', place
the libimage-exiftool-perl package on hold:</i><br>
<code>
echo "libimage-exiftool-perl hold" | dpkg --set-selections
</code>
<br><br>
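<i>To confirm the hold took effect, the output of 'dpkg --get-selections' can be checked. A small sketch,
assuming the usual two-column "package state" output. The helper name is_on_hold is hypothetical:</i><br>

```shell
# Hypothetical helper: read `dpkg --get-selections` output on stdin and
# succeed only if the named package is marked "hold".
is_on_hold() {
    awk -v pkg="$1" '$1 == pkg && $2 == "hold" { found = 1 }
                     END { exit !found }'
}

# Intended use (not run here, needs dpkg):
#   dpkg --get-selections | is_on_hold libimage-exiftool-perl && echo held
```

<br><br>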
<i>You should not have both giflib-bin and libungif installed. Simulate removing giflib-bin:</i><br>
<code>apt-get -s remove giflib-bin</code>
<br><br>
<i>If it's not installed, then you can move on. If it's the only thing that will
be removed, then remove it:</i><br>
<code>apt-get remove giflib-bin</code>
<br><br>
<i>Download, extract, patch, compile and install libungif:</i><br>
<code>
cd /usr/local/src<br>
wget http://internap.dl.sourceforge.net/sourceforge/libungif/libungif-4.1.4.tar.gz<br>
tar xzvf libungif-4.1.4.tar.gz<br>
cd libungif-4.1.4/util<br>
wget http://users.own-hero.net/~decoder/fuzzyocr/giftext-segfault.patch<br>
patch giftext.c < giftext-segfault.patch<br>
cd ..<br>
./configure --prefix=/usr && make && make install
<br><br>
</code>
<i>Continue to install other required programs:</i>
<br>
<code>
apt-get install libnetpbm10-dev netpbm giflib3g-dev libimage-exif-perl libstring-approx-perl<br>
apt-get install imagemagick libjpeg-progs
<br><br>
</code>
<i>Download, extract, patch, compile and install gocr:</i><br>
<code>
cd /usr/local/src<br>
wget http://www-e.uni-magdeburg.de/jschulen/ocr/gocr-0.40.tar.gz<br>
tar xzvf gocr-0.40.tar.gz<br>
cd gocr-0.40/src<br>
wget http://antispam.imp.ch/patches/patch-gocr-segfault<br>
patch pgm2asc.c < patch-gocr-segfault<br>
cd ..<br>
./configure --prefix=/usr && make && make install
<br><br>
</code>
<i>Grab an image from me and run a test:</i><br>
<code>
cd<br>
wget http://www200.pair.com/mecham/spam/image001.gif<br>
giftopnm image001.gif > image001.pnm<br>
gocr image001.pnm
<br><br>
</code>
<i>The beginning of the output should look something like this:</i><pre>
' AnENTlON ALL DAY TRADERS AND INVESTORS '</pre>
<i>Run another test. The result should be roughly the same, but this
time you should not get error messages from giftopnm:</i><br>
<code>
giffix image001.gif > image001.fixed<br>
giftopnm image001.fixed > image001.pnm<br>
gocr image001.pnm
<br><br>
</code>
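<i>Because gocr misreads some letters (note "AnENTlON" in the sample output), an automated check is best done against a
phrase it reads cleanly. A rough sketch; the helper name ocr_contains is hypothetical:</i><br>

```shell
# Hypothetical helper: succeed if the OCR text on stdin contains the
# given phrase, ignoring case and treating the phrase as a fixed string.
ocr_contains() {
    grep -qiF -- "$1"
}

# Intended use (not run here, needs gocr and the sample image):
#   gocr image001.pnm | ocr_contains 'DAY TRADERS' && echo "OCR looks sane"
```

<br><br>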
<i>Visit <a href="http://users.own-hero.net/~decoder/fuzzyocr/" target="_new1">
http://users.own-hero.net/~decoder/fuzzyocr/</a> and see what the latest version
of FuzzyOcr is (this document is based on 2.3b dated 29-Aug-2006), then modify the lines below if needed. If you install a different version
than what I have listed below, then the instructions could differ considerably.
Begin by locating the /Plugin/ directory used by SpamAssassin:</i><br>
<code>
updatedb<br>
locate /SpamAssassin/Plugin
</code>
<br><br>
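<i>If locate is not available, the same directory can be found by probing Perl's module search path. A sketch only;
the helper name find_plugin_dir is hypothetical, and it simply tests each directory it is given:</i><br>

```shell
# Hypothetical helper: print the first <dir>/Mail/SpamAssassin/Plugin
# that exists among the directories given as arguments.
find_plugin_dir() {
    for base in "$@"; do
        if [ -d "$base/Mail/SpamAssassin/Plugin" ]; then
            printf '%s\n' "$base/Mail/SpamAssassin/Plugin"
            return 0
        fi
    done
    return 1
}

# Intended use (not run here; assumes @INC paths contain no spaces):
#   find_plugin_dir $(perl -e 'print "@INC"')
```

<br><br>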
<i>If you installed SpamAssassin using apt-get, the /Plugin directory should be
/usr/share/perl5/Mail/SpamAssassin/Plugin. If yours is different, you will need to modify
the commands below.</i><br>
<code>
cd /usr/local/src/<br>
wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-2.3b.tar.gz<br>
tar xzvf fuzzyocr-2.3b.tar.gz<br>
cd FuzzyOcr-2.3b
<br><br>
</code>
<i>We will use a new patch Robert LeBlanc created for this particular version of FuzzyOcr.</i><br>
<code>
wget http://www200.pair.com/mecham/spam/fuzzyocr-23b-hashdb-poison.patch<br>
patch FuzzyOcr.pm < fuzzyocr-23b-hashdb-poison.patch
<br><br>
</code>
<i>Then place the files:</i><br>
<code>
cp FuzzyOcr.pm /usr/share/perl5/Mail/SpamAssassin/Plugin/<br>
cp FuzzyOcr.cf /etc/spamassassin/<br>
cp FuzzyOcr.words.sample /etc/spamassassin/FuzzyOcr.words
<br><br>
</code>
<i>Edit v310.pre and add the plugin:</i><br>
<code>
vi /etc/spamassassin/v310.pre
</code>
<br><br>
<i>and insert (at the bottom):</i><br>
<code>
loadplugin FuzzyOcr /usr/share/perl5/Mail/SpamAssassin/Plugin/FuzzyOcr.pm
<br><br>
</code>
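<i>If you would rather script the edit than use vi, the line can be appended only when it is not already present,
so that re-running the step does not duplicate it. A sketch; the helper name add_line_once is hypothetical:</i><br>

```shell
# Hypothetical helper: append a line to a file unless an identical
# line is already there.
add_line_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: treat the line as a fixed string.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Intended use (not run here):
#   add_line_once /etc/spamassassin/v310.pre \
#     'loadplugin FuzzyOcr /usr/share/perl5/Mail/SpamAssassin/Plugin/FuzzyOcr.pm'
```

<br><br>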
<i>Configure FuzzyOcr.cf:</i><br>
<code>
vi /etc/spamassassin/FuzzyOcr.cf
<br><br>
</code>
<i>Comment out the first line (the one that loads the plugin):</i><br>
#loadplugin FuzzyOcr FuzzyOcr.pm<br>
<i>If (and only if) you are using a version of SpamAssassin less than 3.1.4, uncomment this line and set the value to 1.0:</i><br>
focr_pre314 1.0<br>
<i>Set focr_base_score to 2 (this is my personal choice):</i><br>
focr_base_score 2<br>
<i>Only while we test, set focr_autodisable_score to 50:</i><br>
focr_autodisable_score 50<br><br>
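<i>The same settings could be applied non-interactively. The sketch below assumes each option sits on its own line as
"name value", possibly commented out with a leading "#"; I have not checked that against every FuzzyOcr.cf, so review
the result. The helper name set_cf_option is hypothetical:</i><br>

```shell
# Hypothetical helper: set "key value" in a config file. The first line
# that starts with "key" or "#key" is replaced; if there is none, the
# setting is appended. Assumes key contains no regex metacharacters.
set_cf_option() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        !done && $0 ~ ("^#?" key "[ \t]") { print key " " value; done = 1; next }
        { print }
        END { if (!done) print key " " value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Intended use (not run here):
#   set_cf_option /etc/spamassassin/FuzzyOcr.cf focr_base_score 2
#   set_cf_option /etc/spamassassin/FuzzyOcr.cf focr_autodisable_score 50
```

<br><br>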
<i>Save and exit the file, then we test. Start by linting spamassassin:</i><br>
<code>
spamassassin --lint
</code>
<br><br>
<i>Once you have resolved any (serious) lint errors, we do some more testing.
This assumes you are still in the /usr/local/src/FuzzyOcr-2.3b directory:</i>
<br>
<code>
cd samples<br>
spamassassin -t < animated-gif.eml
</code>
<br><br>
<i>I got:</i><pre>
19 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 4 lines
"charts" in 1 lines
"symbol" in 1 lines
"alert" in 4 lines
"stock" in 2 lines
"company" in 3 lines
"trade" in 1 lines
"xanax" in 1 lines
"meridia" in 1 lines
"growth" in 1 lines
(19 word occurrences found)
</pre><i>If you did not get something similar, check the log
for the last error message (if any).<br>
For example, on a low-powered machine you may have to increase focr_timeout in /etc/spamassassin/FuzzyOcr.cf:</i><br>
<code>
cat /etc/spamassassin/FuzzyOcr.log
</code>
<br><br>
<i>Ideally, FuzzyOcr.log will not exist. Continue on to the next test:</i><br>
<code>
spamassassin -t < corrupted-gif.eml
</code>
<br><br>
<i>I got:</i><pre>
1.5 FUZZY_OCR_WRONG_CTYPE BODY: Mail contains an image with wrong
content-type set
Image has format "GIF" but content-type is
"image/jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG BODY: Mail contains a corrupted image
Corrupt image: GIF-LIB error: Image is
defective, decoding aborted.
10 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 1 lines
"alert" in 1 lines
"stock" in 2 lines
"investor" in 1 lines
"company" in 1 lines
"trade" in 1 lines
"target" in 1 lines
"service" in 1 lines
"recommendation" in 1 lines
(10 word occurrences found)
</pre><i>Continue on to the next test (make sure your focr_autodisable_score is 50):</i><br>
<code>
spamassassin -t < jpeg.eml
</code>
<br><br>
<i>I got:</i><pre>
4.0 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"viagra" in 2 lines
"cialis" in 1 lines
"levitra" in 1 lines
(4 word occurrences found)
</pre>
<i>Continue on to the next test<br>
(if this one fails, read
<a href="http://marc.theaimsgroup.com/?l=spamassassin-users&m=115664281009909" target="_new4">
http://marc.theaimsgroup.com/?l=spamassassin-users&m=115664281009909</a>):</i><br>
<code>
spamassassin -t < png.eml
</code>
<br><br>
<i>I got:</i><pre>
28 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 2 lines
"news" in 2 lines
"symbol" in 1 lines
"alert" in 2 lines
"stock" in 1 lines
"investor" in 3 lines
"company" in 2 lines
"buy" in 1 lines
"price" in 2 lines
"trade" in 2 lines
"target" in 2 lines
"service" in 2 lines
"recommendation" in 1 lines
"levitra" in 1 lines
"software" in 2 lines
(26 word occurrences found)</pre>
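<i>When repeating these four tests, it can help to pull just the numbers out of the report. A sketch that assumes the
report lines look like the excerpts above (score, rule name, description); both helper names are hypothetical:</i><br>

```shell
# Hypothetical helper: print the score reported for one rule, reading
# `spamassassin -t` output on stdin.
rule_score() {
    awk -v rule="$1" '$2 == rule { print $1; exit }'
}

# Hypothetical helper: add up the counts on the '"word" in N lines' rows.
count_word_hits() {
    awk '$2 == "in" && $4 == "lines" { sum += $3 } END { print sum + 0 }'
}

# Intended use (not run here, needs SpamAssassin and the sample mails):
#   spamassassin -t < jpeg.eml | rule_score FUZZY_OCR
```

<br><br>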
<i>Edit FuzzyOcr.cf and set focr_autodisable_score back to a more reasonable level:</i>
<br>
<code>
vi /etc/spamassassin/FuzzyOcr.cf
</code>
<br><br>
<i>I set the focr_autodisable_score to the same value as my
$sa_kill_level_deflt in amavisd.conf:</i><br>
focr_autodisable_score 8
<br><br>
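<i>To look up the current kill level, something like the following may do. It is a sketch that assumes
amavisd.conf assigns a plain number, as in "$sa_kill_level_deflt = 8;". If your file assigns another variable
instead, it prints nothing. The helper name sa_kill_level is hypothetical:</i><br>

```shell
# Hypothetical helper: print the numeric value assigned to
# $sa_kill_level_deflt in the given amavisd.conf (last assignment wins).
sa_kill_level() {
    sed -n 's/^[[:space:]]*\$sa_kill_level_deflt[[:space:]]*=[[:space:]]*\([0-9][0-9.]*\).*/\1/p' "$1" |
        tail -n 1
}

# Intended use (not run here; the path to amavisd.conf varies):
#   sa_kill_level /etc/amavis/amavisd.conf
```

<br><br>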
<i>Reload amavisd-new (or spamd if you are using that):</i><br>
<code>
amavisd-new reload
</code>
<br><br>
<i>And keep an eye on the mail.log for a while:</i><br>
<code>
tail -f /var/log/mail.log
</code>
<br><br>
<i>If you upgrade to SpamAssassin 3.1.4 or newer from an older 3.1.x version,
remember to set focr_pre314 to 0.0.</i>
<br><br>
</td>
</tr>
</table>
<br>
<table cellpadding="4" border="1">
<tr>
<td>
<i>Now we will install another plugin. This one is from SARE
<a href="http://www.rulesemporium.com/plugins.htm" target="_new33">
http://www.rulesemporium.com/plugins.htm</a>.
Once again, navigate to your Plugin directory and grab the plugin:</i><br>
<code>
cd /usr/share/perl5/Mail/SpamAssassin/Plugin<br>
wget http://www.rulesemporium.com/plugins/ImageInfo.pm
</code>
<br><br>
<i>Also get the configuration file:</i><br>
<code>
cd /etc/spamassassin/<br>
wget http://www.rulesemporium.com/plugins/imageinfo.cf
</code>
<br><br>
<i>Edit v310.pre:</i><br>
<code>
vi v310.pre
</code>
<br><br>
<i>and insert (at the bottom):</i><br>
loadplugin Mail::SpamAssassin::Plugin::ImageInfo
<br><br>
<i>Edit imageinfo.cf and lower any scores that are 3.0 or more to half their value.
This is to help prevent false positives:</i><br>
<code>
vi imageinfo.cf
</code>
<br><br>
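<i>The halving can also be done mechanically. This sketch assumes the scores are set on lines of the form
"score RULE_NAME value [value ...]" and rewrites only numeric values of 3.0 or more. The helper name
halve_high_scores is hypothetical; write to a new file and compare before replacing the original:</i><br>

```shell
# Hypothetical helper: copy stdin to stdout, halving every numeric
# value >= 3.0 on lines that begin with the "score" keyword.
halve_high_scores() {
    awk '
        $1 == "score" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^[0-9]+(\.[0-9]+)?$/ && $i + 0 >= 3.0) {
                    $i = $i / 2
                }
            }
        }
        { print }
    '
}

# Intended use (not run here):
#   halve_high_scores < imageinfo.cf > imageinfo.cf.new
#   diff imageinfo.cf imageinfo.cf.new
```

<br><br>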
<i>Save and exit the file, and of course, lint spamassassin:</i><br>
<code>
spamassassin --lint
</code>
<br><br>
<i>and reload amavisd-new (or spamd if you are using that):</i><br>
<code>
amavisd-new reload
</code>
<br><br>
</td>
</tr>
</table>
<br>
mr88talent at yahoo dot com<br>
8/28/2006<br>
</body>
</html>