Current oav website

This commit is contained in:
Charlie Root
2023-03-20 12:18:38 +01:00
commit a096ce07cf
3270 changed files with 261778 additions and 0 deletions

mirrors/image_spam.html Normal file

@ -0,0 +1,416 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
<title>Fighting image spam on our Debian spamfilter with FuzzyOcr and ImageInfo</title>
<style type="text/css">
.cl2 {white-space: nowrap;}
body {font-family:Arial, Helvetica, sans-serif, "MS sans serif";
font-size: 12px;
color:black;}
a { color: #ff0000}
a:hover {text-decoration: none; color: #000000}
pre {
font-family: monospace;
font-size:14px;}
td {
font-family:Arial, Helvetica, sans-serif, "MS sans serif";
font-size:14px;}
i {
font-size:16px;
font-family : "Times New Roman", Times, serif;}
code {
font-family: monospace;
font-size:14px;}
H3
{font-family:Arial, Helvetica, sans-serif, "MS sans serif"; font-size: 16px;
MARGIN-TOP: 3px;
MARGIN-BOTTOM: 3px;}
H4
{font-family:Arial, Helvetica, sans-serif, "MS sans serif"; font-size: 16px;
MARGIN-TOP: 3px;
MARGIN-BOTTOM: 3px;}
</style>
</head>
<body lang="en-US">
<h3>Fighting image spam on our Debian spamfilter with FuzzyOcr and ImageInfo plugins</h3>
<br>
Absolutely no warranty, see the disclaimer at
<a href="http://www200.pair.com/mecham/spam/" target="_new10">
http://www200.pair.com/mecham/spam/</a>
<br><br>
Thanks to Robert LeBlanc and this
<a href="https://secure.renaissoft.com/maia/wiki/FuzzyOCR23" target="_new20">
excellent guide</a>.
<br><br>
<table cellpadding="4" border="1">
<tr>
<td>
<i>Image spam is difficult to deal with, and FuzzyOcr has helped me do so.
Running an OCR scanner on the images contained in messages will slow
down spam scanning (considerably in some cases). For me this is not a problem, but
be aware of it. I document my installation of FuzzyOcr here.
I assume that you have at least the testing and stable sources listed
in /etc/apt/sources.list, that 'stable' has top priority in
<a href="http://jaqque.sbih.org/kplug/apt-pinning.html" target="_new091">/etc/apt/preferences</a>,
and that you are able to compile programs from source (there is
no reason you could not install this on an Etch or Sid system as well).
You need <b>SpamAssassin version 3.1.1 or greater</b> to use these plugins.
Consider installing SpamAssassin from
<a href="http://www.backports.org/dokuwiki/doku.php?id=instructions" target="_new109">sarge-backports</a>
if you are trying to keep your system 'stable'. You may also be able to install
SpamAssassin from 'testing' without upgrading to the testing versions
of libc6, Perl or the kernel by
using the form 'apt-get install spamassassin/testing', which should
install dependencies from 'stable'. Simulate it first with 'apt-get -s install [...]'.
To prevent accidental upgrades when running 'apt-get upgrade',
you might also put the package on hold if you use this method:</i><br>
<code>echo "spamassassin hold" | dpkg --set-selections</code>
<br><br>
<i>One of the programs
we need is not available in the stable release, so we will install it from testing.
First check that installing libimage-exiftool-perl will not also pull in libc6 or related
packages:</i><br>
<code>
apt-get update<br>
apt-get -s install libimage-exiftool-perl/testing
</code>
<br><br>
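<i>A minimal sketch of how the simulation output could be checked by script instead of by eye.
It relies on apt-get -s printing one "Inst" line per package it would install.
The helper names sim_packages and only_installs are my own and not part of apt:</i><br>

```shell
# Sketch only: helper names are hypothetical, not part of apt.
# List the package names from the "Inst" lines of simulated apt-get output (stdin).
sim_packages() {
    awk '$1 == "Inst" { print $2 }' | sort -u
}

# Succeed only if the simulation would install exactly the one named package.
only_installs() {
    [ "$(sim_packages)" = "$1" ]
}

# Intended use on the mail server (not run here):
#   apt-get -s install libimage-exiftool-perl/testing | only_installs libimage-exiftool-perl
```

<i>If only_installs fails, look at the full simulation output before going any further.</i>
<br><br>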
<i>If libimage-exiftool-perl is the only thing that will be installed, then install it.
If it is not, and you are trying to keep your system stable, then contact me before you continue:</i>
<br>
<code>
apt-get install libimage-exiftool-perl/testing
<br><br>
</code>
<i>See what version you have:</i><br>
<code>dpkg -l libimage-exiftool-perl
</code><br><br>
<i>If your version is less than 6.36-1, patch ExifTool:</i><br>
<code>
cd /usr/share/perl5/Image/ExifTool/<br>
wget http://antispam.imp.ch/patches/patch-GIF-Colortable<br>
patch -b GIF.pm &lt; patch-GIF-Colortable
<br><br>
</code>
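<i>The version test above can be sketched as a small helper. The name version_lt is hypothetical,
and GNU sort -V only approximates Debian's version ordering; on a Debian system
'dpkg --compare-versions' is the authoritative comparison:</i><br>

```shell
# Hypothetical helper: succeed if version $1 sorts strictly before version $2.
# Uses GNU sort -V, which approximates (but is not identical to) Debian ordering.
version_lt() {
    if [ "$1" = "$2" ]; then
        return 1
    fi
    first=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$first" = "$1" ]
}

# Intended use on the mail server (not run here):
#   installed=$(dpkg-query -W -f='${Version}' libimage-exiftool-perl)
#   if version_lt "$installed" 6.36-1; then echo "patch GIF.pm"; fi
```

<br><br>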
<i>To prevent future accidental upgrades during 'apt-get upgrade', place
the libimage-exiftool-perl package on hold:</i><br>
<code>
echo "libimage-exiftool-perl hold" | dpkg --set-selections
</code>
<br><br>
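<i>To confirm that a hold took effect, the output of 'dpkg --get-selections' can be inspected.
A small sketch follows; the helper name is_held is my own:</i><br>

```shell
# Hypothetical helper: read `dpkg --get-selections` output on stdin and
# succeed if package $1 is listed with the state "hold".
is_held() {
    awk -v pkg="$1" '
        $1 == pkg { if ($2 == "hold") found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Intended use on the mail server (not run here):
#   dpkg --get-selections | is_held libimage-exiftool-perl
```

<br><br>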
<i>You should not have both giflib-bin and libungif installed. Simulate removing giflib-bin:</i><br>
<code>apt-get -s remove giflib-bin</code>
<br><br>
<i>If it's not installed, then you can move on. If it's the only thing that will
be removed, then remove it:</i><br>
<code>apt-get remove giflib-bin</code>
<br><br>
<i>Download, extract, patch, compile and install libungif:</i><br>
<code>
cd /usr/local/src<br>
wget http://internap.dl.sourceforge.net/sourceforge/libungif/libungif-4.1.4.tar.gz<br>
tar xzvf libungif-4.1.4.tar.gz<br>
cd libungif-4.1.4/util<br>
wget http://users.own-hero.net/~decoder/fuzzyocr/giftext-segfault.patch<br>
patch giftext.c &lt; giftext-segfault.patch<br>
cd ..<br>
./configure --prefix=/usr &amp;&amp; make &amp;&amp; make install
<br><br>
</code>
<i>Continue to install other required programs:</i>
<br>
<code>
apt-get install libnetpbm10-dev netpbm giflib3g-dev libimage-exif-perl libstring-approx-perl<br>
apt-get install imagemagick libjpeg-progs
<br><br>
</code>
<i>Download, extract, patch, compile and install gocr:</i><br>
<code>
cd /usr/local/src<br>
wget http://www-e.uni-magdeburg.de/jschulen/ocr/gocr-0.40.tar.gz<br>
tar xzvf gocr-0.40.tar.gz<br>
cd gocr-0.40/src<br>
wget http://antispam.imp.ch/patches/patch-gocr-segfault<br>
patch pgm2asc.c &lt; patch-gocr-segfault<br>
cd ..<br>
./configure --prefix=/usr &amp;&amp; make &amp;&amp; make install
<br><br>
</code>
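<i>Before testing, it may help to confirm that the helper programs built and installed so far are on the PATH.
A minimal sketch; missing_tools is a hypothetical name, and the tool list in the usage comment only covers
the programs mentioned in this document:</i><br>

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2> /dev/null; then
            echo "$tool"
        fi
    done
}

# Intended use on the mail server (not run here); no output means nothing is missing:
#   missing_tools gocr giftopnm giffix giftext
```

<br><br>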
<i>Grab an image from me and run a test:</i><br>
<code>
cd<br>
wget http://www200.pair.com/mecham/spam/image001.gif<br>
giftopnm image001.gif > image001.pnm<br>
gocr image001.pnm
<br><br>
</code>
<i>The beginning of the output should look something like this:</i><pre>
' AnENTlON ALL DAY TRADERS AND INVESTORS '</pre>
<i>Run another test. The result should be roughly the same, but this
time you should not get error messages from giftopnm:</i><br>
<code>
giffix image001.gif > image001.fixed<br>
giftopnm image001.fixed > image001.pnm<br>
gocr image001.pnm
<br><br>
</code>
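<i>The second test can be wrapped into one function, with a loose check on the result.
This is only a sketch: ocr_gif and ocr_mentions are hypothetical names, and the phrase to look for
is whatever you expect the image to contain:</i><br>

```shell
# Hypothetical wrapper around the three commands used above:
# repair the GIF, convert it to PNM, then run gocr on the result.
ocr_gif() {
    fixed=$(mktemp) || return 1
    pnm=$(mktemp) || return 1
    giffix "$1" > "$fixed"
    giftopnm "$fixed" > "$pnm"
    gocr "$pnm"
    rm -f "$fixed" "$pnm"
}

# Succeed if the OCR text on stdin contains the phrase $1 (case-insensitive).
ocr_mentions() {
    grep -qiF -- "$1"
}

# Intended use on the mail server (not run here):
#   ocr_gif image001.gif | ocr_mentions 'day traders'
```

<i>OCR output is never exact, so match on a short phrase and not on the whole line.</i>
<br><br>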
<i>Visit <a href="http://users.own-hero.net/~decoder/fuzzyocr/" target="_new1">
http://users.own-hero.net/~decoder/fuzzyocr/</a> and see what the latest version
of FuzzyOcr is (this document is based on 2.3b dated 29-Aug-2006), then modify the lines below if needed. If you install a different version
than what I have listed below, then the instructions could differ considerably.
Begin by locating the /Plugin/ directory used by SpamAssassin:</i><br>
<code>
updatedb<br>
locate /SpamAssassin/Plugin
</code>
<br><br>
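<i>If locate is not available, Perl itself can report where Mail::SpamAssassin was loaded from.
A minimal sketch, assuming the Plugin directory sits next to SpamAssassin.pm as it does in the path below;
both function names are hypothetical:</i><br>

```shell
# Hypothetical helper: given the path to SpamAssassin.pm, print the Plugin directory.
plugin_dir_for() {
    echo "${1%.pm}/Plugin"
}

# Ask Perl where it loaded Mail::SpamAssassin from, then derive the Plugin directory.
sa_plugin_dir() {
    module=$(perl -MMail::SpamAssassin -e 'print $INC{"Mail/SpamAssassin.pm"}') || return 1
    plugin_dir_for "$module"
}

# Intended use on the mail server (not run here):
#   sa_plugin_dir
```

<br><br>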
<i>If you installed SpamAssassin using apt-get, the /Plugin directory should be
/usr/share/perl5/Mail/SpamAssassin/Plugin. If yours is different, you will need to modify
the commands below.</i><br>
<code>
cd /usr/local/src/<br>
wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-2.3b.tar.gz<br>
tar xzvf fuzzyocr-2.3b.tar.gz<br>
cd FuzzyOcr-2.3b
<br><br>
</code>
<i>We will use a new patch Robert LeBlanc created for this particular version of FuzzyOcr.</i><br>
<code>
wget http://www200.pair.com/mecham/spam/fuzzyocr-23b-hashdb-poison.patch<br>
patch FuzzyOcr.pm &lt; fuzzyocr-23b-hashdb-poison.patch
<br><br>
</code>
<i>Then place the files:</i><br>
<code>
cp FuzzyOcr.pm /usr/share/perl5/Mail/SpamAssassin/Plugin/<br>
cp FuzzyOcr.cf /etc/spamassassin/<br>
cp FuzzyOcr.words.sample /etc/spamassassin/FuzzyOcr.words
<br><br>
</code>
<i>Edit v310.pre and add the plugin:</i><br>
<code>
vi /etc/spamassassin/v310.pre
</code>
<br><br>
<i>and insert (at the bottom):</i><br>
<code>
loadplugin FuzzyOcr /usr/share/perl5/Mail/SpamAssassin/Plugin/FuzzyOcr.pm
<br><br>
</code>
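<i>If you prefer not to edit the file by hand, the line can be appended only when it is missing.
A small sketch; append_once is a hypothetical helper name:</i><br>

```shell
# Hypothetical helper: append the exact line $2 to file $1 unless it is already there.
append_once() {
    if ! grep -qxF -- "$2" "$1" 2> /dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Intended use on the mail server (not run here):
#   append_once /etc/spamassassin/v310.pre \
#       'loadplugin FuzzyOcr /usr/share/perl5/Mail/SpamAssassin/Plugin/FuzzyOcr.pm'
```

<i>Running it twice leaves a single loadplugin line in the file.</i>
<br><br>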
<i>Configure FuzzyOcr.cf:</i><br>
<code>
vi /etc/spamassassin/FuzzyOcr.cf
<br><br>
</code>
<i>Comment out the first line (the one that loads the plugin):</i><br>
<code>#loadplugin FuzzyOcr FuzzyOcr.pm</code><br>
<i>If (and only if) you are using a version of SpamAssassin older than 3.1.4, uncomment this line and set the value to 1.0:</i><br>
<code>focr_pre314 1.0</code><br>
<i>Set focr_base_score to 2 (this is my personal choice):</i><br>
<code>focr_base_score 2</code><br>
<i>Only while we test, set focr_autodisable_score to 50:</i><br>
<code>focr_autodisable_score 50</code><br><br>
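<i>The settings above can also be changed by script. This is a sketch under one assumption:
it replaces an active (uncommented) line for the option if one exists and otherwise appends a new line,
leaving commented-out lines alone. set_cf_option is a hypothetical helper name:</i><br>

```shell
# Hypothetical helper: set option $2 to value $3 in config file $1.
# An active "option value" line is replaced; if none exists, one is appended.
# Commented-out lines are left untouched.
set_cf_option() {
    cf_tmp=$(mktemp) || return 1
    awk -v k="$2" -v v="$3" '
        $1 == k { if (!done) print k " " v; done = 1; next }
        { print }
        END { if (!done) print k " " v }
    ' "$1" > "$cf_tmp" || return 1
    # Write back through cat so the original file keeps its owner and mode.
    cat "$cf_tmp" > "$1"
    rm -f "$cf_tmp"
}

# Intended use on the mail server (not run here):
#   set_cf_option /etc/spamassassin/FuzzyOcr.cf focr_base_score 2
#   set_cf_option /etc/spamassassin/FuzzyOcr.cf focr_autodisable_score 50
```

<br><br>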
<i>Save and exit the file, then we test. Start by linting spamassassin:</i><br>
<code>
spamassassin --lint
</code>
<br><br>
<i>Once you have resolved any (serious) lint errors, we do some more testing.
This assumes you are still in the /usr/local/src/FuzzyOcr-2.3b directory:</i>
<br>
<code>
cd samples<br>
spamassassin -t &lt; animated-gif.eml
</code>
<br><br>
<i>I got:</i><pre>
19 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 4 lines
"charts" in 1 lines
"symbol" in 1 lines
"alert" in 4 lines
"stock" in 2 lines
"company" in 3 lines
"trade" in 1 lines
"xanax" in 1 lines
"meridia" in 1 lines
"growth" in 1 lines
(19 word occurrences found)
</pre><i>If you did not get something similar, check the log
for the last error message (if any).
On a low-powered machine, for example, you may have to increase focr_timeout in /etc/spamassassin/FuzzyOcr.cf.<br>
To view the log:</i><br>
<code>
cat /etc/spamassassin/FuzzyOcr.log
</code>
<br><br>
<i>Ideally, FuzzyOcr.log will not exist. Continue on to the next test:</i><br>
<code>
spamassassin -t &lt; corrupted-gif.eml
</code>
<br><br>
<i>I got:</i><pre>
1.5 FUZZY_OCR_WRONG_CTYPE BODY: Mail contains an image with wrong
content-type set
Image has format "GIF" but content-type is
"image/jpeg"
2.5 FUZZY_OCR_CORRUPT_IMG BODY: Mail contains a corrupted image
Corrupt image: GIF-LIB error: Image is
defective, decoding aborted.
10 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 1 lines
"alert" in 1 lines
"stock" in 2 lines
"investor" in 1 lines
"company" in 1 lines
"trade" in 1 lines
"target" in 1 lines
"service" in 1 lines
"recommendation" in 1 lines
(10 word occurrences found)
</pre><i>Continue on to the next test (make sure your focr_autodisable_score is 50):</i><br>
<code>
spamassassin -t &lt; jpeg.eml
</code>
<br><br>
<i>I got:</i><pre>
4.0 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"viagra" in 2 lines
"cialis" in 1 lines
"levitra" in 1 lines
(4 word occurrences found)
</pre>
<i>Continue on to the next test<br>
(if this one fails, read
<a href="http://marc.theaimsgroup.com/?l=spamassassin-users&amp;m=115664281009909" target="_new4">
http://marc.theaimsgroup.com/?l=spamassassin-users&amp;m=115664281009909</a>):</i><br>
<code>
spamassassin -t &lt; png.eml
</code>
<br><br>
<i>I got:</i><pre>
28 FUZZY_OCR BODY: Mail contains an image with common spam text inside
Words found:
"alert" in 2 lines
"news" in 2 lines
"symbol" in 1 lines
"alert" in 2 lines
"stock" in 1 lines
"investor" in 3 lines
"company" in 2 lines
"buy" in 1 lines
"price" in 2 lines
"trade" in 2 lines
"target" in 2 lines
"service" in 2 lines
"recommendation" in 1 lines
"levitra" in 1 lines
"software" in 2 lines
(26 word occurrences found)</pre>
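<i>To compare the four samples at a glance, the score for a single rule can be pulled out of each report.
A minimal sketch, assuming the report lines have the layout shown above (score first, rule name second);
rule_score is a hypothetical helper name:</i><br>

```shell
# Hypothetical helper: print the score reported for rule $1 in a
# `spamassassin -t` report read from stdin. The rule name must match exactly,
# so FUZZY_OCR does not pick up FUZZY_OCR_WRONG_CTYPE.
rule_score() {
    awk -v rule="$1" '$2 == rule { print $1; exit }'
}

# Intended use in the samples directory (not run here):
#   for sample in animated-gif corrupted-gif jpeg png; do
#       printf '%s: ' "$sample"
#       spamassassin -t < "$sample.eml" | rule_score FUZZY_OCR
#   done
```

<br><br>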
<i>Edit FuzzyOcr.cf and set focr_autodisable_score back to a more reasonable level:</i>
<br>
<code>
vi /etc/spamassassin/FuzzyOcr.cf
</code>
<br><br>
<i>I set the focr_autodisable_score to the same value as my
$sa_kill_level_deflt in amavisd.conf:</i><br>
<code>focr_autodisable_score 8</code>
<br><br>
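<i>To look up the current kill level without opening the file, something like the following may work.
It is a sketch that assumes the setting is written as a plain numeric assignment on one line;
kill_level is a hypothetical name, and the location of amavisd.conf depends on your installation:</i><br>

```shell
# Hypothetical helper: print the numeric value assigned to
# $sa_kill_level_deflt in the amavisd.conf-style file $1.
kill_level() {
    sed -n 's/^[[:space:]]*\$sa_kill_level_deflt[[:space:]]*=[[:space:]]*\([0-9.][0-9.]*\).*/\1/p' "$1" | head -n 1
}

# Intended use on the mail server (not run here; adjust the path to your amavisd.conf):
#   kill_level /path/to/amavisd.conf
```

<br><br>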
<i>Reload amavisd-new (or spamd if you are using that):</i><br>
<code>
amavisd-new reload
</code>
<br><br>
<i>And keep an eye on the mail.log for a while:</i><br>
<code>
tail -f /var/log/mail.log
</code>
<br><br>
<i>If you upgrade to SpamAssassin 3.1.4 or newer from an older 3.1.x version,
remember to set focr_pre314 to 0.0.</i>
<br><br>
</td>
</tr>
</table>
<br>
<table cellpadding="4" border="1">
<tr>
<td>
<i>Now we will install another plugin. This one is from SARE
<a href="http://www.rulesemporium.com/plugins.htm" target="_new33">
http://www.rulesemporium.com/plugins.htm</a>.
Once again, navigate to your Plugin directory and grab the plugin:</i><br>
<code>
cd /usr/share/perl5/Mail/SpamAssassin/Plugin<br>
wget http://www.rulesemporium.com/plugins/ImageInfo.pm
</code>
<br><br>
<i>Also get the configuration file:</i><br>
<code>
cd /etc/spamassassin/<br>
wget http://www.rulesemporium.com/plugins/imageinfo.cf
</code>
<br><br>
<i>Edit v310.pre:</i><br>
<code>
vi v310.pre
</code>
<br><br>
<i>and insert (at the bottom):</i><br>
<code>loadplugin Mail::SpamAssassin::Plugin::ImageInfo</code>
<br><br>
<i>Edit imageinfo.cf and lower any scores that are 3.0 or more to half their value.
This is to help prevent false positives:</i><br>
<code>
vi imageinfo.cf
</code>
<br><br>
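<i>Halving the high scores can also be done with awk. This is a sketch that assumes the scores
are set on lines of the form 'score RULE_NAME value'; halve_high_scores is a hypothetical name and the rule names in the
usage comment are placeholders. It prints the result and does not edit the file in place:</i><br>

```shell
# Hypothetical helper: print file $1 with every numeric value of 3.0 or more
# on a "score" line replaced by half its value. Other lines pass through
# unchanged. Rewritten lines have their whitespace collapsed to single spaces.
halve_high_scores() {
    awk '
        $1 == "score" {
            i = 3
            while (NF >= i) {
                if ($i ~ /^#/) break            # stop at a trailing comment
                if ($i ~ /^[0-9]+(\.[0-9]+)?$/) {
                    if ($i + 0 >= 3.0) $i = sprintf("%.2f", $i / 2)
                }
                i++
            }
        }
        { print }
    ' "$1"
}

# Intended use (not run here); "score EXAMPLE_RULE 3.5" becomes "score EXAMPLE_RULE 1.75":
#   halve_high_scores imageinfo.cf > imageinfo.cf.new
```

<i>Review imageinfo.cf.new before copying it over the original.</i>
<br><br>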
<i>Save and exit the file, and of course, lint spamassassin:</i><br>
<code>
spamassassin --lint
</code>
<br><br>
<i>and reload amavisd-new (or spamd if you are using that):</i><br>
<code>
amavisd-new reload
</code>
<br><br>
</td>
</tr>
</table>
<br>
mr88talent at yahoo dot com<br>
8/28/2006<br>
</body>
</html>