.TH SCANMAIL 8
.SH NAME
scanmail, testscan \- spam filters
.SH SYNOPSIS
.B upas/scanmail
[
.I options
]
[
.I qer-args
]
.I root
.B mail
.I sender system rcpt-list
.PP
.B upas/testscan
[
.B -avd
]
[
.B -p
.I patfile
]
[
.I filename
]
.SH DESCRIPTION
.I Scanmail
accepts a mail message supplied on standard input,
applies a file of patterns to a portion of it,
and dispatches
the message based
on the results.
It exactly replaces the
generic queuing command
.IR qer (8)
that is executed from the
.IR rc (1)
script
.B /mail/lib/qmail
in the mail processing pipeline.
Associated with each pattern is an
.IR action ;
in order of decreasing priority, the actions are:
.in +5
.TP 10
.B dump
the message is deleted and a log entry is written to
.B /sys/log/smtpd
.TP 10
.B hold
the message is placed in a queue for human inspection
.TP 10
.B line
a line containing the matching portion of the message is written to a log
.in -5
.PP
If no pattern matches or only patterns with an action of
.B line
match, the message is accepted and
.I scanmail
queues the message for delivery.
.I Scanmail
meshes with the blocking facilities
of
.IR smtpd (6)
to provide several layers of
filtering on gateway systems.  In all cases the sender
is notified that the message has been successfully
delivered,
leaving the sender unaware that the message may have been delayed or deleted.
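.PP
The priority rule above can be sketched in a few lines of Python.
This is only an illustration, not the program's own code;
the function name
.B disposition
and the return value
.B deliver
are hypothetical.
.EX
```python
def disposition(matched_actions):
    """Pick the fate of a message from the actions of every pattern that matched."""
    if "dump" in matched_actions:      # highest priority: delete the message
        return "dump"
    if "hold" in matched_actions:      # next: queue for human inspection
        return "hold"
    # Nothing matched, or only logging patterns matched:
    # the message is accepted and queued for delivery.
    return "deliver"
```
.EE
.PP
For instance, a message matching both a
.B hold
and a
.B dump
pattern is dumped, because
.B dump
outranks
.BR hold .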
.PP
.I Scanmail
accepts the arguments of
.IR qer (8)
as well as the following:
.TF filename
.TP
.B -c
Save a copy of each message in a
randomly-named file in
directory
.BR /mail/copy .
.TP
.B -d
Write debugging information to standard error.
.TP
.B -h
Queue
.I held
messages by sending domain name.
The
.B -q
option must specify a root directory; messages
are queued in subdirectories of this directory.
If the
.B -h
option is not specified,
messages are accumulated in a subdirectory of
.B /mail/queue.hold
named for the contents of
.BR /dev/user ,
usually
.BR none .
.TF filename
.TP
.B -n
Messages are never held for inspection, but are delivered.  Also known as
.IR "vacation mode" .
.TP
.BI -p " filename"
Read the patterns from
.I filename
rather than
.BR /mail/lib/patterns .
.TP
.BI -q " holdroot"
Queue deliverable messages in subdirectories of
.IR holdroot .
This option is the same as the
.B -q
option of
.IR qer (8)
and must be present if the
.B -h
option is given.
.TP
.B -s
Save deleted
messages.  Messages are stored, one per randomly-named file,
in subdirectories of
.B /mail/queue.dump
named with the date.
.TP
.B -t
Test mode.  The pattern matcher is applied but the message is
discarded and the result is not logged.
.TP
.B -v
Print the highest priority match.
This is useful
with the
.B -t
option for testing the pattern matcher without actually
sending a message.
.PD
.PP
.I Testscan
is the command line version of
.IR scanmail .
If
.I filename
is omitted, it applies the pattern set to
the message on standard input.  Unlike
.IR scanmail ,
which finds the highest priority match,
.I testscan
prints all matches in the portion of the message under test.
It is useful for testing a pattern set or
implementing a personal filter
using the
.B pipeto
file in a user's mail directory.
.I Testscan
accepts the following options:
.TP
.B -a
Print matches in the complete input message.
.TP
.B -d
Enable debug mode.
.TP
.B -v
Print the message after conversion to canonical form
.RI ( q.v. ).
.TP
.BI -p " filename"
Read the patterns from
.I filename
rather than
.BR /mail/lib/patterns .
.SS Canonicalization
Before pattern matching, both programs convert a portion of
the message header and the beginning of the
message to a canonical form.  The amounts of the header
and message body processed are set by
compile-time parameters in the source files.
The canonicalization process converts letters to lower-case and
replaces consecutive spaces, tabs and newline characters
with a single space.  HTML commands are
deleted except for the parameters following
.B A
.BR HREF ,
.B IMG
.BR SRC ,
and
.B IMG
.B BORDER
directives.  Additionally, the following MIME escape sequences
are replaced by their ASCII
equivalents:
.PP
.EX
           Escape Seq   ASCII
           ----------   -----
                =2e       .
                =2f       /
                =20    <space>
                =3d       =
.EE
and the sequence
.I =<newline>
is elided.
.I Scanmail
assembles the sender, destination domain and recipient fields of
the command line into a string that is
subjected to the same canonical processing.
Following canonicalization, the command line and
the two long strings containing
the header and the message body are passed to the
matching engine for analysis.
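.PP
As a rough illustration of the text transformations described above,
the following Python sketch lower-cases the input, elides
.IR =<newline> ,
decodes the four MIME escapes and collapses white-space.
It is not the program's implementation: the name
.B canonicalize
is hypothetical, the order of the steps is an assumption,
and the HTML handling and the compile-time size limits are left out.
.EX
```python
import re

# Tab and newline are spelled with chr() so the listing needs no backslashes.
TAB, NL = chr(9), chr(10)

# The four MIME escape sequences listed in the table above.
MIME = {"=2e": ".", "=2f": "/", "=20": " ", "=3d": "="}

def canonicalize(text):
    """Approximate the canonical form used for pattern matching."""
    text = text.lower()                      # letters to lower-case
    text = text.replace("=" + NL, "")        # elide =<newline>
    # Decode the escapes in a single pass so a decoded '=' is never re-decoded.
    text = re.sub("=(?:2e|2f|20|3d)", lambda m: MIME[m.group(0)], text)
    # Runs of spaces, tabs and newlines become one space.
    return re.sub("[ " + TAB + NL + "]+", " ", text)
```
.EE
.PP
Because the input is lower-cased first, an upper-case escape such as
.B =2E
is decoded as well; whether the real program does the same is not stated here.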
.SS Pattern Syntax
The matching engine compiles the pattern set
and matches it to each canonicalized input string.
Patterns are specified one per line
as follows:
.PP
.EX
	{*}\fIaction\fP: \fIpattern-spec\fP {~~\fIoverride\fP...~~\fIoverride\fP}
.EE
.PP
On all lines, a
.B #
introduces a comment; there is no way to escape this character.
.PP
Lines beginning with
.B *
contain a
.I pattern-spec
that is a string; otherwise, the
.I pattern-spec
is a regular expression in the style of
.IR regexp (6).
Regular expression matching is many
times less efficient than string matching, so it is
wiser to enumerate several similar strings
than to combine them into a regular expression.
The
.I action
is a keyword terminated by a
.B :
and separated from the pattern by optional white-space.
It must be one of the following:
.TP 10
.B dump
if the pattern matches, the message is deleted.  If the
.B -s
command line option is set, the message is saved.
.TP 10
.B hold
if the pattern matches, the message is queued in a subdirectory
of
.B /mail/queue.hold
for manual inspection.  After inspection, the queue can be swept
manually using
.B runq
(see
.IR qer (8))
to deliver messages that were inadvertently matched.
.TP 10
.B header
this is the same as the
.B hold
action, except the pattern is only applied to the message header.
This optimization is useful for patterns that match header fields
that are unlikely to be present in the body of the message.
.TP 10
.B line
the sender and a section of the message around the match are written to
the file
.BR /sys/log/lines .
The message is always delivered.
.TP 10
.B loff
patterns of this type are applied only to the canonicalized command line.
When a match occurs, all patterns with
.B line
actions are disabled.  This is useful for limiting
the size of the log file by excluding repetitive messages, such
as those from mailing lists.
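.PP
The leading part of a pattern line can be picked apart as in the Python sketch below.
It is a simplified illustration rather than the parser in the source tree:
the name
.B split_pattern_line
is hypothetical, and quoting, overrides and line continuation are not handled
(the remainder of the line is returned untouched).
.EX
```python
# The five action keywords listed above.
ACTIONS = {"dump", "hold", "header", "line", "loff"}

def split_pattern_line(line):
    """Return (is_string, action, rest), or None for a blank or comment-only line."""
    # A '#' always starts a comment; it cannot be escaped.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    # A leading '*' marks a plain string; otherwise the pattern is a regular expression.
    is_string = line.startswith("*")
    if is_string:
        line = line[1:]
    # The action keyword is terminated by the first ':'.
    action, sep, rest = line.partition(":")
    action = action.strip()
    if not sep or action not in ACTIONS:
        raise ValueError("unknown action: " + repr(action))
    return is_string, action, rest.strip()
```
.EE
.PP
For example, the line
.B "*hold: sex.com # note"
yields a string pattern with action
.B hold
and remainder
.BR sex.com .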
.PP
Patterns are accumulated into pattern sets sharing the same action.
The matching engine applies the
.B dump
pattern set first, then the
.B header
and
.B hold
pattern sets, and finally the
.B line
pattern set.  Each pattern set is applied three times:
to the canonicalized command line, to the message header, and
finally to the message body.  The ordering of patterns
in the pattern file is insignificant.
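.PP
The order of application can be sketched as nested loops.
The Python below is an illustration under simplifying assumptions:
every pattern is treated as a plain string, overrides are ignored,
and the names
.B first_match
and
.B sets
are hypothetical.
The
.B header
and
.B loff
restrictions follow the action descriptions given earlier.
.EX
```python
def first_match(sets, cmdline, header, body):
    """Return (action, where, pattern) for the highest-priority match, or None.

    sets maps an action keyword to a list of plain-string patterns.
    """
    inputs = [("cmdline", cmdline), ("header", header), ("body", body)]
    # loff patterns are tried only against the command line and switch off line patterns.
    line_off = any(p in cmdline for p in sets.get("loff", []))
    for action in ["dump", "header", "hold", "line"]:
        if action == "line" and line_off:
            continue
        for where, text in inputs:
            # header patterns are applied to the message header only.
            if action == "header" and where != "header":
                continue
            for pattern in sets.get(action, []):
                if pattern in text:
                    return action, where, pattern
    return None
```
.EE
.PP
Because the sets are visited in a fixed order, rearranging the lines
of the pattern file does not change which action wins.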
.PP
The
.I pattern-spec
is a string of characters terminated by a
.BR newline ,
.B #
or override indicator,
.BR ~~ .
Trailing white-space is deleted but
patterns containing leading or trailing white-space can
be enclosed in double-quote
characters.  A pattern containing a double-quote
must be enclosed in double-quote
characters, and each embedded double-quote must be preceded by a backslash.
For example, the pattern
.PP
.EX
	"this is not \\"spam\\""
.EE
.PP
matches the string \fLthis is not "spam"\fP.
The
.I pattern-spec
is followed by zero or more
.I override
strings.  When the specific pattern matches,
each override is applied and
if one matches, it cancels the effect of the pattern.
Overrides must be strings; regular expressions are not supported.
Each override is introduced by the string
.B ~~
and continues until a subsequent
.BR ~~ ,
.B #
or
.BR newline ,
white-space included.
A
.B ~~
immediately followed by a
.B newline
indicates a line continuation and further overrides continue
on the following line.
Leading white-space
on the continuation line is ignored.  For example,
.PP
.EX
        *hold:   sex.com~~essex.com~~sussex.com~~sysex.com~~
                 lasex.com~~cse.psu.edu!owner-9fans
.EE
.PP
matches all input containing the string
.B sex.com
except for messages that also contain any of the
strings in the override list.  Often it
is desirable to override a pattern based on
the name of the sender or
recipient.  For this reason, each override
pattern is applied to the header and the command line as well
as the section of the
canonicalized input containing the matching data.
Thus a pattern matching the command line or the header
searches both the command line and the header
for overrides while a match in the body searches
the body, header and command line for overrides.
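.PP
The override rules can be illustrated with a short Python sketch.
It assumes the continuation lines have already been joined and the
action prefix removed; the names
.B split_spec
and
.B overridden
are hypothetical and quoting is not handled.
.EX
```python
def split_spec(spec):
    """Split 'pattern~~override~~override' into (pattern, [overrides])."""
    pattern, *overrides = spec.split("~~")
    # Overrides keep their white-space; empty pieces (a trailing '~~') are dropped.
    return pattern.strip(), [o for o in overrides if o]

def overridden(overrides, where, cmdline, header, body):
    """True if any override string cancels a match found in 'where'."""
    # The command line and header are always searched for overrides.
    scope = [cmdline, header]
    # A match in the body also searches the body.
    if where == "body":
        scope.append(body)
    return any(o in text for o in overrides for text in scope)
```
.EE
.PP
With the example above, a body containing
.B essex.com
is not held, while a body containing only
.B sex.com
still is.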
.PP
The structure of the pattern file and the matching
algorithm define the strategy for detecting
and filtering unwanted messages.  Ideally, a
.B hold
pattern selects a message for inspection and if it
is determined to be undesirable, a specific
.B dump
pattern is added to delete further instances
of the message.  Additionally, it is often
useful to block the sender by updating the
.B smtpd
control file.
.PP
In this regime, patterns with a
.B dump
action generally match phrases
that are likely to be unique.  Patterns that
hold a message for inspection
match phrases commonly found in undesirable material and
occasionally in legitimate messages.  Patterns
that log matches are less specific yet.  In all
cases the ability to override a pattern by
matching another string allows repetitive messages
that trigger the pattern, such as mailing lists,
to pass the filter after the first one is processed
manually.  The
.B -s
option allows deleted messages to be salvaged
by either manual or semi-automatic review, supporting
the specification of more aggressive patterns.
Finally, the utility of the pattern matcher is not
confined to filtering spam; it is a generally useful
administrative tool for deleting inadvertently harmful
messages, for example, mail loops, stuck senders or viruses.
It is also useful for collecting or counting messages
matching certain criteria.
.SH FILES
.TF /mail/queue.dump/*
.TP
.B /mail/lib/patterns
default pattern file
.TP
.B /sys/log/smtpd
log of deleted messages
.TP
.B /mail/log/lines
file where
.B line
matches are logged
.TP
.B /mail/queue/*
directories where legitimate messages are queued for delivery
.TP
.B /mail/queue.hold
directory where held messages are queued for inspection
.TP
.B /mail/queue.dump/*
directory where
.I dumped
messages are stored when the
.B -s
command line option is specified
.TP
.B /mail/copy/*
directory where copies of all incoming messages
are stored
.SH SOURCE
.TP
.B /sys/src/cmd/upas/scanmail
.SH "SEE ALSO"
.IR mail (1),
.IR qer (8),
.IR smtpd (6)
.SH BUGS
.I Testscan
does not report a match when the body of a message
contains exactly one line.