.de EX
.nf
.ft CW
..
.de EE
.br
.fi
.ft 1
..
.TH AWK 1
.CT 1 files prog_other
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.BI \-F
.I fs
]
[
.BI \-v
.I var=value
]
[
.I 'prog'
|
.BI \-f
.I progfile
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B \-f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.B \-
means the standard input.
Any
.I file
of the form
.I var=value
is treated as an assignment, not a filename,
and is executed at the time it would have been opened if it were a filename.
The option
.B \-v
followed by
.I var=value
is an assignment to be done before
.I prog
is executed;
any number of
.B \-v
options may be present.
The
.B \-F
.I fs
option defines the input field separator to be the regular expression
.IR fs .
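.PP
As an illustration (the file name
.I data
is hypothetical), the following command assigns
.I n
before the program starts and prints the first three lines of the file:
.EX
awk \-v n=3 'NR <= n' data
.EE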
.PP
An input line is normally made up of fields separated by white space,
or by the regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
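.PP
A minimal sketch of field splitting, assuming colon-separated input such as
.BR a:b:c ;
assigning
.B FS
before the first line is read has the same effect as the option
.BR \-F: .
.EX
BEGIN { FS = ":" }
      { print NF, $1, $NF }	# for a:b:c this prints: 3 a c
.EE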
.PP
A pattern-action statement has the form:
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\f(CWdelete array[expression]\fR'u
.RS
.nf
.ft CW
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.fi
.RE
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\f(CW"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] \fR)
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
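.PP
As an illustrative sketch (the names
.I count
and
.I w
are arbitrary), the following program relies on string subscripts and on
null-initialized variables to count how often each first field occurs;
the order in which the subscripts are visited is not specified:
.EX
      { count[$1]++ }
END   { for (w in count) print w, count[w] }
.EE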
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > " file
or
.BI >> " file
is present or on a pipe if
.BI | " cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the
.I format
(see
.IR printf (3)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
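.PP
A minimal sketch of output to a pipe, assuming a
.IR sort (1)
command is available.
Every
.B print
names the same command string and so writes to the same pipe;
closing it in the
.B END
action lets the sorted lines appear before the final message:
.EX
      { print $1 | "sort" }
END   { close("sort"); print "done" }
.EE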
.PP
The mathematical functions
.BR atan2 ,
.BR cos ,
.BR exp ,
.BR log ,
.BR sin ,
and
.B sqrt
are built in.
Other built-in functions:
.TF length
.TP
.B length
the length of its argument
taken as a string,
number of elements in an array for an array argument,
or length of
.B $0
if no argument.
.TP
.B rand
random number on [0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
\fBsubstr(\fIs\fB, \fIm\fR [\fB, \fIn\^\fR]\fB)\fR
the
.IR n -character
substring of
.I s
that begins at position
.I m
counted from 1.
If no
.IR n ,
use the rest of the string.
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
\fBsplit(\fIs\fB, \fIa \fR[\fB, \fIfs\^\fR]\fB)\fR
splits the string
.I s
into array elements
.IB a [1] \fR,
.IB a [2] \fR,
\&...,
.IB a [ n ] \fR,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
\fBsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
.TP
\fBgsub(\fIr\fB, \fIt \fR[, \fIs\^\fR]\fB)
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fB)
the string resulting from formatting
.I expr ...
according to the
.IR printf (3)
format
.IR fmt .
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status. This will be \-1 upon error,
.IR cmd 's
exit status upon a normal exit,
256 +
.I sig
upon death-by-signal, where
.I sig
is the number of the murdering signal,
or 512 +
.I sig
if there was a core dump.
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
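.PP
Several of the string functions can be sketched in a single action;
the sample string and the variable names
.IR s ,
.IR a ,
and
.I n
are arbitrary assumptions:
.EX
BEGIN {
	s = "hello world"
	print substr(s, 1, 5)	# hello
	print index(s, "wor")	# 7
	match(s, /w.r/)	# RSTART is 7, RLENGTH is 3
	n = split(s, a, " ")	# n is 2; a[1] is hello, a[2] is world
	gsub(/o/, "0", s)	# s becomes hell0 w0rld
	print toupper(s)	# HELL0 W0RLD
}
.EE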
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < " file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
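.PP
A common idiom, sketched here under the assumption that a
.IR date (1)
command exists, reads command output in a loop.
Comparing the result with 0 stops the loop at end of file and also
on an error (\-1), which a bare truth test would not do:
.EX
BEGIN {
	cmd = "date"
	while ((cmd | getline line) > 0)
		print "now:", line
	close(cmd)
}
.EE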
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR egrep ;
see
.IR grep (1).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.B ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
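.PP
A sketch of a match operator used in a pattern (the choice of the second
field and the name
.I sum
are arbitrary): only lines whose second field consists entirely of digits
contribute to the total.
.EX
$2 ~ /^[0\-9]+$/ { sum += $2 }
END              { print sum + 0 }
.EE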
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
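.PP
For illustration (the array name
.I seen
is hypothetical), the
.B in
operator tests for a subscript without creating the element,
so this program prints only the first line carrying each pair of leading fields:
.EX
!(($1, $2) in seen) { seen[$1, $2] = 1; print }
.EE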
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
They may appear multiple times in a program and execute
in the order they are read by
.IR awk .
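.PP
As a small sketch, the two
.B BEGIN
actions below run in the order written, before any input is read,
and the
.B END
action runs once after the last record:
.EX
BEGIN { print "first" }
BEGIN { print "second" }
END   { print NR, "records read" }
.EE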
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B ARGC
argument count, assignable.
.TP
.B ARGV
argument array, assignable;
non-null members are taken as filenames.
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" ).
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.TP
.B FILENAME
the name of the current input file.
.TP
.B FNR
ordinal number of the current record in the current file.
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\fR.
.TP
.BR NF
number of fields in the current record.
.TP
.B NR
ordinal number of the current record.
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" ).
.TP
.B OFS
output field separator (default space).
.TP
.B ORS
output record separator (default newline).
.TP
.B RLENGTH
the length of a string matched by
.BR match .
.TP
.B RS
input record separator (default newline).
.TP
.B RSTART
the start position of a string matched by
.BR match .
.TP
.B SUBSEP
separates multiple subscripts (default 034).
.PD
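.PP
A sketch using several of these variables: with more than one input file,
.B FNR
restarts at 1 for each file while
.B NR
keeps counting, so the program below reports where every file begins.
.EX
FNR == 1 { print FILENAME ": record", NR, "overall" }
.EE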
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.B
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
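.PP
A sketch of that convention (the function name
.I fact
and the extra parameter
.I r
are illustrative only): parameters listed after the real arguments act as
local variables, and each recursive call gets its own copy.
.EX
function fact(n,    r) {	# r is local; callers pass only n
	r = (n <= 1) ? 1 : n * fact(n \- 1)
	return r
}
BEGIN { print fact(5) }	# prints 120
.EE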
.SH EXAMPLES
.TP
.EX
length($0) > 72
.EE
Print lines longer than 72 characters.
.TP
.EX
{ print $2, $1 }
.EE
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or spaces and tabs.
.PP
.EX
.nf
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.fi
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.EX
/start/, /stop/
.EE
Print all lines between start/stop pairs.
.PP
.EX
.nf
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.fi
.EE
.SH SEE ALSO
.IR grep (1),
.IR lex (1),
.IR sed (1)
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.IR "The AWK Programming Language" ,
Addison-Wesley, 1988.  ISBN 0-201-07981-X.
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\f(CW""\fP to it.
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
POSIX-standard interval expressions in regular expressions are not supported.
.br
Only eight-bit character sets are handled correctly.