.TH AWK 1
.SH NAME
awk \- pattern-directed scanning and processing language
.SH SYNOPSIS
.B awk
[
.B -F
.I fs
]
[
.B -d
]
[
.B -mf
.I n
]
[
.B -mr
.I n
]
[
.B -safe
]
[
.B -v
.I var=value
]
[
.B -f
.I progfile
|
.I prog
]
[
.I file ...
]
.SH DESCRIPTION
.I Awk
scans each input
.I file
for lines that match any of a set of patterns specified literally in
.I prog
or in one or more files
specified as
.B -f
.IR progfile .
With each pattern
there can be an associated action that will be performed
when a line of a
.I file
matches the pattern.
Each line is matched against the
pattern portion of every pattern-action statement;
the associated action is performed for each matched pattern.
The file name
.L -
means the standard input.
Any
.IR file
of the form
.I var=value
is treated as an assignment, not a file name,
and is executed at the time it would have been opened if it were a file name.
The option
.B -v
followed by
.I var=value
is an assignment to be done before the program
is executed;
any number of
.B -v
options may be present.
The
.B -F
.IR fs
option defines the input field separator to be the regular expression
.IR fs .
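.PP
The difference between the two kinds of assignment can be seen in a small
sketch, run from a POSIX shell.
The variable names early and late are made up for illustration;
the behaviour shown is what the text above describes and what common
awk implementations are expected to do.

```shell
# -v assigns before BEGIN runs; an operand of the form var=value is
# assigned only when awk reaches it in the argument list.
result=$(printf 'a:b:c\n' |
	awk -F: -v early=1 '
		BEGIN { print "BEGIN: early=" early " late=" late }
		      { print "line:  early=" early " late=" late " second=" $2 }
	' late=2 -)
printf '%s\n' "$result"
```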
.PP
An input line is normally made up of fields separated by white space,
or by regular expression
.BR FS .
The fields are denoted
.BR $1 ,
.BR $2 ,
\&..., while
.B $0
refers to the entire line.
If
.BR FS
is null, the input line is split into one field per character.
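.PP
As a minimal sketch of field splitting, assuming a POSIX shell,
the first command uses the default white-space separator and the second
sets FS to a comma in a BEGIN action.
The sample input is invented.

```shell
# Default splitting: runs of blanks separate fields; $0 is left untouched.
fields=$(printf 'alpha  beta gamma\n' | awk '{ print NF; print $2; print $0 }')
printf '%s\n' "$fields"

# With FS set to a comma, the blank inside "y z" no longer separates fields.
csv=$(printf 'x,y z,w\n' | awk 'BEGIN { FS = "," } { print $2 }')
printf '%s\n' "$csv"
```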
.PP
To compensate for inadequate implementation of storage management,
the
.B -mr
option can be used to set the maximum size of the input record,
and the
.B -mf
option to set the maximum number of fields.
.PP
The
.B -safe
option causes
.I awk
to run in
``safe mode,''
in which it is not allowed to
run shell commands or open files
and the environment is not made available
in the
.B ENVIRON
variable.
.PP
A pattern-action statement has the form
.IP
.IB pattern " { " action " }
.PP
A missing
.BI { " action " }
means print the line;
a missing pattern always matches.
Pattern-action statements are separated by newlines or semicolons.
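.PP
The two defaults can be sketched as follows, with invented input.
Because a newline ends a pattern-action statement, the pattern on its own
line and the brace group on the next line are two separate statements.

```shell
out=$(printf 'apple\nbanana\ncherry\n' |
	awk '
		/an/
		{ n++ }
		END { print n " lines" }
	')
printf '%s\n' "$out"
```

The first statement has no action, so matching lines are printed;
the second has no pattern, so it counts every line.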
.PP
An action is a sequence of statements.
A statement can be one of the following:
.PP
.EX
.ta \w'\fLdelete array[expression]'u
if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
while(\fI expression \fP)\fI statement\fP
for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
for(\fI var \fPin\fI array \fP)\fI statement\fP
do\fI statement \fPwhile(\fI expression \fP)
break
continue
{\fR [\fP\fI statement ... \fP\fR] \fP}
\fIexpression\fP	#\fR commonly\fP\fI var = expression\fP
print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
return\fR [ \fP\fIexpression \fP\fR]\fP
next	#\fR skip remaining patterns on this input line\fP
nextfile	#\fR skip rest of this file, open next, start at top\fP
delete\fI array\fP[\fI expression \fP]	#\fR delete an array element\fP
delete\fI array\fP	#\fR delete all elements of array\fP
exit\fR [ \fP\fIexpression \fP\fR]\fP	#\fR exit immediately; status is \fP\fIexpression\fP
.EE
.DT
.PP
Statements are terminated by
semicolons, newlines or right braces.
An empty
.I expression-list
stands for
.BR $0 .
String constants are quoted \&\fL"\ "\fR,
with the usual C escapes recognized within.
Expressions take on string or numeric values as appropriate,
and are built using the operators
.B + \- * / % ^
(exponentiation), and concatenation (indicated by white space).
The operators
.B
! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
are also available in expressions.
Variables may be scalars, array elements
(denoted
.IB x  [ i ] )
or fields.
Variables are initialized to the null string.
Array subscripts may be any string,
not necessarily numeric;
this allows for a form of associative memory.
Multiple subscripts such as
.B [i,j,k]
are permitted; the constituents are concatenated,
separated by the value of
.BR SUBSEP .
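.PP
A short sketch of string subscripts and multiple subscripts follows.
The array names total and seen and the input are hypothetical.

```shell
counts=$(printf 'red 1\nblue 2\nred 3\n' |
	awk '
		{ total[$1] += $2; seen[$1, NR] = 1 }
		END {
			print "red=" total["red"] " blue=" total["blue"]
			# A multiple subscript is one string joined by SUBSEP.
			if (("red" SUBSEP 3) in seen) print "red seen on line 3"
			if (!(("blue", 3) in seen)) print "blue not on line 3"
		}
	')
printf '%s\n' "$counts"
```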
.PP
The
.B print
statement prints its arguments on the standard output
(or on a file if
.BI > file
or
.BI >> file
is present or on a pipe if
.BI | cmd
is present), separated by the current output field separator,
and terminated by the output record separator.
.I file
and
.I cmd
may be literal names or parenthesized expressions;
identical string values in different statements denote
the same open file.
The
.B printf
statement formats its expression list according to the format
(see
.IR fprintf (2)).
The built-in function
.BI close( expr )
closes the file or pipe
.IR expr .
The built-in function
.BI fflush( expr )
flushes any buffered output for the file or pipe
.IR expr .
If
.IR expr
is omitted or is a null string, all open files are flushed.
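.PP
Output to a pipe can be sketched like this, assuming a
.I sort
command is available on the system.
Closing the pipe before the final
.B printf
lets the sorted lines appear first.

```shell
sorted=$(printf 'pear\napple\nfig\n' |
	awk '
		{ print $1 | "sort" }   # each use of the string "sort" names the same pipe
		END {
			close("sort")       # sort writes its output once its input is closed
			printf "%d lines\n", NR
		}
	')
printf '%s\n' "$sorted"
```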
.PP
The mathematical functions
.BR exp ,
.BR log ,
.BR sqrt ,
.BR sin ,
.BR cos ,
and
.BR atan2
are built in.
Other built-in functions:
.TF length
.TP
.B length
If its argument is a string, the string's length is returned.
If its argument is an array, the number of subscripts in the array is returned.
If no argument, the length of
.B $0
is returned.
.TP
.B rand
random number on (0,1)
.TP
.B srand
sets seed for
.B rand
and returns the previous seed.
.TP
.B int
truncates to an integer value
.TP
.B utf
converts its numerical argument, a character number, to a
.SM UTF
string
.TP
.BI substr( s , " m" , " n\fL)
the
.IR n -character
substring of
.I s
that begins at position
.IR m
counted from 1.
If
.I n
is omitted, it is taken to be the length of
.I s
from
.IR m .
.TP
.BI index( s , " t" )
the position in
.I s
where the string
.I t
occurs, or 0 if it does not.
.TP
.BI match( s , " r" )
the position in
.I s
where the regular expression
.I r
occurs, or 0 if it does not.
The variables
.B RSTART
and
.B RLENGTH
are set to the position and length of the matched string.
.TP
.BI split( s , " a" , " fs\fL)
splits the string
.I s
into array elements
.IB a [1]\f1,
.IB a [2]\f1,
\&...,
.IB a [ n ]\f1,
and returns
.IR n .
The separation is done with the regular expression
.I fs
or with the field separator
.B FS
if
.I fs
is not given.
An empty string as field separator splits the string
into one array element per character.
.TP
.BI sub( r , " t" , " s\fL)
substitutes
.I t
for the first occurrence of the regular expression
.I r
in the string
.IR s .
If
.I s
is not given,
.B $0
is used.
A
.L &
character in
.I t
will be replaced by the sub-string of
.I s
matched by
.IR r ;
it may be escaped with
.L \e
to substitute a literal
.LR & .
.TP
.B gsub
same as
.B sub
except that all occurrences of the regular expression
are replaced;
.B sub
and
.B gsub
return the number of replacements.
.TP
.BI sprintf( fmt , " expr" , " ...\fL)
the string resulting from formatting
.I expr ...
according to the
.I printf
format
.I fmt
.TP
.BI system( cmd )
executes
.I cmd
and returns its exit status
.TP
.BI tolower( str )
returns a copy of
.I str
with all upper-case characters translated to their
corresponding lower-case equivalents.
.TP
.BI toupper( str )
returns a copy of
.I str
with all lower-case characters translated to their
corresponding upper-case equivalents.
.PD
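.PP
Several of the string functions listed above are exercised in the sketch below.
The sample string and the variable names are invented; only functions
that most awk implementations share are used, so
.B utf
is left out.

```shell
strings=$(awk 'BEGIN {
	s = "hello, world"
	print substr(s, 8)              # from position 8 to the end
	print substr(s, 1, 5)
	print index(s, "world")
	match(s, /o+/)
	print RSTART, RLENGTH           # position and length of the first match
	n = split("a:b:c", parts, ":")
	print n, parts[3]
	t = s
	k = gsub(/o/, "[&]", t)         # & stands for the matched text
	print k, t
	print toupper(substr(s, 1, 1)) substr(s, 2)
	print sprintf("%05.1f|%-4s|", 3.14159, "ab")
}')
printf '%s\n' "$strings"
```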
.PP
The ``function''
.B getline
sets
.B $0
to the next input record from the current input file;
.B getline
.BI < file
sets
.B $0
to the next record from
.IR file .
.B getline
.I x
sets variable
.I x
instead.
Finally,
.IB cmd " | getline
pipes the output of
.I cmd
into
.BR getline ;
each call of
.B getline
returns the next line of output from
.IR cmd .
In all cases,
.B getline
returns 1 for a successful input,
0 for end of file, and \-1 for an error.
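.PP
The return values of
.B getline
can be sketched as follows.
The command string and the path /no/such/file are assumptions made for the
example; the path is presumed not to exist.
The parentheses around each call keep the comparison and the assignment
from being parsed as part of the redirection.

```shell
lines=$(awk 'BEGIN {
	cmd = "echo one; echo two"
	while ((cmd | getline line) > 0)    # 1 for each line read
		n++
	eof = (cmd | getline line)          # 0: the command has no more output
	close(cmd)
	err = (getline line < "/no/such/file")   # -1: the file cannot be opened
	print n, line, eof, err
}')
printf '%s\n' "$lines"
```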
.PP
Patterns are arbitrary Boolean combinations
(with
.BR "! || &&" )
of regular expressions and
relational expressions.
Regular expressions are as in
.IR regexp (6).
Isolated regular expressions
in a pattern apply to the entire line.
Regular expressions may also occur in
relational expressions, using the operators
.BR ~
and
.BR !~ .
.BI / re /
is a constant regular expression;
any string (constant or variable) may be used
as a regular expression, except in the position of an isolated regular expression
in a pattern.
.PP
A pattern may consist of two patterns separated by a comma;
in this case, the action is performed for all lines
from an occurrence of the first pattern
through an occurrence of the second.
.PP
A relational expression is one of the following:
.IP
.I expression matchop regular-expression
.br
.I expression relop expression
.br
.IB expression " in " array-name
.br
.BI ( expr , expr,... ") in " array-name
.PP
where a
.I relop
is any of the six relational operators in C,
and a
.I matchop
is either
.B ~
(matches)
or
.B !~
(does not match).
A conditional is an arithmetic expression,
a relational expression,
or a Boolean combination
of these.
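.PP
The pattern forms described above can be combined as in this sketch.
The input lines and the variable re are invented;
re holds a string that is used as a regular expression with the
.B ~
operator.

```shell
matched=$(printf '%s\n' 'id 7' 'begin' 'x 1' 'y 20' 'end' 'id 30' |
	awk -v re='^i' '
		$1 ~ re && $2 > 10    { print "dynamic regex:", $0 }
		/^begin$/, /^end$/    { print "in range:", $0 }
		!/^[a-z]+ [0-9]+$/    { print "no number:", $0 }
	')
printf '%s\n' "$matched"
```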
.PP
The special patterns
.B BEGIN
and
.B END
may be used to capture control before the first input line is read
and after the last.
.B BEGIN
and
.B END
do not combine with other patterns.
.PP
Variable names with special meanings:
.TF FILENAME
.TP
.B CONVFMT
conversion format used when converting numbers
(default
.BR "%.6g" )
.TP
.B FS
regular expression used to separate fields; also settable
by option
.BI \-F fs\f1.
.TP
.BR NF
number of fields in the current record
.TP
.B NR
ordinal number of the current record
.TP
.B FNR
ordinal number of the current record in the current file
.TP
.B FILENAME
the name of the current input file
.TP
.B RS
input record separator (default newline)
.TP
.B OFS
output field separator (default blank)
.TP
.B ORS
output record separator (default newline)
.TP
.B OFMT
output format for numbers (default
.BR "%.6g" )
.TP
.B SUBSEP
separates multiple subscripts (default 034)
.TP
.B ARGC
argument count, assignable
.TP
.B ARGV
argument array, assignable;
non-null members are taken as file names
.TP
.B ENVIRON
array of environment variables; subscripts are names.
.PD
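.PP
A few of these variables are shown at work in the sketch below, with
invented input.
Assigning a field to itself is a common idiom, used here on the assumption
that the implementation rebuilds $0 with OFS as most do.

```shell
report=$(printf 'a b c\nd e\n' |
	awk '
		BEGIN { OFS = "-"; ORS = ";\n" }
		      { $1 = $1; print NR, NF, $0 }
		END   { print 22 / 7 }      # formatted with OFMT, %.6g by default
	')
printf '%s\n' "$report"
```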
.PP
Functions may be defined (at the position of a pattern-action statement) thus:
.IP
.L
function foo(a, b, c) { ...; return x }
.PP
Parameters are passed by value if scalar and by reference if array name;
functions may be called recursively.
Parameters are local to the function; all other variables are global.
Thus local variables may be created by providing excess parameters in
the function definition.
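.PP
The parameter rules can be sketched as follows.
The function names total and fact are hypothetical;
the extra spaces before i and sum are only a convention that marks them
as locals.

```shell
fn=$(awk '
	# i and sum are extra parameters, used only as locals.
	function total(arr, n,    i, sum) {
		for (i = 1; i <= n; i++) sum += arr[i]
		arr[1] = "changed"      # arrays are passed by reference
		n = 0                   # scalars are passed by value
		return sum
	}
	function fact(k) { return k <= 1 ? 1 : k * fact(k - 1) }
	BEGIN {
		v[1] = 2; v[2] = 3; v[3] = 5; count = 3; i = "global"
		t = total(v, count)
		print t, v[1], count, i, fact(5)
	}
')
printf '%s\n' "$fn"
```

The global i keeps its value because the i inside total is a parameter.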
.SH EXAMPLES
.TP
.L
length($0) > 72
Print lines longer than 72 characters.
.TP
.L
{ print $2, $1 }
Print first two fields in opposite order.
.PP
.EX
BEGIN { FS = ",[ \et]*|[ \et]+" }
      { print $2, $1 }
.EE
.ns
.IP
Same, with input fields separated by comma and/or blanks and tabs.
.PP
.EX
	{ s += $1 }
END	{ print "sum is", s, " average is", s/NR }
.EE
.ns
.IP
Add up first column, print sum and average.
.TP
.L
/start/, /stop/
Print all lines between start/stop pairs.
.PP
.EX
BEGIN	{	# Simulate echo(1)
	for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
	printf "\en"
	exit }
.EE
.SH SOURCE
.B /sys/src/cmd/awk
.SH SEE ALSO
.IR sed (1),
.IR regexp (6),
.br
A. V. Aho, B. W. Kernighan, P. J. Weinberger,
.I
The AWK Programming Language,
Addison-Wesley, 1988.  ISBN 0-201-07981-X
.SH BUGS
There are no explicit conversions between numbers and strings.
To force an expression to be treated as a number add 0 to it;
to force it to be treated as a string concatenate
\&\fL""\fP to it.
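.PP
Both idioms are shown in this sketch, with invented input.
Compared as strings, 10 sorts before 9; compared as numbers it does not.

```shell
conv=$(printf '10 9\n' | awk '{
	a = $1 ""; b = $2 ""                # force string values
	s = (a < b) ? "string: 10 < 9" : "string: 10 >= 9"
	m = ($1 + 0 < $2 + 0) ? "number: 10 < 9" : "number: 10 >= 9"
	print s
	print m
}')
printf '%s\n' "$conv"
```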
.br
The scope rules for variables in functions are a botch;
the syntax is worse.
.br
UTF is not always dealt with correctly,
though
.I awk
does make an attempt to do so.
The
.I split
function with an empty string as final argument now copes
with UTF in the string being split.