.HTML "How to Use the Plan 9 C Compiler
.TL
How to Use the Plan 9 C Compiler*
.AU
Rob Pike
rob@plan9.bell-labs.com
.SH
Introduction
.FS
* This paper has been revised to reflect the move to 21-bit Unicode.
.FE
.PP
The C compiler on Plan 9 is a wholly new program; in fact
it was the first piece of software written for what would
eventually become Plan 9 from Bell Labs.
Programmers familiar with existing C compilers will find
a number of differences in both the language the Plan 9 compiler
accepts and in how the compiler is used.
.PP
The compiler is really a set of compilers, one for each
architecture \(em MIPS, SPARC, Intel 386, Power PC, ARM, etc. \(em
that accept a dialect of ANSI C and efficiently produce
fairly good code for the target machine.
There is a packaging of the compiler that accepts strict ANSI C for
a POSIX environment, but this document focuses on the
native Plan 9 environment, that in which all the system source and
almost all the utilities are written.
.SH
Source
.PP
The language accepted by the compilers is the core 1989 ANSI C language
with some modest extensions,
a greatly simplified preprocessor,
a smaller library that includes system calls and related facilities,
and a completely different structure for include files.
.PP
Official ANSI C accepts the old (K&R) style of declarations for
functions; the Plan 9 compilers
are more demanding.
Without an explicit run-time flag
.CW -B ) (
whose use is discouraged, the compilers insist
on new-style function declarations, that is, prototypes for
function arguments.
The function declarations in the libraries' include files are
all in the new style so the interfaces are checked at compile time.
For C programmers who have not yet switched to function prototypes
the clumsy syntax may seem repellent but the payoff in stronger typing
is substantial.
Those who wish to import existing software to Plan 9 are urged
to use the opportunity to update their code.
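.PP
As a minimal sketch of the difference, the fragment below shows an old-style
definition in a comment and its new-style equivalent as compilable code.
The function
.CW clamp
is hypothetical, invented here for illustration; it is written in portable
ANSI C and is not taken from the Plan 9 source.

```c
/*
 * Old (K&R) style, which the text says is refused without -B:
 *
 *	int clamp(v, lo, hi)
 *	int v, lo, hi;
 *	{ ... }
 *
 * New style: the prototype names the argument types, so a call with
 * the wrong number or type of arguments is caught by the compiler.
 */
int clamp(int v, int lo, int hi);

/* returns v limited to the range lo..hi */
int
clamp(int v, int lo, int hi)
{
	if(v < lo)
		return lo;
	if(v > hi)
		return hi;
	return v;
}
```

With the prototype in scope, a call with a missing argument, such as
.CW "clamp(1, 2)" ,
is diagnosed at compile time rather than misbehaving at run time.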
.PP
The compilers include an integrated preprocessor that accepts the familiar
.CW #include ,
.CW #define
for macros both with and without arguments,
.CW #undef ,
.CW #line ,
.CW #ifdef ,
.CW #ifndef ,
and
.CW #endif .
It
supports neither
.CW #if
nor
.CW ## ,
although it does
honor a few
.CW #pragmas .
The
.CW #if
directive was omitted because it greatly complicates the
preprocessor, is never necessary, and is usually abused.
Conditional compilation in general makes code hard to understand;
the Plan 9 source uses it sparingly.
Also, because the compilers remove dead code, regular
.CW if
statements with constant conditions are more readable equivalents to many
.CW #ifs .
To compile imported code ineluctably fouled by
.CW #if
there is a separate command,
.CW /bin/cpp ,
that implements the complete ANSI C preprocessor specification.
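.PP
The point about constant conditions can be sketched as follows.
The switch
.CW Debug
and the function
.CW sum
are hypothetical names; the code is portable C using Standard I/O so that it
can be tried outside Plan 9, where one would more likely call
.CW fprint .

```c
#include <stdio.h>

enum { Debug = 0 };	/* hypothetical switch; set to 1 to enable tracing */

/* returns the sum of the n integers in a */
int
sum(int *a, int n)
{
	int i, s;

	s = 0;
	for(i = 0; i < n; i++){
		/* constant condition: this branch is dead code when Debug is 0 */
		if(Debug)
			fprintf(stderr, "sum: a[%d] = %d\n", i, a[i]);
		s += a[i];
	}
	return s;
}
```

Unlike text hidden by
.CW #if ,
the disabled branch is still parsed and type-checked, so it cannot quietly
go stale; a compiler that removes dead code then discards it.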
.PP
Include files fall into two groups: machine-dependent and machine-independent.
The machine-independent files occupy the directory
.CW /sys/include ;
the others are placed in a directory appropriate to the machine, such as
.CW /mips/include .
The compiler searches for include files
first in the machine-dependent directory and then
in the machine-independent directory.
At the time of writing there are thirty-one machine-independent include
files and two (per machine) machine-dependent ones:
.CW <ureg.h>
and
.CW <u.h> .
The first describes the layout of registers on the system stack,
for use by the debugger.
The second defines some
architecture-dependent types such as
.CW jmp_buf
for
.CW setjmp
and the
.CW va_arg
and
.CW va_list
macros for handling arguments to variadic functions,
as well as a set of
.CW typedef
abbreviations for
.CW unsigned
.CW short
and so on.
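.PP
A short sketch of a variadic function follows.
It is written in portable C, where the macros come from
.CW <stdarg.h> ;
as the text says, Plan 9 supplies them through
.CW <u.h>
instead.
The function
.CW sumints
is a hypothetical example.

```c
#include <stdarg.h>	/* portable C; on Plan 9 the macros come from <u.h> */

/* returns the sum of the n int arguments that follow n */
int
sumints(int n, ...)
{
	va_list arg;
	int i, s;

	s = 0;
	va_start(arg, n);
	for(i = 0; i < n; i++)
		s += va_arg(arg, int);
	va_end(arg);
	return s;
}
```

A call such as
.CW "sumints(3, 1, 2, 3)"
walks the three trailing arguments and returns their sum.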
.PP
Here is an excerpt from
.CW /386/include/u.h :
.P1
#define nil ((void*)0)
typedef unsigned short ushort;
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned int uint;
typedef signed char schar;
typedef long long vlong;

typedef long jmp_buf[2];
#define JMPBUFSP 0
#define JMPBUFPC 1
#define JMPBUFDPC 0
.P2
Plan 9 programs use
.CW nil
for the name of the zero-valued pointer.
The type
.CW vlong
is the largest integer type available; on most architectures it
is a 64-bit value.
A couple of other types in
.CW <u.h>
are
.CW u32int ,
which is guaranteed to have exactly 32 bits (a possibility on all the supported architectures) and
.CW mpdigit ,
which is used by the multiprecision math package
.CW <mp.h> .
The
.CW #define
constants permit an architecture-independent (but compiler-dependent)
implementation of stack-switching using
.CW setjmp
and
.CW longjmp .
.PP
Every Plan 9 C program begins
.P1
#include <u.h>
.P2
because all the other installed header files use the
.CW typedefs
declared in
.CW <u.h> .
.PP
In strict ANSI C, include files are grouped to collect related functions
in a single file: one for string functions, one for memory functions,
one for I/O, and none for system calls.
Each include file is protected by an
.CW #ifdef
to guarantee its contents are seen by the compiler only once.
Plan 9 takes a different approach. Other than a few include
files that define external formats such as archives, the files in
.CW /sys/include
correspond to
.I libraries.
If a program is using a library, it includes the corresponding header.
The default C library comprises string functions, memory functions, and
so on, largely as in ANSI C, some formatted I/O routines,
plus all the system calls and related functions.
To use these functions, one must
.CW #include
the file
.CW <libc.h> ,
which in turn must follow
.CW <u.h> ,
to define their prototypes for the compiler.
Here is the complete source to the traditional first C program:
.P1
#include <u.h>
#include <libc.h>

void
main(void)
{
	print("hello world\en");
	exits(0);
}
.P2
The
.CW print
routine and its relatives
.CW fprint
and
.CW sprint
resemble the similarly-named functions in Standard I/O but are not
attached to a specific I/O library.
In Plan 9
.CW main
is not integer-valued; it should call
.CW exits ,
which takes a string argument (or null; here ANSI C promotes the 0 to a
.CW char* ).
All these functions are, of course, documented in the Programmer's Manual.
.PP
To use
.CW printf ,
.CW <stdio.h>
must be included to define the function prototype for
.CW printf :
.P1
#include <u.h>
#include <libc.h>
#include <stdio.h>

void
main(int argc, char *argv[])
{
	printf("%s: hello world; argc = %d\en", argv[0], argc);
	exits(0);
}
.P2
In practice, Standard I/O is not used much in Plan 9. I/O libraries are
discussed in a later section of this document.
.PP
There are libraries for handling regular expressions, raster graphics,
windows, and so on, and each has an associated include file.
The manual for each library states which include files are needed.
The files are not protected against multiple inclusion and themselves
contain no nested
.CW #includes .
Instead the
programmer is expected to sort out the requirements
and to
.CW #include
the necessary files once at the top of each source file. In practice this is
trivial: this way of handling include files is so straightforward
that it is rare for a source file to contain more than half a dozen
.CW #includes .
.PP
The compilers do their own register allocation so the
.CW register
keyword is ignored.
For different reasons,
.CW volatile
and
.CW const
are also ignored.
.PP
To make it easier to share code with other systems, Plan 9 has a version
of the compiler,
.CW pcc ,
that provides the standard ANSI C preprocessor, headers, and libraries
with POSIX extensions.
.CW Pcc
is recommended only
when broad external portability is mandated. It compiles slower,
produces slower code (it takes extra work to simulate POSIX on Plan 9),
eliminates those parts of the Plan 9 interface
not related to POSIX, and illustrates the clumsiness of an environment
designed by committee.
.CW Pcc
is described in more detail in
.I
APE\(emThe ANSI/POSIX Environment,
.R
by Howard Trickey.
.SH
Process
.PP
Each CPU architecture supported by Plan 9 is identified by a single,
arbitrary, alphanumeric character:
.CW k
for SPARC,
.CW q
for 32-bit Power PC,
.CW v
for MIPS,
.CW 0
for little-endian MIPS,
.CW 5
for ARM v5 and later 32-bit architectures,
.CW 6
for AMD64,
.CW 8
for Intel 386, and
.CW 9
for 64-bit Power PC.
The character labels the support tools and files for that architecture.
For instance, for the 386 the compiler is
.CW 8c ,
the assembler is
.CW 8a ,
the link editor/loader is
.CW 8l ,
the object files are suffixed
.CW \&.8 ,
and the default name for an executable file is
.CW 8.out .
Before we can use the compiler we therefore need to know which
machine we are compiling for.
The next section explains how this decision is made; for the moment
assume we are building 386 binaries and make the mental substitution for
.CW 8
appropriate to the machine you are actually using.
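.PP
The naming scheme is regular enough to capture in a small table.
The sketch below is portable C, not part of Plan 9; the table
.CW archtab
and the function
.CW toolname
are hypothetical, and the table is deliberately partial: it pairs only the
architecture names that appear elsewhere in this document with the letters
listed above, and that pairing is an assumption drawn from the text.

```c
#include <stdio.h>
#include <string.h>

/* partial table: names and letters as they appear in this document */
static struct {
	char	*objtype;
	char	letter;
} archtab[] = {
	{ "sparc", 'k' },
	{ "power", 'q' },
	{ "mips", 'v' },
	{ "arm", '5' },
	{ "amd64", '6' },
	{ "386", '8' },
};

/*
 * Writes the name of a tool for objtype into buf; kind is 'c' for the
 * compiler, 'a' for the assembler, 'l' for the loader.
 * Returns 0 on success, -1 if objtype is not in the table.
 */
int
toolname(char *objtype, char kind, char *buf, int nbuf)
{
	int i;

	for(i = 0; i < (int)(sizeof archtab/sizeof archtab[0]); i++)
		if(strcmp(archtab[i].objtype, objtype) == 0){
			snprintf(buf, nbuf, "%c%c", archtab[i].letter, kind);
			return 0;
		}
	return -1;
}
```

For the 386 and kind
.CW c
this yields
.CW 8c ,
matching the compiler name given above.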
.PP
To convert source to an executable binary is a two-step process.
First run the compiler,
.CW 8c ,
on the source, say
.CW file.c ,
to generate an object file
.CW file.8 .
Then run the loader,
.CW 8l ,
to generate an executable
.CW 8.out
that may be run (on a 386 machine):
.P1
8c file.c
8l file.8
8.out
.P2
The loader automatically links with whatever libraries the program
needs, usually including the standard C library as defined by
.CW <libc.h> .
Of course the compiler and loader have lots of options, both familiar and new;
see the manual for details.
The compiler does not generate an executable automatically;
the output of the compiler must be given to the loader.
Since most compilation is done under the control of
.CW mk
(see below), this is rarely an inconvenience.
.PP
The distribution of work between the compiler and loader is unusual.
The compiler integrates preprocessing, parsing, register allocation,
code generation and some assembly.
Combining these tasks in a single program is part of the reason for
the compiler's efficiency.
The loader does instruction selection, branch folding,
instruction scheduling,
and writes the final executable.
There is no separate C preprocessor and no assembler in the usual pipeline.
Instead the intermediate object file
(here a
.CW \&.8
file) is a type of binary assembly language.
The instructions in the intermediate format are not exactly those in
the machine. For example, on the 68020 the object file may specify
a MOVE instruction but the loader will decide just which variant of
the MOVE instruction \(em MOVE immediate, MOVE quick, MOVE address,
etc. \(em is most efficient.
.PP
The assembler,
.CW 8a ,
is just a translator between the textual and binary
representations of the object file format.
It is not an assembler in the traditional sense. It has limited
macro capabilities (the same as the integral C preprocessor in the compiler),
clumsy syntax, and minimal error checking. For instance, the assembler
will accept an instruction (such as memory-to-memory MOVE on the MIPS) that the
machine does not actually support; only when the output of the assembler
is passed to the loader will the error be discovered.
The assembler is intended only for writing things that need access to instructions
invisible from C,
such as the machine-dependent
part of an operating system;
very little code in Plan 9 is in assembly language.
.PP
The compilers take an option
.CW -S
that causes them to print on their standard output the generated code
in a format acceptable as input to the assemblers.
This is of course merely a formatting of the
data in the object file; therefore the assembler is just
an
ASCII-to-binary converter for this format.
Other than the specific instructions, the input to the assemblers
is largely architecture-independent; see
``A Manual for the Plan 9 Assembler'',
by Rob Pike,
for more information.
.PP
The loader is an integral part of the compilation process.
Each library header file contains a
.CW #pragma
that tells the loader the name of the associated archive; it is
not necessary to tell the loader which libraries a program uses.
The C run-time startup is found, by default, in the C library.
The loader starts with an undefined
symbol,
.CW _main ,
that is resolved by pulling in the run-time startup code from the library.
(The loader undefines
.CW _mainp
when profiling is enabled, to force loading of the profiling start-up
instead.)
.PP
Unlike its counterpart on other systems, the Plan 9 loader rearranges
data to optimize access. This means the order of variables in the
loaded program is unrelated to their order in the source.
Most programs don't care, but some assume that, for example, the
variables declared by
.P1
int a;
int b;
.P2
will appear at adjacent addresses in memory. On Plan 9, they won't.
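.PP
A program that genuinely needs two values side by side can say so in the
type system instead of relying on declaration order.
The sketch below, in portable C with hypothetical names
.CW Pair , (
.CW vec ,
.CW bfollowsa ,
.CW vecstride ),
uses a structure and an array, whose layouts are defined by the language
rather than by the loader.

```c
#include <stddef.h>

/* members of a struct keep their declared order */
typedef struct Pair Pair;
struct Pair
{
	int	a;
	int	b;
};

Pair	pair;
int	vec[2];		/* elements of an array are contiguous */

/* returns 1 if b lies after a inside a Pair, which C guarantees */
int
bfollowsa(void)
{
	return offsetof(Pair, b) > offsetof(Pair, a);
}

/* returns the distance, in elements, from vec[0] to vec[1] */
int
vecstride(void)
{
	return (int)(&vec[1] - &vec[0]);
}
```

Wherever the loader places
.CW pair
and
.CW vec
as wholes, the relationship between their parts stays fixed.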
.SH
Heterogeneity
.PP
When the system starts or a user logs in the environment is configured
so the appropriate binaries are available in
.CW /bin .
The configuration process is controlled by an environment variable,
.CW $cputype ,
with a value such as
.CW mips ,
.CW 386 ,
.CW arm ,
or
.CW sparc .
For each architecture there is a directory in the root,
with the appropriate name,
that holds the binary and library files for that architecture.
Thus
.CW /mips/lib
contains the object code libraries for MIPS programs,
.CW /mips/include
holds MIPS-specific include files, and
.CW /mips/bin
has the MIPS binaries.
These binaries are attached to
.CW /bin
at boot time by binding
.CW /$cputype/bin
to
.CW /bin ,
so
.CW /bin
always contains the correct files.
.PP
The MIPS compiler,
.CW vc ,
by definition
produces object files for the MIPS architecture,
regardless of the architecture of the machine on which the compiler is running.
There is a version of
.CW vc
compiled for each architecture:
.CW /mips/bin/vc ,
.CW /arm/bin/vc ,
.CW /sparc/bin/vc ,
and so on,
each capable of producing MIPS object files regardless of the native
instruction set.
If one is running on a SPARC,
.CW /sparc/bin/vc
will compile programs for the MIPS;
if one is running on machine
.CW $cputype ,
.CW /$cputype/bin/vc
will compile programs for the MIPS.
.PP
Because of the bindings that assemble
.CW /bin ,
the shell always looks for a command, say
.CW date ,
in
.CW /bin
and automatically finds the file
.CW /$cputype/bin/date .
Therefore the MIPS compiler is known as just
.CW vc ;
the shell will invoke
.CW /bin/vc
and that is guaranteed to be the version of the MIPS compiler
appropriate for the machine running the command.
Regardless of the architecture of the compiling machine,
.CW /bin/vc
is
.I always
the MIPS compiler.
.PP
Also, the output of
.CW vc
and
.CW vl
is completely independent of the machine type on which they are executed:
.CW \&.v
files compiled (with
.CW vc )
on a SPARC may be linked (with
.CW vl )
on a 386.
(The resulting
.CW v.out
will run, of course, only on a MIPS.)
Similarly, the MIPS libraries in
.CW /mips/lib
are suitable for loading with
.CW vl
on any machine; there is only one set of MIPS libraries, not one
set for each architecture that supports the MIPS compiler.
.SH
Heterogeneity and \f(CWmk\fP
.PP
Most software on Plan 9 is compiled under the control of
.CW mk ,
a descendant of
.CW make
that is documented in the Programmer's Manual.
A convention used throughout the
.CW mkfiles
makes it easy to compile the source into binary suitable for any architecture.
.PP
The variable
.CW $cputype
is advisory: it reports the architecture of the current environment, and should
not be modified. A second variable,
.CW $objtype ,
is used to set which architecture is being
.I compiled
for.
The value of
.CW $objtype
can be used by a
.CW mkfile
to configure the compilation environment.
.PP
In each machine's root directory there is a short
.CW mkfile
that defines a set of macros for the compiler, loader, etc.
Here is
.CW /mips/mkfile :
.P1
</sys/src/mkfile.proto

CC=vc
LD=vl
O=v
AS=va
.P2
The line
.P1
</sys/src/mkfile.proto
.P2
causes
.CW mk
to include the file
.CW /sys/src/mkfile.proto ,
which contains general definitions:
.P1
#
# common mkfile parameters shared by all architectures
#

OS=5689qv
CPUS=arm amd64 386 power mips
CFLAGS=-FTVw
LEX=lex
YACC=yacc
MK=/bin/mk
.P2
.CW CC
is obviously the compiler,
.CW AS
the assembler, and
.CW LD
the loader.
.CW O
is the suffix for the object files and
.CW CPUS
and
.CW OS
are used in special rules described below.
.PP
Here is a
.CW mkfile
to build the installed source for
.CW sam :
.P1
</$objtype/mkfile
OBJ=sam.$O address.$O buffer.$O cmd.$O disc.$O error.$O \e
	file.$O io.$O list.$O mesg.$O moveto.$O multi.$O \e
	plan9.$O rasp.$O regexp.$O string.$O sys.$O xec.$O

$O.out: $OBJ
	$LD $OBJ

install: $O.out
	cp $O.out /$objtype/bin/sam

installall:
	for(objtype in $CPUS) mk install

%.$O: %.c
	$CC $CFLAGS $stem.c

$OBJ: sam.h errors.h mesg.h
address.$O cmd.$O parse.$O xec.$O unix.$O: parse.h

clean:V:
	rm -f [$OS].out *.[$OS] y.tab.?
.P2
(The actual
.CW mkfile
imports most of its rules from other secondary files, but
this example works and is not misleading.)
The first line causes
.CW mk
to include the contents of
.CW /$objtype/mkfile
in the current
.CW mkfile .
If
.CW $objtype
is
.CW mips ,
this inserts the MIPS macro definitions into the
.CW mkfile .
In this case the rule for
.CW $O.out
uses the MIPS tools to build
.CW v.out .
The
.CW %.$O
rule in the file uses
.CW mk 's
pattern matching facilities to convert the source files to the object
files through the compiler.
(The text of the rules is passed directly to the shell,
.CW rc ,
without further translation.
See the
.CW mk
manual if any of this is unfamiliar.)
Because the default rule builds
.CW $O.out
rather than
.CW sam ,
it is possible to maintain binaries for multiple machines in the
same source directory without conflict.
This is also, of course, why the output files from the various
compilers and loaders
have distinct names.
.PP
The rest of the
.CW mkfile
should be easy to follow; notice how the rules for
.CW clean
and
.CW installall
(that is, install versions for all architectures) use other macros
defined in
.CW /$objtype/mkfile .
In Plan 9,
.CW mkfiles
for commands conventionally contain rules to
.CW install
(compile and install the version for
.CW $objtype ),
.CW installall
(compile and install for all
.CW $objtypes ),
and
.CW clean
(remove all object files, binaries, etc.).
.PP
The
.CW mkfile
is easy to use. To build a MIPS binary,
.CW v.out :
.P1
% objtype=mips
% mk
.P2
To build and install a MIPS binary:
.P1
% objtype=mips
% mk install
.P2
To build and install all versions:
.P1
% mk installall
.P2
These conventions make cross-compilation as easy to manage
as traditional native compilation.
Plan 9 programs compile and run without change on machines from
large multiprocessors to laptops. For more information about this process, see
``Plan 9 Mkfiles'',
by Bob Flandrena.
.SH
Portability
.PP
Within Plan 9, it is painless to write portable programs, programs whose
source is independent of the machine on which they execute.
The operating system is fixed and the compiler, headers and libraries
are constant so most of the stumbling blocks to portability are removed.
Attention to a few details can avoid those that remain.
.PP
Plan 9 is a heterogeneous environment, so programs must
.I expect
that external files will be written by programs on machines of different
architectures.
The compilers, for instance, must handle without confusion
object files written by other machines.
The traditional approach to this problem is to pepper the source with
.CW #ifdefs
to turn byte-swapping on and off.
Plan 9 takes a different approach: of the handful of machine-dependent
.CW #ifdefs
in all the source, almost all are deep in the libraries.
Instead programs read and write files in a defined format,
either (for low volume applications) as formatted text, or
(for high volume applications) as binary in a known byte order.
If the external data were written with the most significant
byte first, the following code reads a 4-byte integer correctly
regardless of the architecture of the executing machine (assuming
an unsigned long holds 4 bytes):
.P1
ulong
getlong(void)
{
	ulong l;

	l = (getchar()&0xFF)<<24;
	l |= (getchar()&0xFF)<<16;
	l |= (getchar()&0xFF)<<8;
	l |= (getchar()&0xFF)<<0;
	return l;
}
.P2
Note that this code does not `swap' the bytes; instead it just reads
them in the correct order.
Variations of this code will handle any binary format
and also avoid problems
involving how structures are padded, how words are aligned,
and other impediments to portability.
Be aware, though, that extra care is needed to handle floating point data.
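.PP
Writing is the mirror image of reading.
The sketch below uses hypothetical helpers
.CW putlong ,
.CW getlongbuf ,
and
.CW getlongle
that work on a byte buffer rather than on standard input; the two typedefs
stand in for those supplied by
.CW <u.h>
on Plan 9, and the code assumes the value fits in 32 bits.

```c
typedef unsigned long ulong;	/* supplied by <u.h> on Plan 9 */
typedef unsigned char uchar;

/* stores the low 32 bits of l in p[0..3], most significant byte first */
void
putlong(uchar *p, ulong l)
{
	p[0] = l>>24;
	p[1] = l>>16;
	p[2] = l>>8;
	p[3] = l;
}

/* reads the 4 bytes at p, most significant byte first */
ulong
getlongbuf(uchar *p)
{
	return ((ulong)p[0]<<24) | ((ulong)p[1]<<16) | ((ulong)p[2]<<8) | (ulong)p[3];
}

/* variation: reads the 4 bytes at p, least significant byte first */
ulong
getlongle(uchar *p)
{
	return ((ulong)p[3]<<24) | ((ulong)p[2]<<16) | ((ulong)p[1]<<8) | (ulong)p[0];
}
```

Neither direction inspects the byte order of the executing machine; the
shifts alone fix the external format, so the same source works everywhere.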
.PP
Efficiency hounds will argue that this method is unnecessarily slow and clumsy
when the executing machine has the same byte order (and padding and alignment)
as the data.
The CPU cost of I/O processing
is rarely the bottleneck for an application, however,
and the gain in simplicity of porting and maintaining the code greatly outweighs
the minor speed loss from handling data in this general way.
This method is how the Plan 9 compilers, the window system, and even the file
servers transmit data between programs.
.PP
To port programs beyond Plan 9, where the system interface is more variable,
it is probably necessary to use
.CW pcc
and hope that the target machine supports ANSI C and POSIX.
.SH
I/O
.PP
The default C library, defined by the include file
.CW <libc.h> ,
contains no buffered I/O package.
It does have several entry points for printing formatted text:
.CW print
outputs text to the standard output,
.CW fprint
outputs text to a specified integer file descriptor, and
.CW sprint
places text in a character array.
To access library routines for buffered I/O, a program must
explicitly include the header file associated with an appropriate library.
.PP
The recommended I/O library, used by most Plan 9 utilities, is
.CW bio
(buffered I/O), defined by
.CW <bio.h> .
There also exists an implementation of ANSI Standard I/O,
.CW stdio .
|
|
|
789 |
.PP
|
|
|
790 |
.CW Bio
|
|
|
791 |
is small and efficient, particularly for buffer-at-a-time or
|
|
|
792 |
line-at-a-time I/O.
|
|
|
793 |
Even for character-at-a-time I/O, however, it is significantly faster than
|
|
|
794 |
the Standard I/O library,
|
|
|
795 |
.CW stdio .
|
|
|
796 |
Its interface is compact and regular, although it lacks a few conveniences.
|
|
|
797 |
The most noticeable is that one must explicitly define buffers for standard
|
|
|
798 |
input and output;
|
|
|
799 |
.CW bio
|
|
|
800 |
does not predefine them. Here is a program to copy input to output a byte
|
|
|
801 |
at a time using
|
|
|
802 |
.CW bio :
|
|
|
803 |
.P1
#include <u.h>
#include <libc.h>
#include <bio.h>

Biobuf	bin;	/* buffer for standard input */
Biobuf	bout;	/* buffer for standard output */

void
main(void)
{
	int c;

	Binit(&bin, 0, OREAD);	/* file descriptor 0 */
	Binit(&bout, 1, OWRITE);	/* file descriptor 1 */

	while((c=Bgetc(&bin)) != Beof)
		Bputc(&bout, c);
	exits(0);
}
.P2
For peak performance, we could replace
.CW Bgetc
and
.CW Bputc
by their equivalent in-line macros
.CW BGETC
and
.CW BPUTC
but
the performance gain would be modest.
For more information on
.CW bio ,
see the Programmer's Manual.
.PP
Perhaps the most dramatic difference between the I/O interface of Plan 9 and
that of other systems is that text is not ASCII.
The format for
text in Plan 9 is a byte-stream encoding of 21-bit characters.
The character set is based on the Unicode Standard and is backward compatible with
ASCII:
characters with value 0 through 127 are the same in both sets.
The 21-bit characters, called
.I runes
in Plan 9, are encoded using a representation called
UTF,
an encoding that is becoming accepted as a standard.
(ISO calls it UTF-8;
throughout Plan 9 it's just called
UTF.)
UTF
defines multibyte sequences to
represent character values from 0 to 1,114,111.
In
UTF,
character values up to 127 decimal, 7F hexadecimal, represent themselves,
so straight
ASCII
files are also valid
UTF.
Also,
UTF
guarantees that bytes with values 0 to 127 (NUL to DEL, inclusive)
will appear only when they represent themselves, so programs that read bytes
looking for plain ASCII characters will continue to work.
Any program that expects a one-to-one correspondence between bytes and
characters will, however, need to be modified.
An example is parsing file names.
File names, like all text, are in
UTF,
so it is incorrect to search for a character in a string by
.CW strchr(filename,
.CW c)
because the character might have a multi-byte encoding.
The correct method is to call
.CW utfrune(filename,
.CW c) ,
defined in
.I rune (2),
which interprets the file name as a sequence of encoded characters
rather than bytes.
In fact, even when you know the character is a single byte
that can represent only itself,
it is safer to use
.CW utfrune
because that assumes nothing about the character set
and its representation.
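.PP
The difference between a byte search and a character search can be
sketched in portable ISO C.
The names
.CW encode_utf8
and
.CW find_rune
are hypothetical, invented for this illustration; they are not part of the
Plan 9 library, and Plan 9 programs should simply call
.CW utfrune .
The sketch assumes the string being searched is valid UTF.

```c
#include <stddef.h>
#include <string.h>

/* Encode code point r (0 to 0x10FFFF) as UTF-8 into buf and
   NUL-terminate it; returns the number of encoded bytes.
   Hypothetical helper for this example only. */
static int
encode_utf8(unsigned long r, char buf[5])
{
	int n;

	if(r < 0x80){
		buf[0] = (char)r;
		n = 1;
	}else if(r < 0x800){
		buf[0] = (char)(0xC0 | (r >> 6));
		buf[1] = (char)(0x80 | (r & 0x3F));
		n = 2;
	}else if(r < 0x10000){
		buf[0] = (char)(0xE0 | (r >> 12));
		buf[1] = (char)(0x80 | ((r >> 6) & 0x3F));
		buf[2] = (char)(0x80 | (r & 0x3F));
		n = 3;
	}else{
		buf[0] = (char)(0xF0 | (r >> 18));
		buf[1] = (char)(0x80 | ((r >> 12) & 0x3F));
		buf[2] = (char)(0x80 | ((r >> 6) & 0x3F));
		buf[3] = (char)(0x80 | (r & 0x3F));
		n = 4;
	}
	buf[n] = '\0';
	return n;
}

/* Return a pointer to the first occurrence of code point r in the
   UTF-8 string s, or NULL if it does not occur. */
const char*
find_rune(const char *s, unsigned long r)
{
	char enc[5];

	if(r < 0x80)
		return strchr(s, (int)r);	/* one byte: a byte search suffices */
	encode_utf8(r, enc);
	/* In valid UTF-8 a whole encoded sequence can match only at a
	   character boundary, so a substring search finds characters. */
	return strstr(s, enc);
}
```

.LP
Given the bytes of the word café, in which é is encoded as C3 A9 hexadecimal,
a byte search for the value A9 matches the second byte of é,
whereas
.CW find_rune
asked for the character with value A9 correctly reports that it is absent.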
.PP
The library defines several symbols relevant to the representation of characters.
Any byte with unsigned value less than
.CW Runesync
will not appear in any multi-byte encoding of a character.
.CW Utfrune
compares the character being searched for against
.CW Runesync
to see if it is sufficient to call
.CW strchr
or if the byte stream must be interpreted.
Any character with value less than
.CW Runeself
is represented by a single byte with the same value.
Finally, when errors are encountered converting
to runes from a byte stream, the library returns the rune value
.CW Runeerror
and advances a single byte. This permits programs to find runes
embedded in binary data.
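.PP
The error rule is easiest to see in a small decoder, written here in
portable ISO C.
This is a sketch under stated assumptions, not the library's
.CW chartorune :
the function
.CW decode_rune
and the constants
.CW SELF_LIMIT
and
.CW ERROR_RUNE
are names made up for the example, standing in for the roles of
.CW Runeself
and
.CW Runeerror .

```c
#include <stddef.h>

enum {
	SELF_LIMIT = 0x80,	/* values below this occupy one byte */
	ERROR_RUNE = 0xFFFD	/* value reported for malformed input */
};

/* Decode one character from the n bytes at p into *r.
   Returns the number of bytes consumed: the sequence length on
   success, 1 on malformed input, and 0 only when n is 0. */
int
decode_rune(unsigned long *r, const unsigned char *p, size_t n)
{
	/* smallest value that legitimately needs each sequence length */
	static const unsigned long min[] = {0, 0, 0x80, 0x800, 0x10000};
	unsigned long c;
	int len, i;

	if(n == 0){
		*r = ERROR_RUNE;
		return 0;
	}
	c = p[0];
	if(c < SELF_LIMIT){
		*r = c;
		return 1;
	}
	if((c & 0xE0) == 0xC0){
		len = 2;
		c &= 0x1F;
	}else if((c & 0xF0) == 0xE0){
		len = 3;
		c &= 0x0F;
	}else if((c & 0xF8) == 0xF0){
		len = 4;
		c &= 0x07;
	}else
		goto bad;	/* stray continuation byte or invalid lead byte */
	if((size_t)len > n)
		goto bad;	/* sequence is cut short */
	for(i = 1; i < len; i++){
		if((p[i] & 0xC0) != 0x80)
			goto bad;
		c = (c << 6) | (p[i] & 0x3F);
	}
	if(c < min[len] || c > 0x10FFFF)
		goto bad;	/* overlong encoding or out of range */
	*r = c;
	return len;
bad:
	*r = ERROR_RUNE;
	return 1;	/* advance a single byte and let the caller resynchronize */
}
```

.LP
Because a failed conversion consumes exactly one byte, a caller that loops over
a buffer of arbitrary data picks up the next well-formed character as soon as
it reaches one.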
.PP
.CW Bio
includes routines
.CW Bgetrune
and
.CW Bputrune
to transform the external byte stream
UTF
format to and from
internal 21-bit runes.
Also, the
.CW %s
format to
.CW print
accepts
UTF;
.CW %c
prints a character after narrowing it to 8 bits.
The
.CW %S
format prints a null-terminated sequence of runes;
.CW %C
prints a character after narrowing it to 21 bits.
For more information, see the Programmer's Manual, in particular
.I utf (6)
and
.I rune (2),
and the paper,
``Hello world, or
Καλημέρα κόσμε, or\
\f(Jpこんにちは 世界\f1'',
by Rob Pike and
Ken Thompson;
there is not room for the full story here.
.PP
These issues affect the compiler in several ways.
First, the C source is in
UTF.
ANSI says C variables are formed from
ASCII
alphanumerics, but comments and literal strings may contain any characters
encoded in the native encoding, here
UTF.
The declaration
.P1
char *cp = "abcÿ";
.P2
initializes the variable
.CW cp
to point to an array of bytes holding the
UTF
representation of the characters
.CW abcÿ .
The type
.CW Rune
is defined in
.CW <u.h>
to be
.CW uint ,
which is also the `wide character' type in the compiler.
Therefore the declaration
.P1
Rune *rp = L"abcÿ";
.P2
initializes the variable
.CW rp
to point to an array of unsigned integers holding the 21-bit
values of the characters
.CW abcÿ .
Note that in both these declarations the characters in the source
that represent
.CW "abcÿ"
are the same; what changes is how those characters are represented
in memory in the program.
The following two lines:
.P1
print("%s\en", "abcÿ");
print("%S\en", L"abcÿ");
.P2
produce the same
UTF
string on their output, the first by copying the bytes, the second
by converting from runes to bytes.
.PP
In C, character constants are integers but narrowed through the
.CW char
type.
The Unicode character
.CW ÿ
has value 255, so if the
.CW char
type is signed,
the constant
.CW 'ÿ'
has value \-1 (which is equal to EOF).
On the other hand,
.CW L'ÿ'
narrows through the wide character type,
.CW uint ,
and therefore has value 255.
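.PP
The arithmetic behind the two values can be checked in portable ISO C.
The function names
.CW narrow_signed
and
.CW narrow_wide
are hypothetical.
Note the assumption: converting an out-of-range value to a signed type is
implementation-defined in ISO C; the result shown in the text is what
two's-complement machines produce.

```c
/* Narrow v through a signed 8-bit character type, as happens to a
   character constant when char is signed.  The out-of-range
   conversion is implementation-defined; two's-complement machines
   wrap around. */
int
narrow_signed(int v)
{
	return (signed char)v;
}

/* Narrow v through an unsigned integer type wide enough to hold
   any 21-bit character value. */
unsigned long
narrow_wide(unsigned long v)
{
	return (unsigned int)v;
}
```

.LP
A program that compares characters read from a file against a negative
end-of-file marker should therefore hold them in an
.CW int
or a wide character, never in a signed
.CW char .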
.PP
Finally, although it's not ANSI C, the Plan 9 C compilers
assume any character with value above
.CW Runeself
is an alphanumeric,
so α is a legal, if non-portable, variable name.
.SH
Arguments
.PP
Some macros are defined
in
.CW <libc.h>
for parsing the arguments to
.CW main() .
They are described in
.I ARG (2)
but are fairly self-explanatory.
There are four macros:
.CW ARGBEGIN
and
.CW ARGEND
are used to bracket a hidden
.CW switch
statement within which
.CW ARGC
returns the current option character (rune) being processed and
.CW ARGF
returns the argument to the option, as in the loader option
.CW -o
.CW file .
Here, for example, is the code at the beginning of
.CW main()
in
.CW ramfs.c
(see
.I ramfs (4))
that cracks its arguments:
.P1
void
main(int argc, char *argv[])
{
	char *defmnt;
	int p[2];
	int mfd[2];
	int stdio = 0;

	defmnt = "/tmp";
	ARGBEGIN{
	case 'i':
		defmnt = 0;
		stdio = 1;
		mfd[0] = 0;
		mfd[1] = 1;
		break;
	case 's':
		defmnt = 0;
		break;
	case 'm':
		defmnt = ARGF();
		break;
	default:
		usage();
	}ARGEND
.P2
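.PP
To make the hidden
.CW switch
concrete, here is a hand-written loop in portable ISO C that parses the same
three options.
It approximates what the macros provide and is not their actual expansion;
the type
.CW Opts
and the function
.CW parse_args
are hypothetical, and errors are recorded in a flag rather than by calling
.CW usage() .

```c
#include <stddef.h>

typedef struct Opts Opts;
struct Opts
{
	const char *defmnt;	/* mount point, or NULL */
	int stdio;		/* set by -i */
	int bad;		/* set on unknown option or missing argument */
};

/* Parse options in the style of the ramfs fragment.
   Returns the index of the first non-option argument. */
int
parse_args(int argc, char *argv[], Opts *o)
{
	int i;
	char *p;

	o->defmnt = "/tmp";
	o->stdio = 0;
	o->bad = 0;
	for(i = 1; i < argc && argv[i][0] == '-' && argv[i][1] != '\0'; i++){
		if(argv[i][1] == '-' && argv[i][2] == '\0'){
			i++;	/* "--" ends the options */
			break;
		}
		for(p = argv[i] + 1; *p != '\0'; p++){
			switch(*p){	/* *p plays the role of ARGC() */
			case 'i':
				o->defmnt = NULL;
				o->stdio = 1;
				break;
			case 's':
				o->defmnt = NULL;
				break;
			case 'm':
				/* the role of ARGF(): the rest of this
				   word if any, otherwise the next word */
				if(p[1] != '\0')
					o->defmnt = p + 1;
				else if(i + 1 < argc)
					o->defmnt = argv[++i];
				else
					o->bad = 1;
				goto nextword;
			default:
				o->bad = 1;
				break;
			}
		}
	nextword:
		;
	}
	return i;
}
```

.LP
The bookkeeping for grouped flags, attached and detached option arguments, and
the end of the option list is exactly what the macros hide, which is why the
fragment from
.CW ramfs.c
needs only the
.CW case
labels.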
.SH
Extensions
.PP
The compiler has several extensions to 1989 ANSI C, all of which are used
extensively in the system source.
Some of these have been adopted in later ANSI C standards.
First,
.I structure
.I displays
permit
.CW struct
expressions to be formed dynamically.
Given these declarations:
.P1
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

Point	p, q, add(Point, Point);
Rectangle r;
int	x, y;
.P2
this assignment may appear anywhere an assignment is legal:
.P1
r = (Rectangle){add(p, q), (Point){x, y+3}};
.P2
The syntax is the same as for initializing a structure but with
a leading cast.
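.PP
Structure displays are what the 1999 revision of ISO C calls compound literals,
so the assignment above can be tried with any C99 compiler.
In this sketch the body of
.CW add
is an assumption (the paper declares the function but does not define it),
and
.CW mkrect
is a hypothetical wrapper added so the assignment has somewhere to live.

```c
typedef struct Point Point;
typedef struct Rectangle Rectangle;

struct Point
{
	int x, y;
};

struct Rectangle
{
	Point min, max;
};

/* Component-wise sum; an assumed definition for the add()
   that the text only declares. */
Point
add(Point a, Point b)
{
	return (Point){a.x + b.x, a.y + b.y};
}

/* Build a rectangle with a structure display
   (a compound literal in C99 terms). */
Rectangle
mkrect(Point p, Point q, int x, int y)
{
	Rectangle r;

	r = (Rectangle){add(p, q), (Point){x, y+3}};
	return r;
}
```

.LP
The display is an ordinary expression, so it may equally be passed directly as
a function argument or returned, as
.CW add
does here.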
.PP
If an
.I anonymous
.I structure
or
.I union
is declared within another structure or union, the members of the internal
structure or union are addressable without prefix in the outer structure.
This feature eliminates the clumsy naming of nested structures and,
particularly, unions.
For example, after these declarations,
.P1
struct Lock
{
	int locked;
};

struct Node
{
	int type;
	union{
		double dval;
		double fval;
		long lval;
	};		/* anonymous union */
	struct Lock;	/* anonymous structure */
} *node;

void lock(struct Lock*);
.P2
one may refer to
.CW node->type ,
.CW node->dval ,
.CW node->fval ,
.CW node->lval ,
and
.CW node->locked .
Moreover, the address of a
.CW struct
.CW Node
may be used without a cast anywhere that the address of a
.CW struct
.CW Lock
is used, such as in argument lists.
The compiler automatically promotes the type and adjusts the address.
Thus one may invoke
.CW lock(node) .
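.PP
The 2011 revision of ISO C adopted part of this extension: unions and
structures declared without a tag or a member name.
The sketch below therefore compiles with a C11 compiler, but it covers only
that part; the tagged form
.CW "struct Lock;"
and the automatic pointer promotion remain Plan 9 extensions.
The tag constants and the functions
.CW setint
and
.CW value
are hypothetical.

```c
enum { Tint, Tfloat };	/* hypothetical type tags */

struct Node
{
	int type;
	union{
		double dval;
		long lval;
	};	/* anonymous union: members need no prefix */
};

/* Store an integer in n and set the tag to match. */
void
setint(struct Node *n, long v)
{
	n->type = Tint;
	n->lval = v;
}

/* Return the stored value as a double, whichever member holds it. */
double
value(const struct Node *n)
{
	return n->type == Tint ? (double)n->lval : n->dval;
}
```

.LP
Without the anonymous union each access would need an invented member name,
as in
.CW n->u.lval ,
which is the clumsiness the extension removes.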
.PP
Anonymous structures and unions may be accessed by type name
if (and only if) they are declared using a
.CW typedef
name.
For example, using the above declaration for
.CW Point ,
one may declare
.P1
struct
{
	int type;
	Point;
} p;
.P2
and refer to
.CW p.Point .
.PP
In the initialization of arrays, a number in square brackets before an
element sets the index for the initialization. For example, to initialize
some elements in
a table of function pointers indexed by
ASCII
character,
.P1
void	percent(void), slash(void);

void	(*func[128])(void) =
{
	['%']	percent,
	['/']	slash,
};
.P2
.LP
A similar syntax allows one to initialize structure elements:
.P1
Point p =
{
	.y 100,
	.x 200
};
.P2
These initialization syntaxes were later added to the C standard, in its 1999
revision, with the addition of an
equals sign between the index or tag and the value.
The Plan 9 compiler accepts either form.
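.PP
For comparison, here are the same two initializations in the form with the
equals sign, which any C99 compiler accepts.
The handler bodies, the counter
.CW nhits ,
the variable name
.CW pt ,
and the function
.CW dispatch
are assumptions added so that the table can be exercised.

```c
typedef struct Point Point;
struct Point
{
	int x, y;
};

int nhits;	/* counts handler calls, for demonstration only */

void percent(void) { nhits += 1; }
void slash(void) { nhits += 10; }

/* Table indexed by ASCII character; entries not named
   in the initializer are null pointers. */
void (*func[128])(void) =
{
	['%'] = percent,
	['/'] = slash,
};

/* Members may be named in any order. */
Point pt =
{
	.y = 100,
	.x = 200
};

/* Call the handler for character c, if there is one.
   Returns 1 if a handler was called, 0 otherwise. */
int
dispatch(int c)
{
	if(c < 0 || c >= 128 || func[c] == 0)
		return 0;
	func[c]();
	return 1;
}
```

.LP
Naming the index keeps a sparse table readable and correct even if entries are
later added out of order.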
.PP
Finally, the declaration
.P1
extern register reg;
.P2
.I this "" (
appearance of the register keyword is not ignored)
allocates a global register to hold the variable
.CW reg .
External registers must be used carefully: they need to be declared in
.I all
source files and libraries in the program to guarantee the register
is not allocated temporarily for other purposes.
Especially on machines with few registers, such as the i386,
it is easy to link accidentally with code that has already usurped
the global registers and there is no diagnostic when this happens.
Used wisely, though, external registers are powerful.
The Plan 9 operating system uses them to access per-process and
per-machine data structures on a multiprocessor. The storage class they provide
is hard to create in other ways.
.SH
The compile-time environment
.PP
The code generated by the compilers is `optimized' by default:
variables are placed in registers and peephole optimizations are
performed.
The compiler flag
.CW -N
disables these optimizations.
Registerization is done locally rather than throughout a function:
whether a variable occupies a register or
the memory location identified in the symbol
table depends on the activity of the variable and may change
throughout the life of the variable.
The
.CW -N
flag is rarely needed;
its main use is to simplify debugging.
There is no information in the symbol table to identify the
registerization of a variable, so
.CW -N
guarantees the variable is always where the symbol table says it is.
.PP
Another flag,
.CW -w ,
turns
.I on
warnings about portability and problems detected in flow analysis.
Most code in Plan 9 is compiled with warnings enabled;
these warnings plus the type checking offered by function prototypes
provide most of the support of the Unix tool
.CW lint
more accurately and with less chatter.
Two of the warnings,
`used and not set' and `set and not used', are almost always accurate but
may be triggered spuriously by code with invisible control flow,
such as in routines that call
.CW longjmp .
The compiler statements
.P1
SET(v1);
USED(v2);
.P2
decorate the flow graph to silence the compiler.
Either statement accepts a comma-separated list of variables.
Use them carefully: they may silence real errors.
For the common case of unused parameters to a function,
leaving the name off the declaration silences the warnings.
That is, listing the type of a parameter but giving it no
associated variable name does the trick.
.SH
Debugging
.PP
There are two debuggers available on Plan 9.
The first, and older, is
.CW db ,
a revision of Unix
.CW adb .
The other,
.CW acid ,
is a source-level debugger whose commands are statements in
a true programming language.
.CW Acid
is the preferred debugger, but since it
borrows some elements of
.CW db ,
notably the formats for displaying values, it is worth knowing a little bit about
.CW db .
.PP
Both debuggers support multiple architectures in a single program; that is,
the programs are
.CW db
and
.CW acid ,
not, for example,
.CW vdb
and
.CW vacid .
They also support cross-architecture debugging comfortably:
one may debug a 386 binary on a MIPS.
.PP
Imagine a program has crashed mysteriously:
.P1
% X11/X
Fatal server bug!
failed to create default stipple
X 106: suicide: sys: trap: fault read addr=0x0 pc=0x00105fb8
%
.P2
When a process dies on Plan 9 it hangs in the `broken' state
for debugging.
Attach a debugger to the process by naming its process id:
.P1
% acid 106
/proc/106/text:mips plan 9 executable

/sys/lib/acid/port
/sys/lib/acid/mips
acid:
.P2
The
.CW acid
function
.CW stk()
reports the stack traceback:
.P1
acid: stk()
At pc:0x105fb8:abort+0x24 /sys/src/ape/lib/ap/stdio/abort.c:6
abort() /sys/src/ape/lib/ap/stdio/abort.c:4
	called from FatalError+#4e
		/sys/src/X/mit/server/dix/misc.c:421
FatalError(s9=#e02, s8=#4901d200, s7=#2, s6=#72701, s5=#1,
    s4=#7270d, s3=#6, s2=#12, s1=#ff37f1c, s0=#6, f=#7270f)
    /sys/src/X/mit/server/dix/misc.c:416
	called from gnotscreeninit+#4ce
		/sys/src/X/mit/server/ddx/gnot/gnot.c:792
gnotscreeninit(snum=#0, sc=#80db0)
    /sys/src/X/mit/server/ddx/gnot/gnot.c:766
	called from AddScreen+#16e
		/n/bootes/sys/src/X/mit/server/dix/main.c:610
AddScreen(pfnInit=0x0000129c,argc=0x00000001,argv=0x7fffffe4)
    /sys/src/X/mit/server/dix/main.c:530
	called from InitOutput+0x80
		/sys/src/X/mit/server/ddx/brazil/brddx.c:522
InitOutput(argc=0x00000001,argv=0x7fffffe4)
    /sys/src/X/mit/server/ddx/brazil/brddx.c:511
	called from main+0x294
		/sys/src/X/mit/server/dix/main.c:225
main(argc=0x00000001,argv=0x7fffffe4)
    /sys/src/X/mit/server/dix/main.c:136
	called from _main+0x24
		/sys/src/ape/lib/ap/mips/main9.s:8
.P2
The function
.CW lstk()
is similar but
also reports the values of local variables.
Note that the traceback includes full file names; this is a boon to debugging,
although it makes the output much noisier.
.PP
To use
.CW acid
well you will need to learn its input language; see the
``Acid Manual'',
by Phil Winterbottom,
for details. For simple debugging, however, the information in the manual page is
sufficient. In particular, it describes the most useful functions
for examining a process.
.PP
The compiler does not place
information describing the types of variables in the executable,
but a compile-time flag provides crude support for symbolic debugging.
The
.CW -a
flag to the compiler suppresses code generation
and instead emits source text in the
.CW acid
language to format and display data structure types defined in the program.
The easiest way to use this feature is to put a rule in the
.CW mkfile :
.P1
syms:   main.$O
	$CC -a main.c > syms
.P2
Then from within
.CW acid ,
.P1
acid: include("sourcedirectory/syms")
.P2
to read in the relevant definitions.
(For multi-file source, you need to be a little fancier;
see
.I 8c (1)).
This text includes, for each defined compound
type, a function with that name that may be called with the address of a structure
of that type to display its contents.
For example, if
.CW rect
is a global variable of type
.CW Rectangle ,
one may execute
.P1
Rectangle(*rect)
.P2
to display it.
The
.CW *
(indirection) operator is necessary because
of the way
.CW acid
works: each global symbol in the program is defined as a variable by
.CW acid ,
with value equal to the
.I address
of the symbol.
.PP
Another common technique is to write by hand special
.CW acid
code to define functions to aid debugging, initialize the debugger, and so on.
Conventionally, this is placed in a file called
.CW acid
in the source directory; it has a line
.P1
include("sourcedirectory/syms");
.P2
to load the compiler-produced symbols. One may edit the compiler output directly but
it is wiser to keep the hand-generated
.CW acid
separate from the machine-generated.
.PP
To make things simple, the default rules in the system
.CW mkfiles
include entries to make
.CW foo.acid
from
.CW foo.c ,
so one may use
.CW mk
to automate the production of
.CW acid
definitions for a given C source file.
.PP
There is much more to say here. See the
.CW acid
manual page, the reference manual, or the paper
``Acid: A Debugger Built From A Language'',
also by Phil Winterbottom.