2 |
- |
1 |
.HTML "Plan 9 C Compilers
|
|
|
2 |
.TL
|
|
|
3 |
Plan 9 C Compilers
|
|
|
4 |
.AU
|
|
|
5 |
Ken Thompson
|
|
|
6 |
ken@plan9.bell-labs.com
|
|
|
7 |
.AB
|
|
|
8 |
.FS
|
|
|
9 |
Originally appeared, in a different form, in
|
|
|
10 |
.I
|
|
|
11 |
Proceedings of the Summer 1990 UKUUG Conference,
|
|
|
12 |
.R
|
|
|
13 |
pp. 41-51,
|
|
|
14 |
London, 1990.
|
|
|
15 |
.FE
|
|
|
16 |
This paper describes the overall structure and function of the Plan 9 C compilers.
|
|
|
17 |
A more detailed implementation document
|
|
|
18 |
for any one of the compilers
|
|
|
19 |
is yet to be written.
|
|
|
20 |
.AE
|
|
|
21 |
.NH
|
|
|
22 |
Introduction
|
|
|
23 |
.LP
|
|
|
24 |
There are many compilers in the series.
|
|
|
25 |
Six of the compilers
|
|
|
26 |
(Intel 386, AMD64, PowerPC, PowerPC 64-bit, ARM, MIPS R3000)
|
|
|
27 |
are considered active and are used to compile
|
|
|
28 |
current versions of Plan 9.
|
|
|
29 |
One of the compilers (SPARC)
|
|
|
30 |
is maintained but is for older machines
|
|
|
31 |
for which we have no current ports of Plan 9;
|
|
|
32 |
we are unlikely to port to any SPARC machines.
|
|
|
33 |
The DEC Alpha and Motorola 68020 compilers have been retired.
|
|
|
34 |
Several others (Motorola 68000, Intel 960, AMD 29000)
|
|
|
35 |
have had only limited use, such as
|
|
|
36 |
to program peripherals or experimental devices.
|
|
|
37 |
Any of the disused compilers could be restored if needed.
|
|
|
38 |
.NH
|
|
|
39 |
Structure
|
|
|
40 |
.LP
|
|
|
41 |
The compiler is a single program that produces an
|
|
|
42 |
object file.
|
|
|
43 |
Combined in the compiler are the traditional
|
|
|
44 |
roles of preprocessor, lexical analyzer, parser, code generator,
|
|
|
45 |
local optimizer,
|
|
|
46 |
and first half of the assembler.
|
|
|
47 |
The object files are binary forms of assembly
|
|
|
48 |
language,
|
|
|
49 |
similar to what might be passed between
|
|
|
50 |
the first and second passes of an assembler.
|
|
|
51 |
.LP
|
|
|
52 |
Object files and libraries
|
|
|
53 |
are combined by a loader
|
|
|
54 |
program to produce the executable binary.
|
|
|
55 |
The loader combines the roles of second half
|
|
|
56 |
of the assembler, global optimizer, and loader.
|
|
|
57 |
The names of the compilers, loaders, and assemblers
|
|
|
58 |
are as follows:
|
|
|
59 |
.DS
|
|
|
60 |
.ta 1.5i
|
|
|
61 |
.de Ta
|
|
|
62 |
\\$1 \f(CW\\$2\fP \f(CW\\$3\fP \f(CW\\$4\fP
|
|
|
63 |
..
|
|
|
64 |
.Ta SPARC kc kl ka
|
|
|
65 |
.Ta PowerPC qc ql qa
|
|
|
66 |
.Ta MIPS vc vl va
|
|
|
67 |
.Ta MIPS\ little-endian 0c 0l 0a
|
|
|
68 |
.Ta ARM 5c 5l 5a
|
|
|
69 |
.Ta AMD64 6c 6l 6a
|
|
|
70 |
.Ta Intel\ 386 8c 8l 8a
|
|
|
71 |
.Ta PowerPC\ 64-bit 9c 9l 9a
|
|
|
72 |
.DE
|
|
|
73 |
There is a further breakdown
|
|
|
74 |
in the source of the compilers into
|
|
|
75 |
object-independent and
|
|
|
76 |
object-dependent
|
|
|
77 |
parts.
|
|
|
78 |
All of the object-independent parts
|
|
|
79 |
are combined into source files in the
|
|
|
80 |
directory
|
|
|
81 |
.CW /sys/src/cmd/cc .
|
|
|
82 |
The object-dependent parts are collected
|
|
|
83 |
in a separate directory for each compiler,
|
|
|
84 |
for example
|
|
|
85 |
.CW /sys/src/cmd/vc .
|
|
|
86 |
All of the code,
|
|
|
87 |
both object-independent and
|
|
|
88 |
object-dependent,
|
|
|
89 |
is machine-independent
|
|
|
90 |
and may be cross-compiled and executed on any
|
|
|
91 |
of the architectures.
|
|
|
92 |
.NH
|
|
|
93 |
The Language
|
|
|
94 |
.LP
|
|
|
95 |
The compiler implements ANSI C with some
|
|
|
96 |
restrictions and extensions
|
|
|
97 |
[ANSI90].
|
|
|
98 |
Most of the restrictions are due to
|
|
|
99 |
personal preference, while
|
|
|
100 |
most of the extensions were to help in
|
|
|
101 |
the implementation of Plan 9.
|
|
|
102 |
There are other departures from the standard,
|
|
|
103 |
particularly in the libraries,
|
|
|
104 |
that are beyond the scope of this
|
|
|
105 |
paper.
|
|
|
106 |
.NH 2
|
|
|
107 |
Register, volatile, const
|
|
|
108 |
.LP
|
|
|
109 |
The keyword
|
|
|
110 |
.CW register
|
|
|
111 |
is recognized syntactically
|
|
|
112 |
but is semantically ignored.
|
|
|
113 |
Thus taking the address of a
|
|
|
114 |
.CW register
|
|
|
115 |
variable is not diagnosed.
|
|
|
116 |
The keyword
|
|
|
117 |
.CW volatile
|
|
|
118 |
disables all optimizations, in particular registerization, of the corresponding variable.
|
|
|
119 |
The keyword
|
|
|
120 |
.CW const
|
|
|
121 |
generates warnings (if warnings are enabled by the compiler's
|
|
|
122 |
.CW -w
|
|
|
123 |
option) of non-constant use of the variable,
|
|
|
124 |
but does not affect the generated code.
|
|
|
125 |
.NH 2
|
|
|
126 |
The preprocessor
|
|
|
127 |
.LP
|
|
|
128 |
The C preprocessor is probably the
|
|
|
129 |
biggest departure from the ANSI standard.
|
|
|
130 |
.LP
|
|
|
131 |
The preprocessor built into the Plan 9 compilers does not support
|
|
|
132 |
.CW #if ,
|
|
|
133 |
although it does handle
|
|
|
134 |
.CW #ifdef
|
|
|
135 |
and
|
|
|
136 |
.CW #include .
|
|
|
137 |
If it is necessary to be more standard,
|
|
|
138 |
the source text can first be run through the separate ANSI C
|
|
|
139 |
preprocessor,
|
|
|
140 |
.CW cpp .
|
|
|
141 |
.NH 2
|
|
|
142 |
Unnamed substructures
|
|
|
143 |
.LP
|
|
|
144 |
The most important and most heavily used of the
|
|
|
145 |
extensions is the declaration of an
|
|
|
146 |
unnamed substructure or subunion.
|
|
|
147 |
For example:
|
|
|
148 |
.DS
|
|
|
149 |
.CW
|
|
|
150 |
.ta .1i .6i 1.1i 1.6i
|
|
|
151 |
typedef
|
|
|
152 |
struct lock
|
|
|
153 |
{
|
|
|
154 |
int locked;
|
|
|
155 |
} Lock;
|
|
|
156 |
|
|
|
157 |
typedef
|
|
|
158 |
struct node
|
|
|
159 |
{
|
|
|
160 |
int type;
|
|
|
161 |
union
|
|
|
162 |
{
|
|
|
163 |
double dval;
|
|
|
164 |
float fval;
|
|
|
165 |
long lval;
|
|
|
166 |
};
|
|
|
167 |
Lock;
|
|
|
168 |
} Node;
|
|
|
169 |
|
|
|
170 |
Lock* lock;
|
|
|
171 |
Node* node;
|
|
|
172 |
.DE
|
|
|
173 |
The declaration of
|
|
|
174 |
.CW Node
|
|
|
175 |
has an unnamed substructure of type
|
|
|
176 |
.CW Lock
|
|
|
177 |
and an unnamed subunion.
|
|
|
178 |
One use of this feature allows references to elements of the
|
|
|
179 |
subunit to be accessed as if they were in
|
|
|
180 |
the outer structure.
|
|
|
181 |
Thus
|
|
|
182 |
.CW node->dval
|
|
|
183 |
and
|
|
|
184 |
.CW node->locked
|
|
|
185 |
are legitimate references.
|
|
|
186 |
.LP
|
|
|
187 |
When an outer structure is used
|
|
|
188 |
in a context that is only legal for
|
|
|
189 |
an unnamed substructure,
|
|
|
190 |
the compiler promotes the reference to the
|
|
|
191 |
unnamed substructure.
|
|
|
192 |
This is true for references to structures and
|
|
|
193 |
to references to pointers to structures.
|
|
|
194 |
This happens in assignment statements and
|
|
|
195 |
in argument passing where prototypes have been
|
|
|
196 |
declared.
|
|
|
197 |
Thus, continuing with the example,
|
|
|
198 |
.DS
|
|
|
199 |
.CW
|
|
|
200 |
.ta .1i .6i 1.1i 1.6i
|
|
|
201 |
lock = node;
|
|
|
202 |
.DE
|
|
|
203 |
would assign a pointer to the unnamed
|
|
|
204 |
.CW Lock
|
|
|
205 |
in
|
|
|
206 |
the
|
|
|
207 |
.CW Node
|
|
|
208 |
to the variable
|
|
|
209 |
.CW lock .
|
|
|
210 |
Another example,
|
|
|
211 |
.DS
|
|
|
212 |
.CW
|
|
|
213 |
.ta .1i .6i 1.1i 1.6i
|
|
|
214 |
extern void lock(Lock*);
|
|
|
215 |
func(...)
|
|
|
216 |
{
|
|
|
217 |
...
|
|
|
218 |
lock(node);
|
|
|
219 |
...
|
|
|
220 |
}
|
|
|
221 |
.DE
|
|
|
222 |
will pass a pointer to the
|
|
|
223 |
.CW Lock
|
|
|
224 |
substructure.
|
|
|
225 |
.LP
|
|
|
226 |
Finally, in places where context is insufficient to identify the unnamed structure,
|
|
|
227 |
the type name (it must be a
|
|
|
228 |
.CW typedef )
|
|
|
229 |
of the unnamed structure can be used as an identifier.
|
|
|
230 |
In our example,
|
|
|
231 |
.CW &node->Lock
|
|
|
232 |
gives the address of the anonymous
|
|
|
233 |
.CW Lock
|
|
|
234 |
structure.
|
|
|
235 |
.NH 2
|
|
|
236 |
Structure displays
|
|
|
237 |
.LP
|
|
|
238 |
A structure cast followed by a list of expressions in braces is
|
|
|
239 |
an expression with the type of the structure and elements assigned from
|
|
|
240 |
the corresponding list.
|
|
|
241 |
Structures are now almost first-class citizens of the language.
|
|
|
242 |
It is common to see code like this:
|
|
|
243 |
.DS
|
|
|
244 |
.CW
|
|
|
245 |
.ta .1i
|
|
|
246 |
r = (Rectangle){point1, (Point){x,y+2}};
|
|
|
247 |
.DE
|
|
|
248 |
.NH 2
|
|
|
249 |
Initialization indexes
|
|
|
250 |
.LP
|
|
|
251 |
In initializers of arrays,
|
|
|
252 |
one may place a constant expression
|
|
|
253 |
in square brackets before an initializer.
|
|
|
254 |
This causes the next initializer to assign
|
|
|
255 |
the indicated element.
|
|
|
256 |
For example:
|
|
|
257 |
.DS
|
|
|
258 |
.CW
|
|
|
259 |
.ta .1i .6i 1.6i
|
|
|
260 |
enum errors
|
|
|
261 |
{
|
|
|
262 |
Etoobig,
|
|
|
263 |
Ealarm,
|
|
|
264 |
Egreg
|
|
|
265 |
};
|
|
|
266 |
char* errstrings[] =
|
|
|
267 |
{
|
|
|
268 |
[Ealarm] "Alarm call",
|
|
|
269 |
[Egreg] "Panic: out of mbufs",
|
|
|
270 |
[Etoobig] "Arg list too long",
|
|
|
271 |
};
|
|
|
272 |
.DE
|
|
|
273 |
In the same way,
|
|
|
274 |
individual structures members may
|
|
|
275 |
be initialized in any order by preceding the initialization with
|
|
|
276 |
.CW .tagname .
|
|
|
277 |
Both forms allow an optional
|
|
|
278 |
.CW = ,
|
|
|
279 |
to be compatible with a proposed
|
|
|
280 |
extension to ANSI C.
|
|
|
281 |
.NH 2
|
|
|
282 |
External register
|
|
|
283 |
.LP
|
|
|
284 |
The declaration
|
|
|
285 |
.CW extern
|
|
|
286 |
.CW register
|
|
|
287 |
will dedicate a register to
|
|
|
288 |
a variable on a global basis.
|
|
|
289 |
It can be used only under special circumstances.
|
|
|
290 |
External register variables must be identically
|
|
|
291 |
declared in all modules and
|
|
|
292 |
libraries.
|
|
|
293 |
The feature is not intended for efficiency,
|
|
|
294 |
although it can produce efficient code;
|
|
|
295 |
rather it represents a unique storage class that
|
|
|
296 |
would be hard to get any other way.
|
|
|
297 |
On a shared-memory multi-processor,
|
|
|
298 |
an external register is
|
|
|
299 |
one-per-processor and neither one-per-procedure (automatic)
|
|
|
300 |
or one-per-system (external).
|
|
|
301 |
It is used for two variables in the Plan 9 kernel,
|
|
|
302 |
.CW u
|
|
|
303 |
and
|
|
|
304 |
.CW m .
|
|
|
305 |
.CW U
|
|
|
306 |
is a pointer to the structure representing the currently running process
|
|
|
307 |
and
|
|
|
308 |
.CW m
|
|
|
309 |
is a pointer to the per-machine data structure.
|
|
|
310 |
.NH 2
|
|
|
311 |
Long long
|
|
|
312 |
.LP
|
|
|
313 |
The compilers accept
|
|
|
314 |
.CW long
|
|
|
315 |
.CW long
|
|
|
316 |
as a basic type meaning 64-bit integer.
|
|
|
317 |
On some of the machines
|
|
|
318 |
this type is synthesized from 32-bit instructions.
|
|
|
319 |
.NH 2
|
|
|
320 |
Pragma
|
|
|
321 |
.LP
|
|
|
322 |
The compilers accept
|
|
|
323 |
.CW #pragma
|
|
|
324 |
.CW lib
|
|
|
325 |
.I libname
|
|
|
326 |
and pass the
|
|
|
327 |
library name string uninterpreted
|
|
|
328 |
to the loader.
|
|
|
329 |
The loader uses the library name to
|
|
|
330 |
find libraries to load.
|
|
|
331 |
If the name contains
|
|
|
332 |
.CW $O ,
|
|
|
333 |
it is replaced with
|
|
|
334 |
the single character object type of the compiler
|
|
|
335 |
(e.g.,
|
|
|
336 |
.CW v
|
|
|
337 |
for the MIPS).
|
|
|
338 |
If the name contains
|
|
|
339 |
.CW $M ,
|
|
|
340 |
it is replaced with
|
|
|
341 |
the architecture type for the compiler
|
|
|
342 |
(e.g.,
|
|
|
343 |
.CW mips
|
|
|
344 |
for the MIPS).
|
|
|
345 |
If the name starts with
|
|
|
346 |
.CW /
|
|
|
347 |
it is an absolute pathname;
|
|
|
348 |
if it starts with
|
|
|
349 |
.CW .
|
|
|
350 |
then it is searched for in the loader's current directory.
|
|
|
351 |
Otherwise, the name is searched from
|
|
|
352 |
.CW /$M/lib .
|
|
|
353 |
Such
|
|
|
354 |
.CW #pragma
|
|
|
355 |
statements in header files guarantee that the correct
|
|
|
356 |
libraries are always linked with a program without the
|
|
|
357 |
need to specify them explicitly at link time.
|
|
|
358 |
.LP
|
|
|
359 |
They also accept
|
|
|
360 |
.CW #pragma
|
|
|
361 |
.CW packed
|
|
|
362 |
.CW on
|
|
|
363 |
(or
|
|
|
364 |
.CW yes
|
|
|
365 |
or
|
|
|
366 |
.CW 1 )
|
|
|
367 |
to cause subsequently declared data, until
|
|
|
368 |
.CW #pragma
|
|
|
369 |
.CW packed
|
|
|
370 |
.CW off
|
|
|
371 |
(or
|
|
|
372 |
.CW no
|
|
|
373 |
or
|
|
|
374 |
.CW 0 ),
|
|
|
375 |
to be laid out in memory tightly packed in successive bytes, disregarding
|
|
|
376 |
the usual alignment rules.
|
|
|
377 |
Accessing such data can cause faults.
|
|
|
378 |
.LP
|
|
|
379 |
Similarly,
|
|
|
380 |
.CW #pragma
|
|
|
381 |
.CW profile
|
|
|
382 |
.CW off
|
|
|
383 |
(or
|
|
|
384 |
.CW no
|
|
|
385 |
or
|
|
|
386 |
.CW 0 )
|
|
|
387 |
causes subsequently declared functions, until
|
|
|
388 |
.CW #pragma
|
|
|
389 |
.CW profile
|
|
|
390 |
.CW on
|
|
|
391 |
(or
|
|
|
392 |
.CW yes
|
|
|
393 |
or
|
|
|
394 |
.CW 1 ),
|
|
|
395 |
to be marked as unprofiled.
|
|
|
396 |
Such functions will not be profiled when
|
|
|
397 |
profiling is enabled for the rest of the program.
|
|
|
398 |
.LP
|
|
|
399 |
Two
|
|
|
400 |
.CW #pragma
|
|
|
401 |
statements allow type-checking of
|
|
|
402 |
.CW print -like
|
|
|
403 |
functions.
|
|
|
404 |
The first, of the form
|
|
|
405 |
.P1
|
|
|
406 |
#pragma varargck argpos error 2
|
|
|
407 |
.P2
|
|
|
408 |
tells the compiler that the second argument to
|
|
|
409 |
.CW error
|
|
|
410 |
is a
|
|
|
411 |
.CW print
|
|
|
412 |
format string (see the manual page
|
|
|
413 |
.I print (2))
|
|
|
414 |
that specifies how to format
|
|
|
415 |
.CW error 's
|
|
|
416 |
subsequent arguments.
|
|
|
417 |
The second, of the form
|
|
|
418 |
.P1
|
|
|
419 |
#pragma varargck type "s" char*
|
|
|
420 |
.P2
|
|
|
421 |
says that the
|
|
|
422 |
.CW print
|
|
|
423 |
format verb
|
|
|
424 |
.CW s
|
|
|
425 |
processes an argument of
|
|
|
426 |
type
|
|
|
427 |
.CW char* .
|
|
|
428 |
If the compiler's
|
|
|
429 |
.CW -F
|
|
|
430 |
option is enabled, the compiler will use this information
|
|
|
431 |
to report type violations in the arguments to
|
|
|
432 |
.CW print ,
|
|
|
433 |
.CW error ,
|
|
|
434 |
and similar routines.
|
|
|
435 |
.NH
|
|
|
436 |
Object module conventions
|
|
|
437 |
.LP
|
|
|
438 |
The overall conventions of the runtime environment
|
|
|
439 |
are important
|
|
|
440 |
to runtime efficiency.
|
|
|
441 |
In this section,
|
|
|
442 |
several of these conventions are discussed.
|
|
|
443 |
.NH 2
|
|
|
444 |
Register saving
|
|
|
445 |
.LP
|
|
|
446 |
In the Plan 9 compilers,
|
|
|
447 |
the caller of a procedure saves the registers.
|
|
|
448 |
With caller-saves,
|
|
|
449 |
the leaf procedures can use all the
|
|
|
450 |
registers and never save them.
|
|
|
451 |
If you spend a lot of time at the leaves,
|
|
|
452 |
this seems preferable.
|
|
|
453 |
With callee-saves,
|
|
|
454 |
the saving of the registers is done
|
|
|
455 |
in the single point of entry and return.
|
|
|
456 |
If you are interested in space,
|
|
|
457 |
this seems preferable.
|
|
|
458 |
In both,
|
|
|
459 |
there is a degree of uncertainty
|
|
|
460 |
about what registers need to be saved.
|
|
|
461 |
Callee-saved registers make it difficult to
|
|
|
462 |
find variables in registers in debuggers.
|
|
|
463 |
Callee-saved registers also complicate
|
|
|
464 |
the implementation of
|
|
|
465 |
.CW longjmp .
|
|
|
466 |
The convincing argument is
|
|
|
467 |
that with caller-saves,
|
|
|
468 |
the decision to registerize a variable
|
|
|
469 |
can include the cost of saving the register
|
|
|
470 |
across calls.
|
|
|
471 |
For a further discussion of caller- vs. callee-saves,
|
|
|
472 |
see the paper by Davidson and Whalley [Dav91].
|
|
|
473 |
.LP
|
|
|
474 |
In the Plan 9 operating system,
|
|
|
475 |
calls to the kernel look like normal procedure
|
|
|
476 |
calls, which means
|
|
|
477 |
the caller
|
|
|
478 |
has saved the registers and the system
|
|
|
479 |
entry does not have to.
|
|
|
480 |
This makes system calls considerably faster.
|
|
|
481 |
Since this is a potential security hole,
|
|
|
482 |
and can lead to non-determinism,
|
|
|
483 |
the system may eventually save the registers
|
|
|
484 |
on entry,
|
|
|
485 |
or more likely clear the registers on return.
|
|
|
486 |
.NH 2
|
|
|
487 |
Calling convention
|
|
|
488 |
.LP
|
|
|
489 |
Older C compilers maintain a frame pointer, which is at a known constant
|
|
|
490 |
offset from the stack pointer within each function.
|
|
|
491 |
For machines where the stack grows towards zero,
|
|
|
492 |
the argument pointer is at a known constant offset
|
|
|
493 |
from the frame pointer.
|
|
|
494 |
Since the stack grows down in Plan 9,
|
|
|
495 |
the Plan 9 compilers
|
|
|
496 |
keep neither an
|
|
|
497 |
explicit frame pointer nor
|
|
|
498 |
an explicit argument pointer;
|
|
|
499 |
instead they generate addresses relative to the stack pointer.
|
|
|
500 |
.LP
|
|
|
501 |
On some architectures, the first argument to a subroutine is passed in a register.
|
|
|
502 |
.NH 2
|
|
|
503 |
Functions returning structures
|
|
|
504 |
.LP
|
|
|
505 |
Structures longer than one word are awkward to implement
|
|
|
506 |
since they do not fit in registers and must
|
|
|
507 |
be passed around in memory.
|
|
|
508 |
Functions that return structures
|
|
|
509 |
are particularly clumsy.
|
|
|
510 |
The Plan 9 compilers pass the return address of
|
|
|
511 |
a structure as the first argument of a
|
|
|
512 |
function that has a structure return value.
|
|
|
513 |
Thus
|
|
|
514 |
.DS
|
|
|
515 |
.CW
|
|
|
516 |
.ta .1i .6i 1.1i 1.6i
|
|
|
517 |
x = f(...)
|
|
|
518 |
.DE
|
|
|
519 |
is rewritten as
|
|
|
520 |
.DS
|
|
|
521 |
.CW
|
|
|
522 |
.ta .1i .6i 1.1i 1.6i
|
|
|
523 |
f(&x, ...)\f1.
|
|
|
524 |
.DE
|
|
|
525 |
This saves a copy and makes the compilation
|
|
|
526 |
much less clumsy.
|
|
|
527 |
A disadvantage is that if you call this
|
|
|
528 |
function without an assignment,
|
|
|
529 |
a dummy location must be invented.
|
|
|
530 |
.LP
|
|
|
531 |
There is also a danger of calling a function
|
|
|
532 |
that returns a structure without declaring
|
|
|
533 |
it as such.
|
|
|
534 |
With ANSI C function prototypes,
|
|
|
535 |
this error need never occur.
|
|
|
536 |
.NH
|
|
|
537 |
Implementation
|
|
|
538 |
.LP
|
|
|
539 |
The compiler is divided internally into
|
|
|
540 |
four machine-independent passes,
|
|
|
541 |
four machine-dependent passes,
|
|
|
542 |
and an output pass.
|
|
|
543 |
The next nine sections describe each pass in order.
|
|
|
544 |
.NH 2
|
|
|
545 |
Parsing
|
|
|
546 |
.LP
|
|
|
547 |
The first pass is a YACC-based parser
|
|
|
548 |
[Joh79].
|
|
|
549 |
Declarations are interpreted immediately,
|
|
|
550 |
building a block structured symbol table.
|
|
|
551 |
Executable statements are put into a parse tree
|
|
|
552 |
and collected,
|
|
|
553 |
without interpretation.
|
|
|
554 |
At the end of each procedure,
|
|
|
555 |
the parse tree for the function is
|
|
|
556 |
examined by the other passes of the compiler.
|
|
|
557 |
.LP
|
|
|
558 |
The input stream of the parser is
|
|
|
559 |
a pushdown list of input activations.
|
|
|
560 |
The preprocessor
|
|
|
561 |
expansions of
|
|
|
562 |
macros
|
|
|
563 |
and
|
|
|
564 |
.CW #include
|
|
|
565 |
are implemented as pushdowns.
|
|
|
566 |
Thus there is no separate
|
|
|
567 |
pass for preprocessing.
|
|
|
568 |
.NH 2
|
|
|
569 |
Typing
|
|
|
570 |
.LP
|
|
|
571 |
The next pass distributes typing information
|
|
|
572 |
to every node of the tree.
|
|
|
573 |
Implicit operations on the tree are added,
|
|
|
574 |
such as type promotions and taking the
|
|
|
575 |
address of arrays and functions.
|
|
|
576 |
.NH 2
|
|
|
577 |
Machine-independent optimization
|
|
|
578 |
.LP
|
|
|
579 |
The next pass performs optimizations
|
|
|
580 |
and transformations of the tree, such as converting
|
|
|
581 |
.CW &*x
|
|
|
582 |
and
|
|
|
583 |
.CW *&x
|
|
|
584 |
into
|
|
|
585 |
.CW x .
|
|
|
586 |
Constant expressions are converted to constants in this pass.
|
|
|
587 |
.NH 2
|
|
|
588 |
Arithmetic rewrites
|
|
|
589 |
.LP
|
|
|
590 |
This is another machine-independent optimization.
|
|
|
591 |
Subtrees of add, subtract, and multiply of integers are
|
|
|
592 |
rewritten for easier compilation.
|
|
|
593 |
The major transformation is factoring:
|
|
|
594 |
.CW 4+8*a+16*b+5
|
|
|
595 |
is transformed into
|
|
|
596 |
.CW 9+8*(a+2*b) .
|
|
|
597 |
Such expressions arise from address
|
|
|
598 |
manipulation and array indexing.
|
|
|
599 |
.NH 2
|
|
|
600 |
Addressability
|
|
|
601 |
.LP
|
|
|
602 |
This is the first of the machine-dependent passes.
|
|
|
603 |
The addressability of a processor is defined as the set of
|
|
|
604 |
expressions that is legal in the address field
|
|
|
605 |
of a machine language instruction.
|
|
|
606 |
The addressability of different processors varies widely.
|
|
|
607 |
At one end of the spectrum are the 68020 and VAX,
|
|
|
608 |
which allow a complex mix of incrementing,
|
|
|
609 |
decrementing,
|
|
|
610 |
indexing, and relative addressing.
|
|
|
611 |
At the other end is the MIPS,
|
|
|
612 |
which allows only registers and constant offsets from the
|
|
|
613 |
contents of a register.
|
|
|
614 |
The addressability can be different for different instructions
|
|
|
615 |
within the same processor.
|
|
|
616 |
.LP
|
|
|
617 |
It is important to the code generator to know when a
|
|
|
618 |
subtree represents an address of a particular type.
|
|
|
619 |
This is done with a bottom-up walk of the tree.
|
|
|
620 |
In this pass, the leaves are labeled with small integers.
|
|
|
621 |
When an internal node is encountered,
|
|
|
622 |
it is labeled by consulting a table indexed by the
|
|
|
623 |
labels on the left and right subtrees.
|
|
|
624 |
For example,
|
|
|
625 |
on the 68020 processor,
|
|
|
626 |
it is possible to address an
|
|
|
627 |
offset from a named location.
|
|
|
628 |
In C, this is represented by the expression
|
|
|
629 |
.CW *(&name+constant) .
|
|
|
630 |
This is marked addressable by the following table.
|
|
|
631 |
In the table,
|
|
|
632 |
a node represented by the left column is marked
|
|
|
633 |
with a small integer from the right column.
|
|
|
634 |
Marks of the form
|
|
|
635 |
.CW A\s-2\di\u\s0
|
|
|
636 |
are addressable while
|
|
|
637 |
marks of the form
|
|
|
638 |
.CW N\s-2\di\u\s0
|
|
|
639 |
are not addressable.
|
|
|
640 |
.DS
|
|
|
641 |
.B
|
|
|
642 |
.ta .1i 1.1i
|
|
|
643 |
Node Marked
|
|
|
644 |
.CW
|
|
|
645 |
name A\s-2\d1\u\s0
|
|
|
646 |
const A\s-2\d2\u\s0
|
|
|
647 |
&A\s-2\d1\u\s0 A\s-2\d3\u\s0
|
|
|
648 |
A\s-2\d3\u\s0+A\s-2\d1\u\s0 N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP
|
|
|
649 |
*N\s-2\d1\u\s0 A\s-2\d4\u\s0
|
|
|
650 |
.DE
|
|
|
651 |
Here there is a distinction between
|
|
|
652 |
a node marked
|
|
|
653 |
.CW A\s-2\d1\u\s0
|
|
|
654 |
and a node marked
|
|
|
655 |
.CW A\s-2\d4\u\s0
|
|
|
656 |
because the address operator of an
|
|
|
657 |
.CW A\s-2\d4\u\s0
|
|
|
658 |
node is not addressable.
|
|
|
659 |
So to extend the table:
|
|
|
660 |
.DS
|
|
|
661 |
.B
|
|
|
662 |
.ta .1i 1.1i
|
|
|
663 |
Node Marked
|
|
|
664 |
.CW
|
|
|
665 |
&A\s-2\d4\u\s0 N\s-2\d2\u\s0
|
|
|
666 |
N\s-2\d2\u\s0+N\s-2\d1\u\s0 N\s-2\d1\u\s0
|
|
|
667 |
.DE
|
|
|
668 |
The full addressability of the 68020 is expressed
|
|
|
669 |
in 18 rules like this,
|
|
|
670 |
while the addressability of the MIPS is expressed
|
|
|
671 |
in 11 rules.
|
|
|
672 |
When one ports the compiler,
|
|
|
673 |
this table is usually initialized
|
|
|
674 |
so that leaves are labeled as addressable and nothing else.
|
|
|
675 |
The code produced is poor,
|
|
|
676 |
but porting is easy.
|
|
|
677 |
The table can be extended later.
|
|
|
678 |
.LP
|
|
|
679 |
This pass also rewrites some complex operators
|
|
|
680 |
into procedure calls.
|
|
|
681 |
Examples include 64-bit multiply and divide.
|
|
|
682 |
.LP
|
|
|
683 |
In the same bottom-up pass of the tree,
|
|
|
684 |
the nodes are labeled with a Sethi-Ullman complexity
|
|
|
685 |
[Set70].
|
|
|
686 |
This number is roughly the number of registers required
|
|
|
687 |
to compile the tree on an ideal machine.
|
|
|
688 |
An addressable node is marked 0.
|
|
|
689 |
A function call is marked infinite.
|
|
|
690 |
A unary operator is marked as the
|
|
|
691 |
maximum of 1 and the mark of its subtree.
|
|
|
692 |
A binary operator with equal marks on its subtrees is
|
|
|
693 |
marked with a subtree mark plus 1.
|
|
|
694 |
A binary operator with unequal marks on its subtrees is
|
|
|
695 |
marked with the maximum mark of its subtrees.
|
|
|
696 |
The actual values of the marks are not too important,
|
|
|
697 |
but the relative values are.
|
|
|
698 |
The goal is to compile the harder
|
|
|
699 |
(larger mark)
|
|
|
700 |
subtree first.
|
|
|
701 |
.NH 2
|
|
|
702 |
Code generation
|
|
|
703 |
.LP
|
|
|
704 |
Code is generated by recursive
|
|
|
705 |
descent.
|
|
|
706 |
The Sethi-Ullman complexity completely guides the
|
|
|
707 |
order.
|
|
|
708 |
The addressability defines the leaves.
|
|
|
709 |
The only difficult part is compiling a tree
|
|
|
710 |
that has two infinite (function call)
|
|
|
711 |
subtrees.
|
|
|
712 |
In this case,
|
|
|
713 |
one subtree is compiled into the return register
|
|
|
714 |
(usually the most convenient place for a function call)
|
|
|
715 |
and then stored on the stack.
|
|
|
716 |
The other subtree is compiled into the return register
|
|
|
717 |
and then the operation is compiled with
|
|
|
718 |
operands from the stack and the return register.
|
|
|
719 |
.LP
|
|
|
720 |
There is a separate boolean code generator that compiles
|
|
|
721 |
conditional expressions.
|
|
|
722 |
This is fundamentally different from compiling an arithmetic expression.
|
|
|
723 |
The result of the boolean code generator is the
|
|
|
724 |
position of the program counter and not an expression.
|
|
|
725 |
The boolean code generator makes extensive use of De Morgan's rule.
|
|
|
726 |
The boolean code generator is an expanded version of that described
|
|
|
727 |
in chapter 8 of Aho, Sethi, and Ullman
|
|
|
728 |
[Aho87].
|
|
|
729 |
.LP
|
|
|
730 |
There is a considerable amount of talk in the literature
|
|
|
731 |
about automating this part of a compiler with a machine
|
|
|
732 |
description.
|
|
|
733 |
Since this code generator is so small
|
|
|
734 |
(less than 500 lines of C)
|
|
|
735 |
and easy,
|
|
|
736 |
it hardly seems worth the effort.
|
|
|
737 |
.NH 2
|
|
|
738 |
Registerization
|
|
|
739 |
.LP
|
|
|
740 |
Up to now,
|
|
|
741 |
the compiler has operated on syntax trees
|
|
|
742 |
that are roughly equivalent to the original source language.
|
|
|
743 |
The previous pass has produced machine language in an internal
|
|
|
744 |
format.
|
|
|
745 |
The next two passes operate on the internal machine language
|
|
|
746 |
structures.
|
|
|
747 |
The purpose of the next pass is to reintroduce
|
|
|
748 |
registers for heavily used variables.
|
|
|
749 |
.LP
|
|
|
750 |
All of the variables that can be
|
|
|
751 |
potentially registerized within a procedure are
|
|
|
752 |
placed in a table.
|
|
|
753 |
(Suitable variables are any automatic or external
|
|
|
754 |
scalars that do not have their addresses extracted.
|
|
|
755 |
Some constants that are hard to reference are also
|
|
|
756 |
considered for registerization.)
|
|
|
757 |
Four separate data flow equations are evaluated
|
|
|
758 |
over the procedure on all of these variables.
|
|
|
759 |
Two of the equations are the normal set-behind
|
|
|
760 |
and used-ahead
|
|
|
761 |
bits that define the life of a variable.
|
|
|
762 |
The two new bits tell if a variable life
|
|
|
763 |
crosses a function call ahead or behind.
|
|
|
764 |
By examining a variable over its lifetime,
|
|
|
765 |
it is possible to get a cost
|
|
|
766 |
for registerizing.
|
|
|
767 |
Loops are detected and the costs are multiplied
|
|
|
768 |
by three for every level of loop nesting.
|
|
|
769 |
Costs are sorted and the variables
|
|
|
770 |
are replaced by available registers on a greedy basis.
|
|
|
771 |
.LP
|
|
|
772 |
The 68020 has two different
|
|
|
773 |
types of registers.
|
|
|
774 |
For the 68020,
|
|
|
775 |
two different costs are calculated for
|
|
|
776 |
each variable life and the register type that
|
|
|
777 |
affords the better cost is used.
|
|
|
778 |
Ties are broken by counting the number of available
|
|
|
779 |
registers of each type.
|
|
|
780 |
.LP
|
|
|
781 |
Note that externals are registerized together with automatics.
|
|
|
782 |
This is done by evaluating the semantics of a ``call'' instruction
|
|
|
783 |
differently for externals and automatics.
|
|
|
784 |
Since a call goes outside the local procedure,
|
|
|
785 |
it is assumed that a call references all externals.
|
|
|
786 |
Similarly,
|
|
|
787 |
externals are assumed to be set before an ``entry'' instruction
|
|
|
788 |
and assumed to be referenced after a ``return'' instruction.
|
|
|
789 |
This makes sure that externals are in memory across calls.
|
|
|
790 |
.LP
|
|
|
791 |
The overall results are satisfactory.
|
|
|
792 |
It would be nice to be able to do this processing in
|
|
|
793 |
a machine-independent way,
|
|
|
794 |
but it is impossible to get all of the costs and
|
|
|
795 |
side effects of different choices by examining the parse tree.
|
|
|
796 |
.LP
|
|
|
797 |
Most of the code in the registerization pass is machine-independent.
|
|
|
798 |
The major machine-dependency is in
|
|
|
799 |
examining a machine instruction to ask if it sets or references
|
|
|
800 |
a variable.
|
|
|
801 |
.NH 2
|
|
|
802 |
Machine code optimization
|
|
|
803 |
.LP
|
|
|
804 |
The next pass walks the machine code
|
|
|
805 |
for opportunistic optimizations.
|
|
|
806 |
For the most part,
|
|
|
807 |
this is highly specific to a particular
|
|
|
808 |
processor.
|
|
|
809 |
One optimization that is performed
|
|
|
810 |
on all of the processors is the
|
|
|
811 |
removal of unnecessary ``move''
|
|
|
812 |
instructions.
|
|
|
813 |
Ironically,
|
|
|
814 |
most of these instructions were inserted by
|
|
|
815 |
the previous pass.
|
|
|
816 |
There are two patterns that are repetitively
|
|
|
817 |
matched and replaced until no more matches are
|
|
|
818 |
found.
|
|
|
819 |
The first tries to remove ``move'' instructions
|
|
|
820 |
by relabeling variables.
|
|
|
821 |
.LP
|
|
|
822 |
When a ``move'' instruction is encountered,
|
|
|
823 |
if the destination variable is set before the
|
|
|
824 |
source variable is referenced,
|
|
|
825 |
then all of the references to the destination
|
|
|
826 |
variable can be renamed to the source and the ``move''
|
|
|
827 |
can be deleted.
|
|
|
828 |
This transformation uses the reverse data flow
|
|
|
829 |
set up in the previous pass.
|
|
|
830 |
.LP
|
|
|
831 |
An example of this pattern is depicted in the following
|
|
|
832 |
table.
|
|
|
833 |
The pattern is in the left column and the
|
|
|
834 |
replacement action is in the right column.
|
|
|
835 |
.DS
|
|
|
836 |
.CW
|
|
|
837 |
.ta .1i .6i 1.6i 2.1i 2.6i
|
|
|
838 |
MOVE a->b \fR(remove)\fP
|
|
|
839 |
.R
|
|
|
840 |
(sequence with no mention of \f(CWa\fP)
|
|
|
841 |
.CW
|
|
|
842 |
USE b USE a
|
|
|
843 |
.R
|
|
|
844 |
(sequence with no mention of \f(CWa\fP)
|
|
|
845 |
.CW
|
|
|
846 |
SET b SET b
|
|
|
847 |
.DE
|
|
|
848 |
.LP
|
|
|
849 |
Experiments have shown that it is marginally
|
|
|
850 |
worthwhile to rename uses of the destination variable
|
|
|
851 |
with uses of the source variable up to
|
|
|
852 |
the first use of the source variable.
|
|
|
853 |
.LP
|
|
|
854 |
The second transform will do relabeling
|
|
|
855 |
without deleting instructions.
|
|
|
856 |
When a ``move'' instruction is encountered,
|
|
|
857 |
if the source variable has been set prior
|
|
|
858 |
to the use of the destination variable
|
|
|
859 |
then all of the references to the source
|
|
|
860 |
variable are replaced by the destination and
|
|
|
861 |
the ``move'' is inverted.
|
|
|
862 |
Typically,
|
|
|
863 |
this transformation will alter two ``move''
|
|
|
864 |
instructions and allow the first transformation
|
|
|
865 |
another chance to remove code.
|
|
|
866 |
This transformation uses the forward data flow
|
|
|
867 |
set up in the previous pass.
|
|
|
868 |
.LP
|
|
|
869 |
Again,
|
|
|
870 |
the following is a depiction of the transformation where
|
|
|
871 |
the pattern is in the left column and the
|
|
|
872 |
rewrite is in the right column.
|
|
|
873 |
.DS
|
|
|
874 |
.CW
|
|
|
875 |
.ta .1i .6i 1.6i 2.1i 2.6i
|
|
|
876 |
SET a SET b
|
|
|
877 |
.R
|
|
|
878 |
(sequence with no use of \f(CWb\fP)
|
|
|
879 |
.CW
|
|
|
880 |
USE a USE b
|
|
|
881 |
.R
|
|
|
882 |
(sequence with no use of \f(CWb\fP)
|
|
|
883 |
.CW
|
|
|
884 |
MOVE a->b MOVE b->a
|
|
|
885 |
.DE
|
|
|
886 |
Iterating these transformations
|
|
|
887 |
will usually get rid of all redundant ``move'' instructions.
|
|
|
888 |
.LP
|
|
|
889 |
A problem with this organization is that the costs
|
|
|
890 |
of registerization calculated in the previous pass
|
|
|
891 |
must depend on how well this pass can detect and remove
|
|
|
892 |
redundant instructions.
|
|
|
893 |
Often,
|
|
|
894 |
a fine candidate for registerization is rejected
|
|
|
895 |
because of the cost of instructions that are later
|
|
|
896 |
removed.
|
|
|
897 |
.NH 2
|
|
|
898 |
Writing the object file
|
|
|
899 |
.LP
|
|
|
900 |
The last pass walks the internal assembly language
|
|
|
901 |
and writes the object file.
|
|
|
902 |
The object file is reduced in size by about a factor
|
|
|
903 |
of three with simple compression
|
|
|
904 |
techniques.
|
|
|
905 |
The most important aspect of the object file
|
|
|
906 |
format is that it is independent of the compiling machine.
|
|
|
907 |
All integer and floating numbers in the object
|
|
|
908 |
code are converted to known formats and byte
|
|
|
909 |
orders.
|
|
|
910 |
.NH
|
|
|
911 |
The loader
|
|
|
912 |
.LP
|
|
|
913 |
The loader is a multiple pass program that
|
|
|
914 |
reads object files and libraries and produces
|
|
|
915 |
an executable binary.
|
|
|
916 |
The loader also does some minimal
|
|
|
917 |
optimizations and code rewriting.
|
|
|
918 |
Many of the operations performed by the
|
|
|
919 |
loader are machine-dependent.
|
|
|
920 |
.LP
|
|
|
921 |
The first pass of the loader reads the
|
|
|
922 |
object modules into an internal data
|
|
|
923 |
structure that looks like binary assembly language.
|
|
|
924 |
As the instructions are read,
|
|
|
925 |
code is reordered to remove
|
|
|
926 |
unconditional branch instructions.
|
|
|
927 |
Conditional branch instructions are inverted
|
|
|
928 |
to prevent the insertion of unconditional branches.
|
|
|
929 |
The loader will also make a copy of a few instructions
|
|
|
930 |
to remove an unconditional branch.
|
|
|
931 |
.LP
|
|
|
932 |
The next pass allocates addresses for
|
|
|
933 |
all external data.
|
|
|
934 |
Typical of processors is the MIPS,
|
|
|
935 |
which can reference ±32K bytes from a
|
|
|
936 |
register.
|
|
|
937 |
The loader allocates the register
|
|
|
938 |
.CW R30
|
|
|
939 |
as the static pointer.
|
|
|
940 |
The value placed in
|
|
|
941 |
.CW R30
|
|
|
942 |
is the base of the data segment plus 32K.
|
|
|
943 |
It is then cheap to reference all data in the
|
|
|
944 |
first 64K of the data segment.
|
|
|
945 |
External variables are allocated to
|
|
|
946 |
the data segment
|
|
|
947 |
with the smallest variables allocated first.
|
|
|
948 |
If all of the data cannot fit into the first
|
|
|
949 |
64K of the data segment,
|
|
|
950 |
then usually only a few large arrays
|
|
|
951 |
need more expensive addressing modes.
|
|
|
952 |
.LP
|
|
|
953 |
For the MIPS processor,
|
|
|
954 |
the loader makes a pass over the internal
|
|
|
955 |
structures,
|
|
|
956 |
exchanging instructions to try
|
|
|
957 |
to fill ``delay slots'' with useful work.
|
|
|
958 |
If a useful instruction cannot be found
|
|
|
959 |
to fill a delay slot,
|
|
|
960 |
the loader will insert
|
|
|
961 |
``noop''
|
|
|
962 |
instructions.
|
|
|
963 |
This pass is very expensive and does not
|
|
|
964 |
do a good job.
|
|
|
965 |
About 40% of all instructions are in
|
|
|
966 |
delay slots.
|
|
|
967 |
About 65% of these are useful instructions and
|
|
|
968 |
35% are ``noops.''
|
|
|
969 |
The vendor-supplied assembler does this job
|
|
|
970 |
more effectively,
|
|
|
971 |
filling about 80%
|
|
|
972 |
of the delay slots with useful instructions.
|
|
|
973 |
.LP
|
|
|
974 |
On the 68020 processor,
|
|
|
975 |
branch instructions come in a variety of
|
|
|
976 |
sizes depending on the relative distance
|
|
|
977 |
of the branch.
|
|
|
978 |
Thus the size of branch instructions
|
|
|
979 |
can be mutually dependent.
|
|
|
980 |
The loader uses a multiple pass algorithm
|
|
|
981 |
to resolve the branch lengths
|
|
|
982 |
[Szy78].
|
|
|
983 |
Initially, all branches are assumed minimal length.
|
|
|
984 |
On each subsequent pass,
|
|
|
985 |
the branches are reassessed
|
|
|
986 |
and expanded if necessary.
|
|
|
987 |
When no more expansions occur,
|
|
|
988 |
the locations of the instructions in
|
|
|
989 |
the text segment are known.
|
|
|
990 |
.LP
|
|
|
991 |
On the MIPS processor,
|
|
|
992 |
all instructions are one size.
|
|
|
993 |
A single pass over the instructions will
|
|
|
994 |
determine the locations of all addresses
|
|
|
995 |
in the text segment.
|
|
|
996 |
.LP
|
|
|
997 |
The last pass of the loader produces the
|
|
|
998 |
executable binary.
|
|
|
999 |
A symbol table and other tables are
|
|
|
1000 |
produced to help the debugger to
|
|
|
1001 |
interpret the binary symbolically.
|
|
|
1002 |
.LP
|
|
|
1003 |
The loader places absolute source line numbers in the symbol table.
|
|
|
1004 |
The name and absolute line number of all
|
|
|
1005 |
.CW #include
|
|
|
1006 |
files is also placed in the
|
|
|
1007 |
symbol table so that the debuggers can
|
|
|
1008 |
associate object code to source files.
|
|
|
1009 |
.NH
|
|
|
1010 |
Performance
|
|
|
1011 |
.LP
|
|
|
1012 |
The following is a table of the source size of the MIPS
|
|
|
1013 |
compiler.
|
|
|
1014 |
.DS
|
|
|
1015 |
.ta .1i .6i
|
|
|
1016 |
lines module
|
|
|
1017 |
\0509 machine-independent headers
|
|
|
1018 |
1070 machine-independent YACC source
|
|
|
1019 |
6090 machine-independent C source
|
|
|
1020 |
|
|
|
1021 |
\0545 machine-dependent headers
|
|
|
1022 |
6532 machine-dependent C source
|
|
|
1023 |
|
|
|
1024 |
\0298 loader headers
|
|
|
1025 |
5215 loader C source
|
|
|
1026 |
.DE
|
|
|
1027 |
.LP
|
|
|
1028 |
The following table shows timing
|
|
|
1029 |
of a test program
|
|
|
1030 |
that plays checkers, running on a MIPS R4000.
|
|
|
1031 |
The test program is 26 files totaling 12600 lines of C.
|
|
|
1032 |
The execution time does not significantly
|
|
|
1033 |
depend on library implementation.
|
|
|
1034 |
Since no other compiler runs on Plan 9,
|
|
|
1035 |
the Plan 9 tests were done with the Plan 9 operating system;
|
|
|
1036 |
the other tests were done on the vendor's operating system.
|
|
|
1037 |
The hardware was identical in both cases.
|
|
|
1038 |
The optimizer in the vendor's compiler
|
|
|
1039 |
is reputed to be extremely good.
|
|
|
1040 |
.DS
|
|
|
1041 |
.ta .1i .9i
|
|
|
1042 |
\0\04.49s Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP)
|
|
|
1043 |
\0\01.72s Plan 9 \f(CWvc\fP \f(CW-N\fP load time
|
|
|
1044 |
148.69s Plan 9 \f(CWvc\fP \f(CW-N\fP run time
|
|
|
1045 |
|
|
|
1046 |
\015.07s Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit)
|
|
|
1047 |
\0\01.66s Plan 9 \f(CWvc\fP load time
|
|
|
1048 |
\089.96s Plan 9 \f(CWvc\fP run time
|
|
|
1049 |
|
|
|
1050 |
\014.83s vendor \f(CWcc\fP compile time
|
|
|
1051 |
\0\00.38s vendor \f(CWcc\fP load time
|
|
|
1052 |
104.75s vendor \f(CWcc\fP run time
|
|
|
1053 |
|
|
|
1054 |
\043.59s vendor \f(CWcc\fP \f(CW-O\fP compile time
|
|
|
1055 |
\0\00.38s vendor \f(CWcc\fP \f(CW-O\fP load time
|
|
|
1056 |
\076.19s vendor \f(CWcc\fP \f(CW-O\fP run time
|
|
|
1057 |
|
|
|
1058 |
\0\08.19s vendor \f(CWcc\fP \f(CW-O3\fP compile time
|
|
|
1059 |
\035.97s vendor \f(CWcc\fP \f(CW-O3\fP load time
|
|
|
1060 |
\071.16s vendor \f(CWcc\fP \f(CW-O3\fP run time
|
|
|
1061 |
.DE
|
|
|
1062 |
.LP
|
|
|
1063 |
To compare the Intel compiler,
|
|
|
1064 |
a program that is about 40% bit manipulation and
|
|
|
1065 |
about 60% single precision floating point was
|
|
|
1066 |
run on the same 33 MHz 486, once under Windows
|
|
|
1067 |
compiled with the Watcom compiler, version 10.0,
|
|
|
1068 |
in 16-bit mode and once under
|
|
|
1069 |
Plan 9 in 32-bit mode.
|
|
|
1070 |
The Plan 9 execution time was 27 sec while the Windows
|
|
|
1071 |
execution time was 31 sec.
|
|
|
1072 |
.NH
|
|
|
1073 |
Conclusions
|
|
|
1074 |
.LP
|
|
|
1075 |
The new compilers compile
|
|
|
1076 |
quickly,
|
|
|
1077 |
load slowly,
|
|
|
1078 |
and produce
|
|
|
1079 |
medium quality
|
|
|
1080 |
object code.
|
|
|
1081 |
The compilers are relatively
|
|
|
1082 |
portable,
|
|
|
1083 |
requiring but a couple of weeks' work to
|
|
|
1084 |
produce a compiler for a different computer.
|
|
|
1085 |
For Plan 9,
|
|
|
1086 |
where we needed several compilers
|
|
|
1087 |
with specialized features and
|
|
|
1088 |
our own object formats,
|
|
|
1089 |
this project was indispensable.
|
|
|
1090 |
It is also necessary for us to
|
|
|
1091 |
be able to freely distribute our compilers
|
|
|
1092 |
with the Plan 9 distribution.
|
|
|
1093 |
.LP
|
|
|
1094 |
Two problems have come up in retrospect.
|
|
|
1095 |
The first has to do with the
|
|
|
1096 |
division of labor between compiler and loader.
|
|
|
1097 |
Plan 9 runs on multi-processors and as such
|
|
|
1098 |
compilations are often done in parallel.
|
|
|
1099 |
Unfortunately,
|
|
|
1100 |
all compilations must be complete before loading
|
|
|
1101 |
can begin.
|
|
|
1102 |
The load is then single-threaded.
|
|
|
1103 |
With this model,
|
|
|
1104 |
any shift of work from compile to load
|
|
|
1105 |
results in a significant increase in real time.
|
|
|
1106 |
The same is true of libraries that are compiled
|
|
|
1107 |
infrequently and loaded often.
|
|
|
1108 |
In the future,
|
|
|
1109 |
we may try to put some of the loader work
|
|
|
1110 |
back into the compiler.
|
|
|
1111 |
.LP
|
|
|
1112 |
The second problem comes from
|
|
|
1113 |
the various optimizations performed over several
|
|
|
1114 |
passes.
|
|
|
1115 |
Often optimizations in different passes depend
|
|
|
1116 |
on each other.
|
|
|
1117 |
Iterating the passes could compromise efficiency,
|
|
|
1118 |
or even loop.
|
|
|
1119 |
We see no real solution to this problem.
|
|
|
1120 |
.NH
|
|
|
1121 |
References
|
|
|
1122 |
.LP
|
|
|
1123 |
[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
|
|
|
1124 |
.I
|
|
|
1125 |
Compilers \- Principles, Techniques, and Tools,
|
|
|
1126 |
.R
|
|
|
1127 |
Addison Wesley,
|
|
|
1128 |
Reading, MA,
|
|
|
1129 |
1987.
|
|
|
1130 |
.LP
|
|
|
1131 |
[ANSI90] \f2American National Standard for Information Systems \-
|
|
|
1132 |
Programming Language C\f1, American National Standards Institute, Inc.,
|
|
|
1133 |
New York, 1990.
|
|
|
1134 |
.LP
|
|
|
1135 |
[Dav91] J. W. Davidson and D. B. Whalley,
|
|
|
1136 |
``Methods for Saving and Restoring Register Values across Function Calls'',
|
|
|
1137 |
.I
|
|
|
1138 |
Software\-Practice and Experience,
|
|
|
1139 |
.R
|
|
|
1140 |
Vol 21(2), pp. 149-165, February 1991.
|
|
|
1141 |
.LP
|
|
|
1142 |
[Joh79] S. C. Johnson,
|
|
|
1143 |
``YACC \- Yet Another Compiler Compiler'',
|
|
|
1144 |
.I
|
|
|
1145 |
UNIX Programmer's Manual, Seventh Ed., Vol. 2A,
|
|
|
1146 |
.R
|
|
|
1147 |
AT&T Bell Laboratories,
|
|
|
1148 |
Murray Hill, NJ,
|
|
|
1149 |
1979.
|
|
|
1150 |
.LP
|
|
|
1151 |
[Set70] R. Sethi and J. D. Ullman,
|
|
|
1152 |
``The Generation of Optimal Code for Arithmetic Expressions'',
|
|
|
1153 |
.I
|
|
|
1154 |
Journal of the ACM,
|
|
|
1155 |
.R
|
|
|
1156 |
Vol 17(4), pp. 715-728, 1970.
|
|
|
1157 |
.LP
|
|
|
1158 |
[Szy78] T. G. Szymanski,
|
|
|
1159 |
``Assembling Code for Machines with Span-dependent Instructions'',
|
|
|
1160 |
.I
|
|
|
1161 |
Communications of the ACM,
|
|
|
1162 |
.R
|
|
|
1163 |
Vol 21(4), pp. 300-308, 1978.
|