.HTML "Plan 9 C Compilers
2
.TL
3
Plan 9 C Compilers
4
.AU
5
Ken Thompson
6
ken@plan9.bell-labs.com
7
.AB
8
.FS
9
Originally appeared, in a different form, in
10
.I
11
Proceedings of the Summer 1990 UKUUG Conference,
12
.R
13
pp. 41-51,
14
London, 1990.
15
.FE
16
This paper describes the overall structure and function of the Plan 9 C compilers.
17
A more detailed implementation document
18
for any one of the compilers
19
is yet to be written.
20
.AE
21
.NH
22
Introduction
23
.LP
24
There are many compilers in the series.
25
Six of the compilers
26
(Intel 386, AMD64, PowerPC, PowerPC 64-bit, ARM, MIPS R3000)
27
are considered active and are used to compile
28
current versions of Plan 9.
29
One of the compilers (SPARC)
30
is maintained but is for older machines
31
for which we have no current ports of Plan 9;
32
we are unlikely to port to any SPARC machines.
33
The DEC Alpha and Motorola 68020 compilers have been retired.
34
Several others (Motorola 68000, Intel 960, AMD 29000)
35
have had only limited use, such as
36
to program peripherals or experimental devices.
37
Any of the disused compilers could be restored if needed.
38
.NH
39
Structure
40
.LP
41
The compiler is a single program that produces an
42
object file.
43
Combined in the compiler are the traditional
44
roles of preprocessor, lexical analyzer, parser, code generator,
45
local optimizer,
46
and first half of the assembler.
47
The object files are binary forms of assembly
48
language,
49
similar to what might be passed between
50
the first and second passes of an assembler.
51
.LP
52
Object files and libraries
53
are combined by a loader
54
program to produce the executable binary.
55
The loader combines the roles of second half
56
of the assembler, global optimizer, and loader.
57
The names of the compilers, loaders, and assemblers
58
are as follows:
59
.DS
60
.ta 1.5i
61
.de Ta
62
\\$1	\f(CW\\$2\fP  \f(CW\\$3\fP  \f(CW\\$4\fP
63
..
64
.Ta SPARC kc kl ka
65
.Ta PowerPC qc ql qa
66
.Ta MIPS vc vl va
67
.Ta MIPS\ little-endian 0c 0l 0a
68
.Ta ARM 5c 5l 5a
69
.Ta AMD64 6c 6l 6a
70
.Ta Intel\ 386 8c 8l 8a
71
.Ta PowerPC\ 64-bit 9c 9l 9a
72
.DE
73
There is a further breakdown
74
in the source of the compilers into
75
object-independent and
76
object-dependent
77
parts.
78
All of the object-independent parts
79
are combined into source files in the
80
directory
81
.CW /sys/src/cmd/cc .
82
The object-dependent parts are collected
83
in a separate directory for each compiler,
84
for example
85
.CW /sys/src/cmd/vc .
86
All of the code,
87
both object-independent and
88
object-dependent,
89
is machine-independent
90
and may be cross-compiled and executed on any
91
of the architectures.
92
.NH
The Language
.LP
The compiler implements ANSI C with some
restrictions and extensions
[ANSI90].
Most of the restrictions are due to
personal preference, while
most of the extensions were to help in
the implementation of Plan 9.
There are other departures from the standard,
particularly in the libraries,
that are beyond the scope of this
paper.
.NH 2
Register, volatile, const
.LP
The keyword
.CW register
is recognized syntactically
but is semantically ignored.
Thus taking the address of a
.CW register
variable is not diagnosed.
The keyword
.CW volatile
disables all optimizations, in particular registerization, of the corresponding variable.
The keyword
.CW const
generates warnings (if warnings are enabled by the compiler's
.CW -w
option) of non-constant use of the variable,
but does not affect the generated code.
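.LP
For example, the following function (purely illustrative; it is not taken
from any Plan 9 source) is accepted under the rules above.
The only diagnostic it can draw is a warning on the assignment to
.CW max ,
and only when the
.CW -w
option is given.
.P1
void
f(volatile int *ticks, const int max)
{
	register int r;		/* register is accepted but ignored */

	r = *ticks + max;	/* *ticks is never cached in a register */
	max = r;		/* non-constant use of max: warning under -w only */
	*ticks = r;
}
.P2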
.NH 2
The preprocessor
.LP
The C preprocessor is probably the
biggest departure from the ANSI standard.
.LP
The preprocessor built into the Plan 9 compilers does not support
.CW #if ,
although it does handle
.CW #ifdef
and
.CW #include .
If it is necessary to be more standard,
the source text can first be run through the separate ANSI C
preprocessor,
.CW cpp .
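.LP
Conditional compilation therefore has to be phrased in terms of
.CW #ifdef .
A small illustration (the header name is hypothetical):
.P1
/* not accepted by the built-in preprocessor:
 *	#if defined(DEBUG) && DEBUG > 1
 */
#ifdef DEBUG
#include "debug.h"
#endif
.P2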
.NH 2
Unnamed substructures
.LP
The most important and most heavily used of the
extensions is the declaration of an
unnamed substructure or subunion.
For example:
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	typedef
	struct	lock
	{
		int    locked;
	} Lock;
 
	typedef
	struct	node
	{
		int	type;
		union
		{
			double dval;
			float  fval;
			long   lval;
		};
		Lock;
	} Node;
 
	Lock*	lock;
	Node*	node;
.DE
The declaration of
.CW Node
has an unnamed substructure of type
.CW Lock
and an unnamed subunion.
One use of this feature allows references to elements of the
subunit to be accessed as if they were in
the outer structure.
Thus
.CW node->dval
and
.CW node->locked
are legitimate references.
.LP
When an outer structure is used
in a context that is only legal for
an unnamed substructure,
the compiler promotes the reference to the
unnamed substructure.
This is true for references to structures and
for references to pointers to structures.
This happens in assignment statements and
in argument passing where prototypes have been
declared.
Thus, continuing with the example,
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	lock = node;
.DE
would assign a pointer to the unnamed
.CW Lock
in
the
.CW Node
to the variable
.CW lock .
Another example,
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	extern void lock(Lock*);
	func(...)
	{
		...
		lock(node);
		...
	}
.DE
will pass a pointer to the
.CW Lock
substructure.
.LP
Finally, in places where context is insufficient to identify the unnamed structure,
the type name (it must be a
.CW typedef )
of the unnamed structure can be used as an identifier.
In our example,
.CW &node->Lock
gives the address of the anonymous
.CW Lock
structure.
.NH 2
Structure displays
.LP
A structure cast followed by a list of expressions in braces is
an expression with the type of the structure and elements assigned from
the corresponding list.
Structures are now almost first-class citizens of the language.
It is common to see code like this:
.DS
.CW
.ta .1i
	r = (Rectangle){point1, (Point){x,y+2}};
.DE
.NH 2
Initialization indexes
.LP
In initializers of arrays,
one may place a constant expression
in square brackets before an initializer.
This causes the next initializer to assign
the indicated element.
For example:
.DS
.CW
.ta .1i .6i 1.6i
	enum	errors
	{
		Etoobig,
		Ealarm,
		Egreg
	};
	char* errstrings[] =
	{
		[Ealarm]	"Alarm call",
		[Egreg]	"Panic: out of mbufs",
		[Etoobig]	"Arg list too long",
	};
.DE
In the same way,
individual structure members may
be initialized in any order by preceding the initialization with
.CW .tagname .
Both forms allow an optional
.CW = ,
to be compatible with a proposed
extension to ANSI C.
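.LP
For example, the following declaration (an illustration of the syntax only,
not code from Plan 9) initializes the members by name and out of order:
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	struct
	{
		int	x;
		int	y;
	} pt =
	{
		.y	2,
		.x	1,
	};
.DE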
.NH 2
External register
.LP
The declaration
.CW extern
.CW register
will dedicate a register to
a variable on a global basis.
It can be used only under special circumstances.
External register variables must be identically
declared in all modules and
libraries.
The feature is not intended for efficiency,
although it can produce efficient code;
rather it represents a unique storage class that
would be hard to get any other way.
On a shared-memory multi-processor,
an external register is
one-per-processor and neither one-per-procedure (automatic)
nor one-per-system (external).
It is used for two variables in the Plan 9 kernel,
.CW u
and
.CW m .
.CW U
is a pointer to the structure representing the currently running process
and
.CW m
is a pointer to the per-machine data structure.
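.LP
A sketch of what such declarations look like; the type names below are
illustrative rather than the kernel's exact definitions,
and the same two lines would have to appear identically in every module
loaded into the kernel.
.P1
extern register User *u;	/* the currently running process */
extern register Mach *m;	/* this processor's private data */
.P2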
.NH 2
Long long
.LP
The compilers accept
.CW long
.CW long
as a basic type meaning 64-bit integer.
On some of the machines
this type is synthesized from 32-bit instructions.
.NH 2
Pragma
.LP
The compilers accept
.CW #pragma
.CW lib
.I libname
and pass the
library name string uninterpreted
to the loader.
The loader uses the library name to
find libraries to load.
If the name contains
.CW $O ,
it is replaced with
the single character object type of the compiler
(e.g.,
.CW v
for the MIPS).
If the name contains
.CW $M ,
it is replaced with
the architecture type for the compiler
(e.g.,
.CW mips
for the MIPS).
If the name starts with
.CW /
it is an absolute pathname;
if it starts with
.CW .
then it is searched for in the loader's current directory.
Otherwise, the name is searched from
.CW /$M/lib .
Such
.CW #pragma
statements in header files guarantee that the correct
libraries are always linked with a program without the
need to specify them explicitly at link time.
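.LP
For example, a library's header might carry a line such as the following
(the library name here is only illustrative);
a program that includes the header is then linked against
.CW /$M/lib/libbio.a
automatically.
.P1
#pragma lib "libbio.a"
.P2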
.LP
They also accept
.CW #pragma
.CW packed
.CW on
(or
.CW yes
or
.CW 1 )
to cause subsequently declared data, until
.CW #pragma
.CW packed
.CW off
(or
.CW no
or
.CW 0 ),
to be laid out in memory tightly packed in successive bytes, disregarding
the usual alignment rules.
Accessing such data can cause faults.
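.LP
For instance, a structure that must match an externally imposed byte-level
layout might be declared as follows (illustrative only); the
.CW length
field then sits at offset 1 rather than at its naturally aligned offset,
and dereferencing it may fault on machines that require alignment.
.P1
#pragma packed on
typedef
struct Hdr
{
	unsigned char	type;
	unsigned long	length;
} Hdr;
#pragma packed off
.P2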
.LP
Similarly,
.CW #pragma
.CW profile
.CW off
(or
.CW no
or
.CW 0 )
causes subsequently declared functions, until
.CW #pragma
.CW profile
.CW on
(or
.CW yes
or
.CW 1 ),
to be marked as unprofiled.
Such functions will not be profiled when
profiling is enabled for the rest of the program.
.LP
Two
.CW #pragma
statements allow type-checking of
.CW print -like
functions.
The first, of the form
.P1
#pragma varargck argpos error 2
.P2
tells the compiler that the second argument to
.CW error
is a
.CW print
format string (see the manual page
.I print (2))
that specifies how to format
.CW error 's
subsequent arguments.
The second, of the form
.P1
#pragma varargck type "s" char*
.P2
says that the
.CW print
format verb
.CW s
processes an argument of
type
.CW char* .
If the compiler's
.CW -F
option is enabled, the compiler will use this information
to report type violations in the arguments to
.CW print ,
.CW error ,
and similar routines.
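.LP
The first pragma above would typically accompany a declaration along the
lines of the following (the signature is hypothetical):
.P1
void error(int level, char *fmt, ...);
.P2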
.NH
Object module conventions
.LP
The overall conventions of the runtime environment
are important
to runtime efficiency.
In this section,
several of these conventions are discussed.
.NH 2
Register saving
.LP
In the Plan 9 compilers,
the caller of a procedure saves the registers.
With caller-saves,
the leaf procedures can use all the
registers and never save them.
If you spend a lot of time at the leaves,
this seems preferable.
With callee-saves,
the saving of the registers is done
in the single point of entry and return.
If you are interested in space,
this seems preferable.
In both,
there is a degree of uncertainty
about what registers need to be saved.
Callee-saved registers make it difficult to
find variables in registers in debuggers.
Callee-saved registers also complicate
the implementation of
.CW longjmp .
The convincing argument is
that with caller-saves,
the decision to registerize a variable
can include the cost of saving the register
across calls.
For a further discussion of caller- vs. callee-saves,
see the paper by Davidson and Whalley [Dav91].
.LP
In the Plan 9 operating system,
calls to the kernel look like normal procedure
calls, which means
the caller
has saved the registers and the system
entry does not have to.
This makes system calls considerably faster.
Since this is a potential security hole,
and can lead to non-determinism,
the system may eventually save the registers
on entry,
or more likely clear the registers on return.
.NH 2
Calling convention
.LP
Older C compilers maintain a frame pointer, which is at a known constant
offset from the stack pointer within each function.
For machines where the stack grows towards zero,
the argument pointer is at a known constant offset
from the frame pointer.
Since the stack grows down in Plan 9,
the Plan 9 compilers
keep neither an
explicit frame pointer nor
an explicit argument pointer;
instead they generate addresses relative to the stack pointer.
.LP
On some architectures, the first argument to a subroutine is passed in a register.
.NH 2
Functions returning structures
.LP
Structures longer than one word are awkward to implement
since they do not fit in registers and must
be passed around in memory.
Functions that return structures
are particularly clumsy.
The Plan 9 compilers pass the address at which to store the
structure result as the first argument of a
function that has a structure return value.
Thus
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	x = f(...)
.DE
is rewritten as
.DS
.CW
.ta .1i .6i 1.1i 1.6i
	f(&x, ...)\f1.
.DE
This saves a copy and makes the compilation
much less clumsy.
A disadvantage is that if you call this
function without an assignment,
a dummy location must be invented.
.LP
There is also a danger of calling a function
that returns a structure without declaring
it as such.
With ANSI C function prototypes,
this error need never occur.
.NH
Implementation
.LP
The compiler is divided internally into
four machine-independent passes,
four machine-dependent passes,
and an output pass.
The next nine sections describe each pass in order.
.NH 2
Parsing
.LP
The first pass is a YACC-based parser
[Joh79].
Declarations are interpreted immediately,
building a block structured symbol table.
Executable statements are put into a parse tree
and collected,
without interpretation.
At the end of each procedure,
the parse tree for the function is
examined by the other passes of the compiler.
.LP
The input stream of the parser is
a pushdown list of input activations.
The preprocessor
expansions of
macros
and
.CW #include
are implemented as pushdowns.
Thus there is no separate
pass for preprocessing.
.NH 2
Typing
.LP
The next pass distributes typing information
to every node of the tree.
Implicit operations on the tree are added,
such as type promotions and taking the
address of arrays and functions.
.NH 2
Machine-independent optimization
.LP
The next pass performs optimizations
and transformations of the tree, such as converting
.CW &*x
and
.CW *&x
into
.CW x .
Constant expressions are converted to constants in this pass.
.NH 2
Arithmetic rewrites
.LP
This is another machine-independent optimization.
Subtrees of add, subtract, and multiply of integers are
rewritten for easier compilation.
The major transformation is factoring:
.CW 4+8*a+16*b+5
is transformed into
.CW 9+8*(a+2*b) .
Such expressions arise from address
manipulation and array indexing.
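.LP
As an illustration of where such subtrees come from, consider the following
hypothetical declarations on a machine with 4-byte ints and no padding:
.P1
typedef
struct T
{
	int	count;
	int	val[7];
} T;

T tab[100];

/* The address of tab[i].val[j] is   tab + 32*i + 4 + 4*j, */
/* which factoring rewrites as       tab + 4 + 4*(8*i + j). */
.P2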
.NH 2
Addressability
.LP
This is the first of the machine-dependent passes.
The addressability of a processor is defined as the set of
expressions that is legal in the address field
of a machine language instruction.
The addressability of different processors varies widely.
At one end of the spectrum are the 68020 and VAX,
which allow a complex mix of incrementing,
decrementing,
indexing, and relative addressing.
At the other end is the MIPS,
which allows only registers and constant offsets from the
contents of a register.
The addressability can be different for different instructions
within the same processor.
.LP
It is important to the code generator to know when a
subtree represents an address of a particular type.
This is done with a bottom-up walk of the tree.
In this pass, the leaves are labeled with small integers.
When an internal node is encountered,
it is labeled by consulting a table indexed by the
labels on the left and right subtrees.
For example,
on the 68020 processor,
it is possible to address an
offset from a named location.
In C, this is represented by the expression
.CW *(&name+constant) .
This is marked addressable by the following table.
In the table,
a node represented by the left column is marked
with a small integer from the right column.
Marks of the form
.CW A\s-2\di\u\s0
are addressable while
marks of the form
.CW N\s-2\di\u\s0
are not addressable.
.DS
.B
.ta .1i 1.1i
	Node	Marked
.CW
	name	A\s-2\d1\u\s0
	const	A\s-2\d2\u\s0
	&A\s-2\d1\u\s0	A\s-2\d3\u\s0
	A\s-2\d3\u\s0+A\s-2\d1\u\s0	N\s-2\d1\u\s0 \fR(note that this is not addressable)\fP
	*N\s-2\d1\u\s0	A\s-2\d4\u\s0
.DE
Here there is a distinction between
a node marked
.CW A\s-2\d1\u\s0
and a node marked
.CW A\s-2\d4\u\s0
because the address operator of an
.CW A\s-2\d4\u\s0
node is not addressable.
So to extend the table:
.DS
.B
.ta .1i 1.1i
	Node	Marked
.CW
	&A\s-2\d4\u\s0	N\s-2\d2\u\s0
	N\s-2\d2\u\s0+N\s-2\d1\u\s0	N\s-2\d1\u\s0
.DE
The full addressability of the 68020 is expressed
in 18 rules like this,
while the addressability of the MIPS is expressed
in 11 rules.
When one ports the compiler,
this table is usually initialized
so that leaves are labeled as addressable and nothing else.
The code produced is poor,
but porting is easy.
The table can be extended later.
.LP
This pass also rewrites some complex operators
into procedure calls.
Examples include 64-bit multiply and divide.
.LP
In the same bottom-up pass of the tree,
the nodes are labeled with a Sethi-Ullman complexity
[Set70].
This number is roughly the number of registers required
to compile the tree on an ideal machine.
An addressable node is marked 0.
A function call is marked infinite.
A unary operator is marked as the
maximum of 1 and the mark of its subtree.
A binary operator with equal marks on its subtrees is
marked with a subtree mark plus 1.
A binary operator with unequal marks on its subtrees is
marked with the maximum mark of its subtrees.
The actual values of the marks are not too important,
but the relative values are.
The goal is to compile the harder
(larger mark)
subtree first.
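.LP
The rule is small enough to state directly in code.
The following is a sketch only; the node structure and the names used here
are invented for the illustration, not taken from the compiler source.
.P1
enum { OFUNC = 1, INF = 1000 };	/* INF: effectively infinite */

typedef struct Node Node;
struct Node
{
	int	op;		/* OFUNC, a unary, or a binary operator */
	int	addable;	/* set by the addressability walk */
	int	su;		/* Sethi-Ullman complexity */
	Node	*left;
	Node	*right;		/* zero for unary operators */
};

void
sucalc(Node *n)
{
	int l, r;

	if(n->addable){			/* addressable: no register needed */
		n->su = 0;
		return;
	}
	if(n->op == OFUNC){		/* function call */
		n->su = INF;
		return;
	}
	sucalc(n->left);
	l = n->left->su;
	if(n->right == 0){		/* unary operator */
		n->su = l>1? l: 1;
		return;
	}
	sucalc(n->right);
	r = n->right->su;
	if(l == r)
		n->su = l+1;		/* equal marks: one more register */
	else
		n->su = l>r? l: r;	/* unequal marks: the larger one */
}
.P2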
.NH 2
Code generation
.LP
Code is generated by recursive
descent.
The Sethi-Ullman complexity completely guides the
order.
The addressability defines the leaves.
The only difficult part is compiling a tree
that has two infinite (function call)
subtrees.
In this case,
one subtree is compiled into the return register
(usually the most convenient place for a function call)
and then stored on the stack.
The other subtree is compiled into the return register
and then the operation is compiled with
operands from the stack and the return register.
.LP
There is a separate boolean code generator that compiles
conditional expressions.
This is fundamentally different from compiling an arithmetic expression.
The result of the boolean code generator is the
position of the program counter and not an expression.
The boolean code generator makes extensive use of De Morgan's rule.
The boolean code generator is an expanded version of that described
in chapter 8 of Aho, Sethi, and Ullman
[Aho87].
.LP
There is a considerable amount of talk in the literature
about automating this part of a compiler with a machine
description.
Since this code generator is so small
(less than 500 lines of C)
and easy,
it hardly seems worth the effort.
.NH 2
Registerization
.LP
Up to now,
the compiler has operated on syntax trees
that are roughly equivalent to the original source language.
The previous pass has produced machine language in an internal
format.
The next two passes operate on the internal machine language
structures.
The purpose of the next pass is to reintroduce
registers for heavily used variables.
.LP
All of the variables that can be
potentially registerized within a procedure are
placed in a table.
(Suitable variables are any automatic or external
scalars that do not have their addresses extracted.
Some constants that are hard to reference are also
considered for registerization.)
Four separate data flow equations are evaluated
over the procedure on all of these variables.
Two of the equations are the normal set-behind
and used-ahead
bits that define the life of a variable.
The two new bits tell if a variable life
crosses a function call ahead or behind.
By examining a variable over its lifetime,
it is possible to get a cost
for registerizing.
Loops are detected and the costs are multiplied
by three for every level of loop nesting.
Costs are sorted and the variables
are replaced by available registers on a greedy basis.
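.LP
A minimal sketch of the loop weighting, assuming an instruction list in
which each instruction carries the loop-nesting depth found by the loop
detector (the structure and names are invented here):
the cost of a variable is the sum of this weight over the instructions
that use or set it, and the highest-cost variables get the available
registers first.
.P1
typedef struct Prog Prog;
struct Prog
{
	Prog	*link;
	int	loopdepth;	/* loop-nesting level of this instruction */
	/* ... */
};

long
refweight(Prog *p)
{
	long w;
	int i;

	w = 1;
	for(i = 0; i < p->loopdepth; i++)
		w *= 3;		/* times three per level of nesting */
	return w;
}
.P2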
.LP
The 68020 has two different
types of registers.
For the 68020,
two different costs are calculated for
each variable life and the register type that
affords the better cost is used.
Ties are broken by counting the number of available
registers of each type.
.LP
Note that externals are registerized together with automatics.
This is done by evaluating the semantics of a ``call'' instruction
differently for externals and automatics.
Since a call goes outside the local procedure,
it is assumed that a call references all externals.
Similarly,
externals are assumed to be set before an ``entry'' instruction
and assumed to be referenced after a ``return'' instruction.
This makes sure that externals are in memory across calls.
.LP
The overall results are satisfactory.
It would be nice to be able to do this processing in
a machine-independent way,
but it is impossible to get all of the costs and
side effects of different choices by examining the parse tree.
.LP
Most of the code in the registerization pass is machine-independent.
The major machine-dependency is in
examining a machine instruction to ask if it sets or references
a variable.
.NH 2
Machine code optimization
.LP
The next pass walks the machine code
for opportunistic optimizations.
For the most part,
this is highly specific to a particular
processor.
One optimization that is performed
on all of the processors is the
removal of unnecessary ``move''
instructions.
Ironically,
most of these instructions were inserted by
the previous pass.
There are two patterns that are repetitively
matched and replaced until no more matches are
found.
The first tries to remove ``move'' instructions
by relabeling variables.
.LP
When a ``move'' instruction is encountered,
if the destination variable is set before the
source variable is referenced,
then all of the references to the destination
variable can be renamed to the source and the ``move''
can be deleted.
This transformation uses the reverse data flow
set up in the previous pass.
.LP
An example of this pattern is depicted in the following
table.
The pattern is in the left column and the
replacement action is in the right column.
.DS
.CW
.ta .1i .6i 1.6i 2.1i 2.6i
	MOVE	a->b		\fR(remove)\fP
.R
	(sequence with no mention of \f(CWa\fP)
.CW
	USE	b		USE	a
.R
	(sequence with no mention of \f(CWa\fP)
.CW
	SET	b		SET	b
.DE
.LP
Experiments have shown that it is marginally
worthwhile to rename uses of the destination variable
with uses of the source variable up to
the first use of the source variable.
.LP
The second transform will do relabeling
without deleting instructions.
When a ``move'' instruction is encountered,
if the source variable has been set prior
to the use of the destination variable
then all of the references to the source
variable are replaced by the destination and
the ``move'' is inverted.
Typically,
this transformation will alter two ``move''
instructions and allow the first transformation
another chance to remove code.
This transformation uses the forward data flow
set up in the previous pass.
.LP
Again,
the following is a depiction of the transformation where
the pattern is in the left column and the
rewrite is in the right column.
.DS
.CW
.ta .1i .6i 1.6i 2.1i 2.6i
	SET	a		SET	b
.R
	(sequence with no use of \f(CWb\fP)
.CW
	USE	a		USE	b
.R
	(sequence with no use of \f(CWb\fP)
.CW
	MOVE	a->b		MOVE	b->a
.DE
Iterating these transformations
will usually get rid of all redundant ``move'' instructions.
.LP
A problem with this organization is that the costs
of registerization calculated in the previous pass
must depend on how well this pass can detect and remove
redundant instructions.
Often,
a fine candidate for registerization is rejected
because of the cost of instructions that are later
removed.
.NH 2
Writing the object file
.LP
The last pass walks the internal assembly language
and writes the object file.
The object file is reduced in size by about a factor
of three with simple compression
techniques.
The most important aspect of the object file
format is that it is independent of the compiling machine.
All integer and floating numbers in the object
code are converted to known formats and byte
orders.
.NH
The loader
.LP
The loader is a multiple pass program that
reads object files and libraries and produces
an executable binary.
The loader also does some minimal
optimizations and code rewriting.
Many of the operations performed by the
loader are machine-dependent.
.LP
The first pass of the loader reads the
object modules into an internal data
structure that looks like binary assembly language.
As the instructions are read,
code is reordered to remove
unconditional branch instructions.
Conditional branch instructions are inverted
to prevent the insertion of unconditional branches.
The loader will also make a copy of a few instructions
to remove an unconditional branch.
.LP
The next pass allocates addresses for
all external data.
Typical of processors is the MIPS,
which can reference ±32K bytes from a
register.
The loader allocates the register
.CW R30
as the static pointer.
The value placed in
.CW R30
is the base of the data segment plus 32K.
It is then cheap to reference all data in the
first 64K of the data segment.
External variables are allocated to
the data segment
with the smallest variables allocated first.
If all of the data cannot fit into the first
64K of the data segment,
then usually only a few large arrays
need more expensive addressing modes.
.LP
For the MIPS processor,
the loader makes a pass over the internal
structures,
exchanging instructions to try
to fill ``delay slots'' with useful work.
If a useful instruction cannot be found
to fill a delay slot,
the loader will insert
``noop''
instructions.
This pass is very expensive and does not
do a good job.
About 40% of all instructions are in
delay slots.
About 65% of these are useful instructions and
35% are ``noops.''
The vendor-supplied assembler does this job
more effectively,
filling about 80%
of the delay slots with useful instructions.
.LP
On the 68020 processor,
branch instructions come in a variety of
sizes depending on the relative distance
of the branch.
Thus the size of branch instructions
can be mutually dependent.
The loader uses a multiple pass algorithm
to resolve the branch lengths
[Szy78].
Initially, all branches are assumed minimal length.
On each subsequent pass,
the branches are reassessed
and expanded if necessary.
When no more expansions occur,
the locations of the instructions in
the text segment are known.
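.LP
The iteration can be sketched as follows; the data structure and the helper
.CW brsize
(which returns the smallest encoding that reaches the target from the
instruction's current location) are invented for the illustration.
Sizes only ever grow, so the loop terminates.
.P1
typedef struct Inst Inst;
struct Inst
{
	Inst	*link;
	Inst	*target;	/* nonzero for branch instructions */
	long	pc;		/* current guess at location */
	int	size;		/* current guess at size, initially minimal */
};

int	brsize(Inst*);		/* smallest encoding reaching the target */

void
span(Inst *first)
{
	Inst *i;
	long pc;
	int again;

	do{
		again = 0;
		pc = 0;
		for(i = first; i != 0; i = i->link){
			i->pc = pc;
			if(i->target != 0 && brsize(i) > i->size){
				i->size = brsize(i);	/* expand, never shrink */
				again = 1;
			}
			pc += i->size;
		}
	}while(again);
}
.P2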
.LP
On the MIPS processor,
all instructions are one size.
A single pass over the instructions will
determine the locations of all addresses
in the text segment.
.LP
The last pass of the loader produces the
executable binary.
A symbol table and other tables are
produced to help the debugger to
interpret the binary symbolically.
.LP
The loader places absolute source line numbers in the symbol table.
The name and absolute line number of all
.CW #include
files are also placed in the
symbol table so that the debuggers can
associate object code to source files.
.NH
Performance
.LP
The following is a table of the source size of the MIPS
compiler.
.DS
.ta .1i .6i
	lines	module
	\0509	machine-independent headers
	1070	machine-independent YACC source
	6090	machine-independent C source
 
	\0545	machine-dependent headers
	6532	machine-dependent C source
 
	\0298	loader headers
	5215	loader C source
.DE
.LP
The following table shows timing
of a test program
that plays checkers, running on a MIPS R4000.
The test program is 26 files totaling 12600 lines of C.
The execution time does not significantly
depend on library implementation.
Since no other compiler runs on Plan 9,
the Plan 9 tests were done with the Plan 9 operating system;
the other tests were done on the vendor's operating system.
The hardware was identical in both cases.
The optimizer in the vendor's compiler
is reputed to be extremely good.
.DS
.ta .1i .9i
	\0\04.49s	Plan 9 \f(CWvc\fP \f(CW-N\fP compile time (opposite of \f(CW-O\fP)
	\0\01.72s	Plan 9 \f(CWvc\fP \f(CW-N\fP load time
	148.69s	Plan 9 \f(CWvc\fP \f(CW-N\fP run time
 
	\015.07s	Plan 9 \f(CWvc\fP compile time (\f(CW-O\fP implicit)
	\0\01.66s	Plan 9 \f(CWvc\fP load time
	\089.96s	Plan 9 \f(CWvc\fP run time
 
	\014.83s	vendor \f(CWcc\fP compile time
	\0\00.38s	vendor \f(CWcc\fP load time
	104.75s	vendor \f(CWcc\fP run time
 
	\043.59s	vendor \f(CWcc\fP \f(CW-O\fP compile time
	\0\00.38s	vendor \f(CWcc\fP \f(CW-O\fP load time
	\076.19s	vendor \f(CWcc\fP \f(CW-O\fP run time
 
	\0\08.19s	vendor \f(CWcc\fP \f(CW-O3\fP compile time
	\035.97s	vendor \f(CWcc\fP \f(CW-O3\fP load time
	\071.16s	vendor \f(CWcc\fP \f(CW-O3\fP run time
.DE
.LP
To compare the Intel compiler,
a program that is about 40% bit manipulation and
about 60% single precision floating point was
run on the same 33 MHz 486, once under Windows
compiled with the Watcom compiler, version 10.0,
in 16-bit mode and once under
Plan 9 in 32-bit mode.
The Plan 9 execution time was 27 sec while the Windows
execution time was 31 sec.
.NH
Conclusions
.LP
The new compilers compile
quickly,
load slowly,
and produce
medium quality
object code.
The compilers are relatively
portable,
requiring but a couple of weeks' work to
produce a compiler for a different computer.
For Plan 9,
where we needed several compilers
with specialized features and
our own object formats,
this project was indispensable.
It is also necessary for us to
be able to freely distribute our compilers
with the Plan 9 distribution.
.LP
Two problems have come up in retrospect.
The first has to do with the
division of labor between compiler and loader.
Plan 9 runs on multi-processors and as such
compilations are often done in parallel.
Unfortunately,
all compilations must be complete before loading
can begin.
The load is then single-threaded.
With this model,
any shift of work from compile to load
results in a significant increase in real time.
The same is true of libraries that are compiled
infrequently and loaded often.
In the future,
we may try to put some of the loader work
back into the compiler.
.LP
The second problem comes from
the various optimizations performed over several
passes.
Often optimizations in different passes depend
on each other.
Iterating the passes could compromise efficiency,
or even loop.
We see no real solution to this problem.
.NH
References
.LP
[Aho87] A. V. Aho, R. Sethi, and J. D. Ullman,
.I
Compilers \- Principles, Techniques, and Tools,
.R
Addison Wesley,
Reading, MA,
1987.
.LP
[ANSI90] \f2American National Standard for Information Systems \-
Programming Language C\f1, American National Standards Institute, Inc.,
New York, 1990.
.LP
[Dav91] J. W. Davidson and D. B. Whalley,
``Methods for Saving and Restoring Register Values across Function Calls'',
.I
Software\-Practice and Experience,
.R
Vol 21(2), pp. 149-165, February 1991.
.LP
[Joh79] S. C. Johnson,
``YACC \- Yet Another Compiler Compiler'',
.I
UNIX Programmer's Manual, Seventh Ed., Vol. 2A,
.R
AT&T Bell Laboratories,
Murray Hill, NJ,
1979.
.LP
[Set70] R. Sethi and J. D. Ullman,
``The Generation of Optimal Code for Arithmetic Expressions'',
.I
Journal of the ACM,
.R
Vol 17(4), pp. 715-728, 1970.
.LP
[Szy78] T. G. Szymanski,
``Assembling Code for Machines with Span-dependent Instructions'',
.I
Communications of the ACM,
.R
Vol 21(4), pp. 300-308, 1978.