WebSVN – tendra.SVN – Blame – /trunk/doc/papers/porting/porting.xml

Rev	Author	Line No.	Line
6	7u83	1	`<?xml version="1.0" standalone="no"?>`
		2	`<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"`
		3	`"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">`
		4
		5	`<!--`
		6	$Id$
		7	`-->`
		8
		9	`<book>`
		10	`<bookinfo>`
		11	`<title>TDF and Portability</title>`
		12
		13	`<corpauthor>The TenDRA Project</corpauthor>`
		14
		15	`<author>`
		16	`<firstname>Jeroen</firstname>`
		17	`<surname>Ruigrok van der Werven</surname>`
		18	`</author>`
		19	`<authorinitials>JRvdW</authorinitials>`
		20	`<pubdate>2005</pubdate>`
		21
		22	`<copyright>`
		23	`<year>2004</year>`
		24	`<year>2005</year>`
		25
		26	`<holder>The TenDRA Project</holder>`
		27	`</copyright>`
		28
		29	`<copyright>`
		30	`<year>1998</year>`
		31
		32	`<holder>DERA</holder>`
		33	`</copyright>`
		34	`</bookinfo>`
		35
		36	`<chapter id="introduction">`
		37	`<title>Introduction</title>`
		38
		39	`<para>TDF is the name of the technology developed at DRA which has been`
		40	`adopted by the Open Software Foundation (OSF), Unix System Laboratories`
		41	`(USL), the European Community's Esprit Programme and others as their`
		42	`Architecture Neutral Distribution Format (ANDF). To date much of the`
		43	`discussion surrounding it has centred on the question, "How do you`
		44	`distribute portable software?". This paper concentrates on the more`
		45	`difficult question, "How do you write portable software in the first`
		46	`place?" and shows how TDF can be a valuable tool to aid the writing of`
		47	`portable software. Most of the discussion centres on programs written in`
		48	`C and is Unix specific. This is because most of the experience of TDF to`
		49	`date has been in connection with C in a Unix environment, and not`
		50	`because of any inbuilt bias in TDF.</para>`
		51
		52	`<para>It is assumed that the reader is familiar with the ANDF concept`
		53	`(although not necessarily with the details of TDF), and with the`
		54	`problems involved in writing portable C code.</para>`
		55
		56	`<para>The discussion is divided into two sections. Firstly some of the`
		57	`problems involved in writing portable programs are considered. The`
		58	`intention is not only to catalogue what these problems are, but to`
		59	`introduce ways of looking at them which will be important in the second`
		60	`section. This deals with the TDF approach to portability.</para>`
		61	`</chapter>`
		62
		63	`<chapter>`
		64	`<sect1 id="portability">`
		65	`<title>Portability</title>`
		66
		67	`<para>We start by examining some of the problems`
		68	`involved in the writing of portable programs. Although the`
		69	`discussion is very general, and makes no mention of TDF, many of`
		70	`the ideas introduced are of importance in the second half of the`
		71	`paper, which deals with TDF.</para>`
		72
		73	`<sect2 id="S3">`
		74	`<title>2.1. Portable Programs</title>`
		75
		76	`<sect3 id="S4">`
		77	`<title>2.1.1. Definitions and Preliminary Discussion</title>`
		78
		79	`<para>Let us firstly say what we mean by a portable program. A`
		80	`program is portable to a number of machines if it can be compiled`
		81	`to give the same functionality on all those machines. Note that`
		82	`this does not mean that exactly the same source code is used on`
		83	`all the machines. One could envisage a program written in, say,`
		84	`68020 assembly code for a certain machine which has been`
		85	`translated into 80386 assembly code for some other machine to give`
		86	`a program with exactly equivalent functionality. This would, under`
		87	`our definition, be a program which is portable to these two`
		88	`machines. At the other end of the scale, the C program:`
		89
		90	`<programlisting>`
		91	`#include <stdio.h>`
		92
		93	`int`
		94	`main()`
		95	`{`
		96	`fputs("Hello world\n", stdout);`
		97	`return(0);`
		98	`}`
		99	`</programlisting>`
		100
		101	`which prints the message, "Hello world", onto the standard output`
		102	`stream, will be portable to a vast range of machines without any`
		103	`need for rewriting. Most of the portable programs we shall be`
		104	`considering fall closer to the latter end of the spectrum - they`
		105	`will largely consist of target independent source with small`
		106	`sections of target dependent source for those constructs for which`
		107	`target independent expression is either impossible or of`
		108	`inadequate efficiency.</para>`
		109
		110	`<para>Note that we are defining portability in terms of a set of`
		111	`target machines and not as some universal property. The act of`
		112	`modifying an existing program to make it portable to a new target`
		113	`machine is called porting. Clearly in the examples above, porting`
		114	`the first program would be a highly complex task involving almost`
		115	`an entire rewrite, whereas in the second case it should be`
		116	`trivial.</para>`
		117	`</sect3>`
		118
		119	`<sect3 id="S5">`
		120	`<title>2.1.2. Separation and Combination of Code</title>`
		121
		122	`<para>So why is the second example above more portable (in the sense`
		123	`of more easily ported to a new machine) than the first? The`
		124	`first, obvious, point to be made is that it is written in a`
		125	`high-level language, C, rather than the low-level languages, 68020`
		126	`and 80386 assembly codes, used in the first example. By using a`
		127	`high-level language we have abstracted out the details of the`
		128	`processor to be used and expressed the program in an architecture`
		129	`neutral form. It is one of the jobs of the compiler on the target`
		130	`machine to transform this high-level representation into the`
		131	`appropriate machine dependent low-level representation.</para>`
		132
		133	`<para>The second point is that the second example program is not in`
		134	`itself complete. The objects <code>fputs</code> and`
		135	`<code>stdout</code>, representing the procedure to output a string`
		136	`and the standard output stream respectively, are left undefined.`
		137	`Instead the header <code>stdio.h</code> is included on the`
		138	`understanding that it contains the specification of these`
		139	`objects.</para>`
		140
		141	`<para>A version of this file is to be found on each target machine.`
		142	`On a particular machine it might contain something like:`
		143
		144	`<programlisting>`
		145	`typedef struct {`
		146	`int __cnt ;`
		147	`unsigned char *__ptr ;`
		148	`unsigned char *__base ;`
		149	`short __flag ;`
		150	`char __file ;`
		151	`} FILE ;`
		152
		153	`extern FILE __iob[60];`
		154	`#define stdout (&__iob[1])`
		155
		156	`extern int fputs(const char *, FILE *);`
		157
		158	`</programlisting>`
		159
		160	`meaning that the type <code>FILE</code> is defined by the given`
		161	`structure, <code>__iob</code> is an external array of 60`
		162	`<code>FILE</code>'s, <code>stdout</code> is a pointer to the`
		163	`second element of this array, and that <code>fputs</code> is an`
		164	`external procedure which takes a <code>const char *</code> and a`
		165	`<code>FILE *</code> and returns an <code>int</code>. On a`
		166	`different machine, the details may be different (exactly what we`
		167	`can, or cannot, assume is the same on all target machines is`
		168	`discussed below).</para>`
		169
		170	`<para>These details are fed into the program by the pre-processing`
		171	`phase of the compiler. (The various compilation phases are`
		172	`discussed in more detail later - see Fig. 1.) This is a simple,`
		173	`preliminary textual substitution. It provides the definitions of`
		174	`the type <code>FILE</code> and the value <code>stdout</code> (in`
		175	`terms of <code>__iob</code>), but still leaves the precise`
		176	`definitions of <code>__iob</code> and <code>fputs</code>`
		177	`unresolved (although we do know their types). The definitions of`
		178	`these values are not provided until the final phase of the`
		179	`compilation - linking - where they are linked in from the`
		180	`precompiled system libraries.</para>`
		181
		182	`<para>Note that, after the pre-processing phase alone, our portable`
		183	`program has been transformed into a target dependent form, because`
		184	`of the substitution of the target dependent values from`
		185	`<code>stdio.h</code>. If we had also included the definitions of`
		186	`<code>__iob</code> and, more particularly, <code>fputs</code>,`
		187	`things would have been even worse - the procedure for outputting a`
		188	`string to the screen is likely to be highly target`
		189	`dependent.</para>`
		190
		191	`<para>To conclude, we have, by including <code>stdio.h</code>, been`
		192	`able to effectively separate the target independent part of our`
		193	`program (the main program) from the target dependent part (the`
		194	`details of <code>stdout</code> and <code>fputs</code>). It is one`
		195	`of the jobs of the compiler to recombine these parts to produce a`
		196	`complete program.</para>`
		197	`</sect3>`
		198
		199	`<sect3 id="S6">`
		200	`<title>2.1.3. Application Programming Interfaces</title>`
		201
		202	`<para>As we have seen, the separation of the target dependent`
		203	`sections of a program into the system headers and system libraries`
		204	`greatly facilitates the construction of portable programs. What`
		205	`has been done is to define an interface between the main program`
		206	`and the existing operating system on the target machine in`
		207	`abstract terms. The program should then be portable to any machine`
		208	`which implements this interface correctly.</para>`
		209
		210	`<para>The interface for the "Hello world" program above might be`
		211	`described as follows : defined in the header <code>stdio.h</code>`
		212	`are a type <code>FILE</code> representing a file, an object`
		213	`<code>stdout</code> of type <code>FILE *</code> representing the`
		214	`standard output file, and a procedure <code>fputs</code> with`
		215	`prototype:`
		216
		217	`<programlisting>`
		218	`int fputs(const char *s, FILE *f);`
		219	`</programlisting>`
		220
		221	`which prints the string <code>s</code> to the file <code>f</code>.`
		222	`This is an example of an Application Programming Interface (API).`
		223	`Note that it can be split into two aspects, the syntactic (what`
		224	`they are) and the semantic (what they mean). On any machine which`
		225	`implements this API our program is both syntactically correct and`
		226	`does what we expect it to.</para>`
		227
		228	`<para>The benefit of describing the API at this fairly high level is`
		229	`that it leaves scope for a range of implementations (and thus more`
		230	`machines which implement it) while still encapsulating the main`
		231	`program's requirements.</para>`
		232
		233	`<para>In the example implementation of <code>stdio.h</code> above we`
		234	`see that this machine implements this API correctly syntactically,`
		235	`but not necessarily semantically. One would have to read the`
		236	`documentation provided on the system to be sure of the`
		237	`semantics.</para>`
		238
		239	`<para>Another way of defining an API for this program would be to`
		240	`note that the given API is a subset of the ANSI C standard. Thus`
		241	`we could take ANSI C as an "off the shelf" API. It is then clear`
		242	`that our program should be portable to any ANSI-compliant`
		243	`machine.</para>`
		244
		245	`<para>It is worth emphasising that all programs have an API, even if`
		246	`it is implicit rather than explicit. However it is probably fair`
		247	`to say that programs without an explicit API are only portable by`
		248	`accident. We shall have more to say on this subject later.</para>`
		249	`</sect3>`
		250
		251	`<sect3 id="S7">`
		252	`<title>2.1.4. Compilation Phases</title>`
		253
		254	`<para>The general plan for how to write the extreme example of a`
		255	`portable program, namely one which contains no target dependent`
		256	`code, is now clear. It is shown in the compilation diagram in Fig.`
		257	`1 which represents the traditional compilation process. This`
		258	`diagram is divided into four sections. The left half of the`
		259	`diagram represents the actual program and the right half the`
		260	`associated API. The top half of the diagram represents target`
		261	`independent material - things which only need to be done once -`
		262	`and the bottom half target dependent material - things which need`
		263	`to be done on every target machine.</para>`
		264
		265	`<para>FIGURE 1. Traditional Compilation Phases</para>`
		266
		267	`<img src="../images/trad_scheme.gif" />`
		268
		269	`<para> So, we write our target independent program (top left),`
		270	`conforming to the target independent API specification (top`
		271	`right). All the compilation actually takes place on the target`
		272	`machine. This machine must have the API correctly implemented`
		273	`(bottom right). This implementation will in general be in two`
		274	`parts - the system headers, providing type definitions, macros,`
		275	`procedure prototypes and so on, and the system libraries,`
		276	`providing the actual procedure definitions. Another way of`
		277	`characterising this division is between syntax (the system`
		278	`headers) and semantics (the system libraries).</para>`
		279
		280	`<para>The compilation is divided into three main phases. Firstly the`
		281	`system headers are inserted into the program by the pre-processor.`
		282	`This produces, in effect, a target dependent version of the`
		283	`original program. This is then compiled into a binary object file.`
		284	`During the compilation process the compiler inserts all the`
		285	`information it has about the machine - including the Application`
		286	`Binary Interface (ABI) - the sizes of the basic C types, how they`
		287	`are combined into compound types, the system procedure calling`
		288	`conventions and so on. This ensures that in the final linking`
		289	`phase the binary object file and the system libraries are obeying`
		290	`the same ABI, thereby producing a valid executable. (On a`
		291	`dynamically linked system this final linking phase takes place`
		292	`partially at run time rather than at compile time, but this does`
		293	`not really affect the general scheme.)</para>`
		294
		295	`<para>The compilation scheme just described consists of a series of`
		296	`phases of two types ; code combination (the pre-processing and`
		297	`system linking phases) and code transformation (the actual`
		298	`compilation phases). The existence of the combination phases`
		299	`allows for the effective separation of the target independent code`
		300	`(in this case, the whole program) from the target dependent code`
		301	`(in this case, the API implementation), thereby aiding the`
		302	`construction of portable programs. These ideas on the separation,`
		303	`combination and transformation of code underlie the TDF approach`
		304	`to portability.</para>`
		305	`</sect3>`
		306	`</sect2>`
		307
		308	`<sect2 id="S8">`
		309	`<title>2.2. Portability Problems</title>`
		310
		311	`<para>We have set out a scheme whereby it should be possible to write`
		312	`portable programs with a minimum of difficulties. So why, in`
		313	`reality, does porting cause so many problems? Recall that we are still`
		314	`primarily concerned with programs which contain no target dependent`
		315	`code, although most of the points raised apply by extension to all`
		316	`programs.</para>`
		317
		318	`<sect3 id="S9">`
		319	`<title>2.2.1. Programming Problems</title>`
		320
		321	`<para>A first, obvious class of problems concerns the program itself.`
		322	`It is to be assumed that as many bugs as possible have been`
		323	`eliminated by testing and debugging on at least one platform`
		324	`before a program is considered as a candidate for being a portable`
		325	`program. But for even the most self-contained program, working on`
		326	`one platform is no guarantee of working on another. The program`
		327	`may use undefined behaviour - using uninitialised values or`
		328	`dereferencing null pointers, for example - or have built-in`
		329	`assumptions about the target machine - whether it is big-endian or`
		330	`little-endian, or what the sizes of the basic integer types are,`
		331	`for example. This latter point is going to become increasingly`
		332	`important over the next couple of years as 64-bit architectures`
		333	`begin to be introduced. How many existing programs implicitly`
		334	`assume a 32-bit architecture?</para>`
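<para>As a minimal sketch of how a byte-order assumption can be avoided:
instead of copying the in-memory bytes of an integer, the value is written
and read one byte at a time using shifts, so the result is the same on
big-endian and little-endian machines. The names <code>store_be32</code>,
<code>load_be32</code> and <code>is_little_endian</code> are illustrative
and are not taken from the paper.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 32 bits of v in buf, most significant byte first.
   unsigned long is used because ANSI guarantees it has at least 32 bits. */
void store_be32(unsigned char *buf, unsigned long v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char) ((v >> 24) & 0xffUL);
    buf[1] = (unsigned char) ((v >> 16) & 0xffUL);
    buf[2] = (unsigned char) ((v >> 8) & 0xffUL);
    buf[3] = (unsigned char) (v & 0xffUL);
}

/* Rebuild the value from four bytes stored by store_be32. */
unsigned long load_be32(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((unsigned long) buf[0] << 24) |
           ((unsigned long) buf[1] << 16) |
           ((unsigned long) buf[2] << 8) |
           (unsigned long) buf[3];
}

/* Run-time probe of the host byte order: 1 on a little-endian host,
   0 otherwise. Code that needs this answer is already target dependent. */
int is_little_endian(void)
{
    unsigned int one = 1;
    return *(const unsigned char *) &one == 1;
}
```

<para>The two conversion routines never inspect the host representation,
so they need no conditional compilation; only the probe depends on the
target.</para>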
		335
		336	`<para>Many of these built-in assumptions may arise because of the`
		337	`conventional porting process. A program is written on one machine,`
		338	`modified slightly to make it work on a second machine, and so on.`
		339	`This means that the program is "biased" towards the existing set`
		340	`of target machines, and most particularly to the original machine`
		341	`it was written on. This applies not only to assumptions about`
		342	`endianness, say, but also to the questions of API conformance`
		343	`which we will be discussing below.</para>`
		344
		345	`<para>Most compilers will pick up some of the grosser programming`
		346	`errors, particularly by type checking (including procedure`
		347	`arguments if prototypes are used). Some of the subtler errors can`
		348	`be detected using the <b>-Wall</b> option to the Free Software`
		349	`Foundation's GNU C Compiler (<code>gcc</code>) or separate program`
		350	`checking tools such as <code>lint</code>, for example, but this`
		351	`remains a very difficult area.</para>`
		352	`</sect3>`
		353
		354	`<sect3 id="S10">`
		355	`<title>2.2.2. Code Transformation Problems</title>`
		356
		357	`<para>We now move on from programming problems to compilation`
		358	`problems. As we mentioned above, compilation may be regarded as a`
		359	`series of phases of two types : combination and transformation.`
		360	`Transformation of code - translating a program in one form into an`
		361	`equivalent program in another form - may lead to a variety of`
		362	`problems. The code may be transformed wrongly, so that the`
		363	`equivalence is broken (a compiler bug), or in an unexpected manner`
		364	`(differing compiler interpretations), or not at all, because it is`
		365	`not recognised as legitimate code (a compiler limitation). The`
		366	`latter two problems are most likely when the input is a high level`
		367	`language, with complex syntax and semantics.</para>`
		368
		369	`<para>Note that in Fig. 1 all the actual compilation takes place on`
		370	`the target machine. So, to port the program to`
		371	`<varname>n</varname> machines, we need to deal with the bugs and`
		372	`limitations of <varname>n</varname>, potentially different,`
		373	`compilers. For example, if you have written your program using`
		374	`prototypes, it is going to be a large and rather tedious job`
		375	`porting it to a compiler which does not have prototypes (this`
		376	`particular example can be automated; not all such jobs can). Other`
		377	`compiler limitations can be surprising`
		378	`- not understanding the <code>L</code> suffix for long numeric`
		379	`literals and not allowing members of enumeration types as array`
		380	`indexes are among the problems drawn from my personal`
		381	`experience.</para>`
		382
		383	`<para>The differing compiler interpretations may be more subtle. For`
		384	`example, there are differences between ANSI and "traditional" C`
		385	`which may trap the unwary. Examples are the promotion of integral`
		386	`types and the resolution of the linkage of static objects.</para>`
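<para>The promotion difference can be sketched as follows. Under the ANSI
"value preserving" rule an <code>unsigned char</code> operand is promoted to
<code>int</code> whenever <code>int</code> can hold all its values, so the
subtraction below gives -1. Many traditional compilers used an "unsigned
preserving" rule, under which the same expression is unsigned and can never
be negative. The function name is illustrative only.</para>

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if unsigned char promotes to int (the ANSI rule),
   and 0 if the arithmetic is carried out in an unsigned type. */
int value_preserving_promotion(void)
{
    unsigned char c = 1;

    /* The ANSI rule promotes to int only when int can represent
       every unsigned char value; this sketch assumes that it can. */
    assert(UCHAR_MAX <= INT_MAX);

    return (c - 2) < 0;
}
```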
		387
		388	`<para>Many of these problems may be reduced by using the "same"`
		389	`compiler on all the target machines. For example, <code>gcc</code>`
		390	`has a single front end (C -> RTL) which may be combined with an`
		391	`appropriate back end (RTL -> target) to form a suitable`
		392	`compiler for a wide range of target machines. The existence of a`
		393	`single front end virtually eliminates the problems of differing`
		394	`interpretation of code and compiler quirks. It also reduces the`
		395	`exposure to bugs. Instead of being exposed to the bugs in`
		396	`<varname>n</varname> separate compilers, we are now only exposed`
		397	`to bugs in one half-compiler (the front end) plus`
		398	`<varname>n</varname> half-compilers (the back ends) - a total of`
		399	`<varname>(n + 1) / 2</varname>. (This calculation is not meant`
		400	`totally seriously, but it is true in principle.) Front end bugs,`
		401	`when tracked down, also only require a single workaround.</para>`
		402	`</sect3>`
		403
		404	`<sect3>`
		405	`<title id="S11">2.2.3. Code Combination Problems</title>`
		406
		407	`<para>If code transformation problems may be regarded as a time`
		408	`consuming irritation, involving the rewriting of sections of code`
		409	`or using a different compiler, the second class of problems, those`
		410	`concerned with the combination of code, are far more`
		411	`serious.</para>`
		412
		413	`<para>The first code combination phase is the pre-processor pulling`
		414	`in the system headers. These can contain some nasty surprises.`
		415	`For example, consider a simple ANSI compliant program which`
		416	`contains a linked list of strings arranged in alphabetical order.`
		417	`This might also contain a routine:</para>`
		418
		419	`<programlisting>`
		420	`void index(char *);`
		421	`</programlisting>`
		422
		423	`<para>which adds a string to this list in the appropriate position,`
		424	`using <code>strcmp</code> from <code>string.h</code> to find it.`
		425	`This works fine on most machines, but on some it gives the`
		426	`error:</para>`
		427
		428	`<programlisting>`
		429	`Only 1 argument to macro 'index'`
		430	`</programlisting>`
		431
		432	`<para>The reason for this is that the system version of`
		433	`<code>string.h</code> contains the line:</para>`
		434
		435	`<programlisting>`
		436	`#define index(s, c) strchr(s, c)`
		437	`</programlisting>`
		438
		439	`<para>But this is nothing to do with ANSI; this macro is defined for`
		440	`compatibility with BSD.</para>`
		441
		442	`<para>In reality the system headers on any given machine are a`
		443	`hodgepodge of implementations of different APIs, and it is often`
		444	`virtually impossible to separate them (feature test macros such as`
		445	`<code>_POSIX_SOURCE</code> are of some use, but are not always`
		446	`implemented and do not always produce a complete separation; they`
		447	`are only provided for "standard" APIs anyway). The problem above`
		448	`arose because there is no transitivity rule of the form : if`
		449	`program <varname>P</varname> conforms to API <varname>A</varname>,`
		450	`and API <varname>B</varname> extends <varname>A</varname>, then`
		451	`<varname>P</varname> conforms to <varname>B</varname>. The only`
		452	`reason this is not true is these namespace problems.</para>`
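<para>The clash can be reproduced without any system header. In the sketch
below the macro <code>lookup</code> is a hypothetical stand-in for the BSD
<code>index</code> macro, and <code>find_char</code> and
<code>entry_count</code> are invented for the illustration. Parenthesising
the procedure name is one possible workaround, not one suggested by the
paper: a function-like macro is only expanded when its name is followed by
an opening bracket.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a header line such as
   #define index(s, c) strchr(s, c) */
#define lookup(s, c) find_char(s, c)

/* Minimal character search, so the sketch needs no string.h. */
const char *find_char(const char *s, int c)
{
    for (; *s != '\0'; s++) {
        if (*s == (char) c) return s;
    }
    return NULL;
}

static size_t entries = 0;

/* Writing "void lookup(const char *s)" here would fail, because the
   pre-processor would see the two-argument macro used with one argument.
   In "(lookup)" the name is not followed by '(', so it is left alone. */
void (lookup)(const char *s)
{
    assert(s != NULL);
    entries++;            /* a real routine would insert s into the list */
}

/* Number of strings added so far. */
size_t entry_count(void)
{
    return entries;
}
```

<para>Calls to the procedure must be parenthesised in the same way, which
shows how intrusive such workarounds are compared with a header that
contains only the intended API.</para>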
		453
		454	`<para>A second example demonstrates a slightly different point. The`
		455	`POSIX standard states that <code>sys/stat.h</code> contains the`
		456	`definition of the structure <code>struct stat</code>, which`
		457	`includes several members, amongst them:</para>`
		458
		459	`<programlisting>`
		460	`time_t st_atime;`
		461	`</programlisting>`
		462
		463	`<para>representing the access time for the corresponding file. So`
		464	`the program:</para>`
		465
		466	`<programlisting>`
		467	`#include <sys/types.h>`
		468	`#include <sys/stat.h>`
		469
		470	`time_t`
		471	`st_atime(struct stat *p)`
		472	`{`
		473	`return(p->st_atime);`
		474	`}`
		475	`</programlisting>`
		476
		477	`<para>should be perfectly valid - the procedure name`
		478	`<code>st_atime</code> and the field selector <code>st_atime</code>`
		479	`occupy different namespaces (see however the appendix on`
		480	`namespaces and APIs below). However at least one popular operating`
		481	`system has the implementation:</para>`
		482
		483	`<programlisting>`
		484	`struct stat{`
		485	`....`
		486	`union {`
		487	`time_t st__sec;`
		488	`timestruc_t st__tim;`
		489	`} st_atim;`
		490	`....`
		491	`};`
		492	`#define st_atime st_atim.st__sec`
		493	`</programlisting>`
		494
		495	`<para>This seems like a perfectly legitimate implementation. In the`
		496	`program above the field selector <code>st_atime</code> is replaced`
		497	`by <code>st_atim.st__sec</code> by the pre-processor, as intended,`
		498	`but unfortunately so is the procedure name <code>st_atime</code>,`
		499	`leading to a syntax error.</para>`
		500
		501	`<para>The problem here is not with the program or the`
		502	`implementation, but in the way they were combined. C does not`
		503	`allow individual field selectors to be defined. Instead the`
		504	`indiscriminate sledgehammer of macro substitution was used,`
		505	`leading to the problem described.</para>`
		506
		507	`<para>Problems can also occur in the other combination phase of the`
		508	`traditional compilation scheme, the system linking. Consider the`
		509	`ANSI compliant routine:</para>`
		510
		511	`<programlisting>`
		512	`#include <stdio.h>`
		513
		514	`int open ( char *nm )`
		515	`{`
		516	`int c, n = 0 ;`
		517	`FILE *f = fopen ( nm, "r" ) ;`
		518	`if ( f == NULL ) return ( -1 ) ;`
		519	`while ( c = getc ( f ), c != EOF ) n++ ;`
		520	`( void ) fclose ( f ) ;`
		521	`return ( n ) ;`
		522	`}`
		523	`</programlisting>`
		524
		525	`<para>which opens the file <code>nm</code>, returning its size in`
		526	`bytes if it exists and -1 otherwise. As a quick porting exercise,`
		527	`I compiled it under six different operating systems. On three it`
		528	`worked correctly; on one it returned -1 even when the file`
		529	`existed; and on two it crashed with a segmentation error.</para>`
		530
		531	`<para>The reason for this lies in the system linking. On those`
		532	`machines which failed, the library routine <code>fopen</code>`
		533	`calls (either directly or indirectly) the library routine`
		534	`<code>open</code> (which is in POSIX, but not ANSI). The system`
		535	`linker, however, linked my routine <code>open</code> instead of`
		536	`the system version, so the call to <code>fopen</code> did not`
		537	`work correctly.</para>`
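<para>One way to sidestep this particular clash, sketched here as an
assumption rather than as the paper's recommendation, is to keep the routine
out of the linker's external namespace by giving it internal linkage and a
name that is less likely to collide. The name <code>count_bytes</code> is
hypothetical.</para>

```c
#include <assert.h>
#include <stdio.h>

/* Same logic as the routine above. "static" gives the routine internal
   linkage, so the system linker cannot substitute it for a library
   routine of the same name. */
static long count_bytes(const char *nm)
{
    int c;
    long n = 0;
    FILE *f;

    assert(nm != NULL);
    f = fopen(nm, "r");
    if (f == NULL) return -1;
    while ((c = getc(f)) != EOF) n++;
    (void) fclose(f);
    return n;
}
```

<para>This only hides the symptom for one routine. The underlying problem,
that the library's external names are not confined to the chosen API,
remains.</para>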
		538
		539	`<para>So code combination problems are primarily namespace problems.`
		540	`The task of combining the program with the API implementation on`
		541	`a given platform is complicated by the fact that, because the`
		542	`system headers and system libraries contain things other than the`
		543	`API implementation, or even because of the particular`
		544	`implementation chosen, the various namespaces in which the`
		545	`program is expected to operate become "polluted".</para>`
		546	`</sect3>`
		547
		548	`<sect3>`
		549	`<title id="S12">2.2.4. API Problems</title>`
		550	`<para>We have`
		551	`said that the API defines the interface between the program and`
		552	`the standard library provided with the operating system on the`
		553	`target machine. There are three main problems concerned with`
		554	`APIs. The first, how to choose the API in the first place, is`
		555	`discussed separately. Here we deal with the compilation aspects :`
		556	`how to check that the program conforms to its API, and what to do`
		557	`about incorrect API implementations on the target machine(s).</para>`
		558
		559	`<sect4>`
		560	`<title id="S13">2.2.4.1. API Checking</title>`
		561	`<para>The`
		562	`problem of whether or not a program conforms to its API - not`
		563	`using any objects from the operating system other than those`
		564	`specified in the API, and not making any unwarranted assumptions`
		565	`about these objects - is one which does not always receive`
		566	`sufficient attention, mostly because the necessary checking tools`
		567	`do not exist (or at least are not widely available). Compiling`
		568	`the program on a number of API compliant machines merely checks`
		569	`the program against the system headers for these machines. For a`
		570	`genuine portability check we need to check against the abstract`
		571	`API description, thereby in effect checking against all possible`
		572	`implementations.</para>`
		573
		574	`<para>Recall from above that the system headers on a given machine`
		575	`are an amalgam of all the APIs it implements. This can cause`
		576	`programs which should compile not to, because of namespace`
		577	`clashes; but it may also cause programs to compile which should`
		578	`not, because they have used objects which are not in their API,`
		579	`but which are in the system headers. For example, the supposedly`
		580	`ANSI compliant program:`
		581	`<programlisting>`
		582	`#include <signal.h>`
		583	`int sig = SIGKILL ;`
		584	`</programlisting>`
		585	`will compile on most systems, despite the fact that`
		586	`<code>SIGKILL</code> is not an ANSI signal, because`
		587	`<code>SIGKILL</code> is in POSIX, which is also implemented in the`
		588	`system <code>signal.h</code>. Again, feature test macros are of`
		589	`some use in trying to isolate the implementation of a single API`
		590	`from the rest of the system headers. However they are highly`
		591	`unlikely to detect the error in the following supposedly POSIX`
		592	`compliant program which prints the entries of the directory <code>`
		593	`nm</code>, together with their inode numbers:`
		594	`<programlisting>`
		595	`#include <stdio.h>`
		596	`#include <sys/types.h>`
		597	`#include <dirent.h>`
		598
		599	`void listdir ( char *nm )`
		600	`{`
		601	`struct dirent *entry ;`
		602	`DIR *dir = opendir ( nm ) ;`
		603	`if ( dir == NULL ) return ;`
		604	`while ( entry = readdir ( dir ), entry != NULL ) {`
		605	`printf ( "%s : %d\n", entry->d_name, ( int ) entry->d_ino ) ;`
		606	`}`
		607	`( void ) closedir ( dir ) ;`
		608	`return ;`
		609	`}`
		610	`</programlisting>`
		611	`This is not POSIX compliant because, whereas the`
		612	`<code>d_name</code> field of <code>struct dirent</code> is in`
		613	`POSIX, the <code>d_ino</code> field is not. It is however in XPG3,`
		614	`so it is likely to be in many system implementations.</para>`
		615
		616	`<para>The previous examples have been concerned with simply telling`
		617	`whether or not a particular object is in an API. A more`
		618	`difficult, and in a way more important, problem is that of`
		619	`assuming too much about the objects which are in the API. For`
		620	`example, in the program:`
		621	`<programlisting>`
		622	`#include <stdio.h>`
		623	`#include <stdlib.h>`
		624
		625	`div_t d = { 3, 4 } ;`
		626
		627	`int main ()`
		628	`{`
		629	`printf ( "%d,%d\n", d.quot, d.rem ) ;`
		630	`return ( 0 ) ;`
		631	`}`
		632	`</programlisting>`
		633	`the ANSI standard specifies that the type <code>div_t</code>`
		634	`is a structure containing two fields, <code>quot</code> and <code>`
		635	`rem</code>, of type <code>int</code>, but it does not specify`
		636	`which order these fields appear in, or indeed if there are other`
		637	`fields. Therefore the initialisation of <code>d</code> is not`
		638	`portable. Again, the type <code>time_t</code> is used to`
		639	`represent times in seconds since a certain fixed date. On most`
		640	`systems this is implemented as <code>long</code>, so it is`
		641	`tempting to use <code>( t & 1 )</code> to determine for a`
		642	`<code>time_t</code> <code>t</code> whether this number of seconds`
		643	`is odd or even. But ANSI actually says that <code>time_t</code>`
		644	`is an arithmetic, not an integer, type, so it would be possible`
		645	`for it to be implemented as <code>double</code>. But in this case`
		646	`<code>( t & 1 )</code> is not even type correct, so it is not`
		647	`a portable way of finding out whether <code>t</code> is odd or`
		648	`even.</para>`
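<para>Both points can be sketched in a way that relies only on what ANSI
specifies. The fields of <code>div_t</code> are assigned by name rather
than by position, and the parity question is asked about a number of
elapsed seconds obtained from <code>difftime</code>, which returns a
<code>double</code> whatever the representation of <code>time_t</code>.
The function names are illustrative, and the sketch assumes the interval is
non-negative and fits in a <code>long</code>.</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Build a div_t without assuming the order or number of its fields. */
div_t make_div(int quot, int rem)
{
    div_t d;
    d.quot = quot;
    d.rem = rem;
    return d;
}

/* Returns 1 if a whole, odd number of seconds separates start and end.
   difftime(end, start) yields end - start in seconds as a double, so
   no bitwise operator is ever applied to a time_t. */
int elapsed_seconds_odd(time_t start, time_t end)
{
    double secs = difftime(end, start);
    assert(secs >= 0.0);
    return ((long) secs) % 2 != 0;
}
```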
		649	`</sect4>`
		650
		651	`<sect4>`
		652	`<title id="S14">2.2.4.2. API Implementation Errors</title>`
		653	`<para>Undoubtedly the problem which causes the writer of`
		654	`portable programs the greatest headache (and heartache) is that`
		655	`of incorrect API implementations. However carefully you have`
		656	`chosen your API and checked that your program conforms to it, you`
		657	`are still reliant on someone (usually the system vendor) having`
		658	`implemented this API correctly on the target machine. Machines`
		659	`which do not implement the API at all do not enter the equation`
		660	`(they are not suitable target machines); what causes problems is`
		661	`incorrect implementations. As the implementation may be divided`
		662	`into two parts - system headers and system libraries - we shall`
		663	`similarly divide our discussion. Inevitably the choice of`
		664	`examples is personal; anyone who has ever attempted to port a`
		665	`program to a new machine is likely to have their own favourite`
		666	`examples.</para>`
		667	`</sect4>`
		668
		669	`<sect4>`
		670	`<title id="S15">2.2.4.3. System Header Problems</title>`
		671	`<para>Some header problems are immediately apparent`
		672	`because they are syntactic and cause the program to fail to`
		673	`compile. For example, values may not be defined or be defined in`
		674	`the wrong place (not in the header prescribed by the API).</para>`
		675
		676	`<para>A common example (one which I have to include a workaround for`
		677	`in virtually every program I write) is that`
		678	`<code>EXIT_SUCCESS</code> and <code>EXIT_FAILURE</code> are not`
		679	`always defined (ANSI specifies that they should be in`
		680	`<code>stdlib.h</code>). It is tempting to change <code>exit`
		681	`(EXIT_FAILURE)</code> to <code>exit (1)</code> because "everyone`
		682	`knows" that <code>EXIT_FAILURE</code> is 1. But this is to`
		683	`decrease the portability of the program because it ties it to a`
		684	`particular class of implementations. A better workaround would`
		685	`be:`
		686	`<programlisting>`
		687	`#include <stdlib.h>`
		688	`#ifndef EXIT_FAILURE`
		689	`#define EXIT_FAILURE 1`
		690	`#endif`
		691	`</programlisting>`
		692	`which assumes that anyone choosing a non-standard value for`
		693	`<code>EXIT_FAILURE</code> is more likely to put it in`
		694	`<code>stdlib.h</code>. Of course, if one subsequently came across a`
		695	`machine on which not only is <code>EXIT_FAILURE</code> not defined,`
		696	`but also the value it should have is not 1, then it would be`
		697	`necessary to resort to <code>#ifdef machine_name</code> statements.`
		698	`The same is true of all the API implementation problems we shall be`
		699	`discussing : non-conformant machines require workarounds involving`
		700	`conditional compilation. As more machines are considered, so these`
		701	`conditional compilations multiply.</para>`
		702
		703	`<para>As an example of things being defined in the wrong place, ANSI`
		704	`specifies that <code>SEEK_SET</code>, <code>SEEK_CUR</code> and`
		705	`<code>SEEK_END</code> should be defined in <code>stdio.h</code>,`
		706	`whereas POSIX specifies that they should also be defined in`
		707	`<code>unistd.h</code>. It is not uncommon to find machines on`
		708	`which they are defined in the latter but not in the former. A`
		709	`possible workaround in this case would be:`
		710	`<programlisting>`
		711	`#include <stdio.h>`
		712	`#ifndef SEEK_SET`
		713	`#include <unistd.h>`
		714	`#endif`
		715	`</programlisting>`
		716	`Of course, by including "unnecessary" headers like`
		717	`<code>unistd.h</code> the risk of namespace clashes such as those`
		718	`discussed above is increased.</para>`
		719
		720	`<para>A final syntactic problem, which perhaps should belong with`
		721	`the system header problems above, concerns dependencies between`
		722	`the headers themselves. For example, the POSIX header`
		723	`<code>unistd.h</code> declares functions involving some of the`
		724	`types <code>pid_t</code>, <code>uid_t</code> etc, defined in`
		725	`<code>sys/types.h</code>. Is it necessary to include`
		726	`<code>sys/types.h</code> before including <code>unistd.h</code>,`
		727	`or does <code>unistd.h</code> automatically include`
		728	`<code>sys/types.h</code>? The approach of playing safe and`
		729	`including everything will normally work, but this can lead to`
		730	`multiple inclusions of a header. This will normally cause no`
		731	`problems because the system headers are protected against`
		732	`multiple inclusions by means of macros, but it is not unknown for`
		733	`certain headers to be left unprotected. Also not all header`
		734	`dependencies are as clear cut as the one given, so that what`
		735	`headers need to be included, and in what order, is in fact target`
		736	`dependent.</para>`
		737
		738	`<para>There can also be semantic errors in the system headers :`
		739	`namely wrongly defined values. The following two examples are`
		740	`taken from real operating systems. Firstly the definition:`
		741	`<programlisting>`
		742	`#define DBL_MAX 1.797693134862316E+308`
		743	`</programlisting>`
		744	`in <code>float.h</code> on an IEEE-compliant machine is`
		745	`subtly wrong - the given value does not fit into a`
		746	`<code>double</code> - the correct value is:`
		747	`<programlisting>`
		748	`#define DBL_MAX 1.7976931348623157E+308`
		749	`</programlisting>`
		750	`Again, the type definition:`
		751	`<programlisting>`
		752	`typedef int size_t ; /* ??? */`
		753	`</programlisting>`
		754	`(sic) is not compliant with ANSI, which says that`
		755	`<code>size_t</code> is an unsigned integer type. (I'm not sure if`
		756	`this is better or worse than another system which defines`
		757	`<code>ptrdiff_t</code> to be <code>unsigned int</code> when it is`
		758	`meant to be signed. This would mean that the difference between any`
		759	`two pointers is always positive.) These particular examples are`
		760	`irritating because it would have cost nothing to get things right,`
		761	`correcting the value of <code>DBL_MAX</code> and changing the`
		762	`definition of <code>size_t</code> to <code>unsigned int</code>.`
		763	`These corrections are so minor that the modified system headers`
		764	`would still be a valid interface for the existing system libraries`
		765	`(we shall have more to say about this later). However it is not`
		766	`possible to change the system headers, so it is necessary to build`
		767	`workarounds into the program. Whereas in the first case it is`
		768	`possible to devise such a workaround:`
		769	`<programlisting>`
		770	`#include <float.h>`
		771	`#ifdef machine_name`
		772	`#undef DBL_MAX`
		773	`#define DBL_MAX 1.7976931348623157E+308`
		774	`#endif`
		775	`</programlisting>`
		776	`for example, in the second, because <code>size_t</code> is`
		777	`defined by a <code>typedef</code> it is virtually impossible to`
		778	`correct in a simple fashion. Thus any program which relies on the`
		779	`fact that <code>size_t</code> is unsigned will require considerable`
		780	`rewriting before it can be ported to this machine.</para>`
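<para>Defects of this kind can at least be detected before they cause damage. As a sketch (the function names are hypothetical and not part of any API), the following checks rely only on the conversion rules of ANSI C: converting -1 to an unsigned type yields the largest value of that type, whereas in a signed type it remains negative:

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if size_t is unsigned, as ANSI requires. */
int size_t_is_unsigned ( void )
{
    return ( ( size_t ) -1 > ( size_t ) 0 ) ;
}

/* Hypothetical check: returns 1 if ptrdiff_t is signed, as ANSI requires. */
int ptrdiff_t_is_signed ( void )
{
    return ( ( ptrdiff_t ) -1 < ( ptrdiff_t ) 0 ) ;
}
```

A program that depends on these properties could call such checks at start-up, or in a separate configuration test, and refuse to proceed on a non-conformant machine.</para>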
		781	`</sect4>`
		782
		783	`<sect4>`
		784	`<title id="S16">2.2.4.4. System Library Problems</title>`
		785	`<para>The system header problems just discussed are`
		786	`primarily syntactic problems. By contrast, system library`
		787	`problems are primarily semantic - the provided library routines`
		788	`do not behave in the way specified by the API. This makes them`
		789	`harder to detect. For example, consider the routine:`
		790	`<programlisting>`
		791	`void *realloc ( void *p, size_t s ) ;`
		792	`</programlisting>`
		793	`which reallocates the block of memory <code>p</code> to have`
		794	`size <code>s</code> bytes, returning the new block of memory. The`
		795	`ANSI standard says that if <code>p</code> is the null pointer, then`
		796	`the effect of <code>realloc ( p, s )</code> is the same as`
		797	`<code>malloc ( s )</code>, that is, to allocate a new block of`
		798	`memory of size <code>s</code>. This behaviour is exploited in the`
		799	`following program, in which the routine <code>add_char</code> adds`
		800	`a character to the expanding array, <code>buffer</code>:`
		801	`<programlisting>`
		802	`#include <stdio.h>`
		803	`#include <stdlib.h>`
		804
		805	`char *buffer = NULL ;`
		806	`int buff_sz = 0, buff_posn = 0 ;`
		807
		808	`void add_char ( char c )`
		809	`{`
		810	`if ( buff_posn >= buff_sz ) {`
		811	`buff_sz += 100 ;`
		812	`buffer = ( char * ) realloc ( ( void * ) buffer, buff_sz * sizeof ( char ) ) ;`
		813	`if ( buffer == NULL ) {`
		814	`fprintf ( stderr, "Memory allocation error\n" ) ;`
		815	`exit ( EXIT_FAILURE ) ;`
		816	`}`
		817	`}`
		818	`buffer [ buff_posn++ ] = c ;`
		819	`return ;`
		820	`}`
		821	`</programlisting>`
		822	`On the first call of <code>add_char</code>,`
		823	`<code>buffer</code> is set to a real block of memory (as opposed to`
		824	`<code>NULL</code>) by a call of the form <code>realloc ( NULL, s`
		825	`)</code>. This is extremely convenient and efficient - if it were`
		826	`not for this behaviour we would have to have an explicit`
		827	`initialisation of <code>buffer</code>, either as a special case in`
		828	`<code>add_char</code> or in a separate initialisation routine.</para>`
		829
		830	`<para>Of course this all depends on the behaviour of <code>realloc (`
		831	`NULL, s )</code> having been implemented precisely as described`
		832	`in the ANSI standard. The first indication that this is not so on`
		833	`a particular target machine might be when the program is compiled`
		834	`and run on that machine for the first time and does not perform`
		835	`as expected. To track the problem down will demand time debugging`
		836	`the program.</para>`
		837
		838	`<para>Once the problem has been identified as being with`
		839	`<code>realloc</code>, a number of workarounds are`
		840	`possible. Perhaps the most interesting is to replace the`
		841	`inclusion of <code>stdlib.h</code> by the following:`
		842	`<programlisting>`
		843	`#include <stdlib.h>`
		844	`#ifdef machine_name`
		845	`#define realloc( p, s )\`
		846	`( ( p ) ? ( realloc ) ( p, s ) : malloc ( s ) )`
		847	`#endif`
		848	`</programlisting>`
		849	`where <code>realloc ( p, s )</code> is redefined as a macro`
		850	`which is the result of the procedure <code>realloc</code> if <code>`
		851	`p</code> is not null, and <code>malloc ( s )</code> otherwise.`
		852	`(In fact this macro will not always have the desired effect,`
		853	`although it does in this case. Why (exercise)?)</para>`
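<para>An alternative workaround, sketched here on the assumption that the defect is confined to <code>realloc ( NULL, s )</code>, is to route allocations through a small function instead of a macro. The name <code>safe_realloc</code> is hypothetical; the point is that a function evaluates each of its arguments exactly once, whatever expression the caller passes:

```c
#include <stdlib.h>

/* Hypothetical wrapper: behaves as ANSI says realloc should, even on a
   machine where realloc ( NULL, s ) is not implemented correctly. */
void *safe_realloc ( void *p, size_t s )
{
    if ( p == NULL ) {
        /* Never hand a null pointer to the suspect routine. */
        return ( malloc ( s ) ) ;
    }
    return ( realloc ( p, s ) ) ;
}
```

The program would then call <code>safe_realloc</code> wherever it previously called <code>realloc</code>, at the cost of one extra function call per allocation.</para>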
		854
		855	`<para>The only alternative to this trial and error approach to`
		856	`finding API implementation problems is the application of`
		857	`personal experience, either of the particular target machine or`
		858	`of things that are implemented wrongly by many machines and as`
		859	`such should be avoided. This sort of detailed knowledge is not`
		860	`easily acquired. Nor can it ever be complete: new operating`
		861	`system releases are becoming increasingly regular and are on`
		862	`occasions quite as likely to introduce new implementation errors`
		863	`as to solve existing ones. It is in short a "black art".</para>`
		864	`</sect4>`
		865	`</sect3>`
		866	`</sect2>`
		867
		868	`<sect2>`
		869	`<title id="S17">2.3. APIs and Portability</title>`
		870	`<para>We now return to our discussion`
		871	`of the general issues involved in portability to more closely`
		872	`examine the role of the API.</para>`
		873
		874	`<sect3>`
		875	`<title id="S18">2.3.1. Target Dependent Code</title>`
		876	`<para>So far we have been considering programs which`
		877	`contain no conditional compilation, in which the API forms the`
		878	`basis of the separation of the target independent code (the whole`
		879	`program) and the target dependent code (the API implementation).`
		880	`But a glance at most large C programs will reveal that they do`
		881	`contain conditional compilation. The code is scattered with`
		882	`<code>#if</code>'s and <code>#ifdef</code>'s which, in effect,`
		883	`cause the pre-processor to construct slightly different programs`
		884	`on different target machines. So here we do not have a clean`
		885	`division between the target independent and the target dependent`
		886	`code - there are small sections of target dependent code spread`
		887	`throughout the program.</para>`
		888
		889	`<para>Let us briefly consider some of the reasons why it is`
		890	`necessary to introduce this conditional compilation. Some have`
		891	`already been mentioned - workarounds for compiler bugs, compiler`
		892	`limitations, and API implementation errors; others will be`
		893	`considered later. However the most interesting and important`
		894	`cases concern things which need to be done genuinely differently`
		895	`on different machines. This can be because they really cannot be`
		896	`expressed in a target independent manner, or because the target`
		897	`independent way of doing them is unacceptably inefficient.</para>`
		898
		899	`<para>Efficiency (either in terms of time or space) is a key issue`
		900	`in many programs. The argument is often advanced that writing a`
		901	`program portably means using the, often inefficient, lowest`
		902	`common denominator approach. But under our definition of`
		903	`portability it is the functionality that matters, not the actual`
		904	`source code. There is nothing to stop different code being used`
		905	`on different machines for reasons of efficiency.</para>`
		906
		907	`<para>To examine the relationship between target dependent code and`
		908	`APIs, consider the simple program:`
		909	`<programlisting>`
		910	`#include <stdio.h>`
		911
		912	`int main ()`
		913	`{`
		914	`#ifdef mips`
		915	`fputs ( "This machine is a mips\n", stdout ) ;`
		916	`#endif`
		917	`return ( 0 ) ;`
		918	`}`
		919	`</programlisting>`
		920	`which prints a message if the target machine is a mips. What`
		921	`is the API of this program? Basically it is the same as in the`
		922	`"Hello world" example discussed in sections 2.1.1 and 2.1.2, but if`
		923	`we wish the API to fully describe the interface between the program`
		924	`and the target machine, we must also say that whether or not the`
		925	`macro <code>mips</code> is defined is part of the API. Like the`
		926	`rest of the API, this has a semantic aspect as well as a syntactic`
		927	`- in this case that <code>mips</code> is only defined on mips`
		928	`machines. Where it differs is in its implementation. Whereas the`
		929	`main part of the API is implemented in the system headers and the`
		930	`system libraries, the implementation of either defining, or not`
		931	`defining, <code>mips</code> ultimately rests with the person`
		932	`performing the compilation. (In this particular example, the macro`
		933	`<code>mips</code> is normally built into the compiler on mips`
		934	`machines, but this is only a convention.)</para>`
		935
		936	`<para>So the API in this case has two components : a system-defined`
		937	`part which is implemented in the system headers and system`
		938	`libraries, and a user-defined part which ultimately relies on the`
		939	`person performing the compilation to provide an implementation.`
		940	`The main point to be made in this section is that introducing`
		941	`target dependent code is equivalent to introducing a user-defined`
		942	`component to the API. The actual compilation process in the case`
		943	`of programs containing target dependent code is basically the`
		944	`same as that shown in Fig. 1. But whereas previously the vertical`
		945	`division of the diagram also reflected a division of`
		946	`responsibility - the left hand side is the responsibility of the`
		947	`programmer (the person writing the program), and the right hand`
		948	`side of the API specifier (for example, a standards defining`
		949	`body) and the API implementor (the system vendor) - now the right`
		950	`hand side is partially the responsibility of the programmer and`
		951	`the person performing the compilation. The programmer specifies`
		952	`the user-defined component of the API, and the person compiling`
		953	`the program either implements this API (as in the mips example`
		954	`above) or chooses between a number of alternative implementations`
		955	`provided by the programmer (as in the example below).</para>`
		956
		957	`<para>Let us consider a more complex example. Consider the following`
		958	`program which assumes, for simplicity, that an <code>unsigned`
		959	`int</code> contains 32 bits:`
		960	`<programlisting>`
		961	`#include <stdio.h>`
		962	`#include "config.h"`
		963
		964	`#ifndef SLOW_SHIFT`
		965	`#define MSB( a ) ( ( unsigned char ) ( ( a ) >> 24 ) )`
		966	`#else`
		967	`#ifdef BIG_ENDIAN`
		968	`#define MSB( a ) ( *( ( unsigned char * ) &( a ) ) )`
		969	`#else`
		970	`#define MSB( a ) ( *( ( unsigned char * ) &( a ) + 3 ) )`
		971	`#endif`
		972	`#endif`
		973
		974	`unsigned int x = 100000000 ;`
		975
		976	`int main ()`
		977	`{`
		978	`printf ( "%u\n", MSB ( x ) ) ;`
		979	`return ( 0 ) ;`
		980	`}`
		981	`</programlisting>`
		982	`The intention is to print the most significant byte of <code>`
		983	`x</code>. Three alternative definitions of the macro`
		984	`<code>MSB</code> used to extract this value are provided. The`
		985	`first, if <code>SLOW_SHIFT</code> is not defined, is simply to`
		986	`shift the value right by 24 bits. This will work on all 32-bit`
		987	`machines, but may be inefficient (depending on the nature of the`
		988	`machine's shift instruction). So two alternatives are provided.`
		989	`An <code>unsigned int</code> is assumed to consist of four`
		990	`<code>unsigned char</code>'s. On a big-endian machine, the most`
		991	`significant byte is the first of these <code>unsigned`
		992	`char</code>'s; on a little-endian machine it is the fourth. The`
		993	`second definition of <code>MSB</code> is intended to reflect the`
		994	`former case, and the third the latter.</para>`
		995
		996	`<para>The person compiling the program has to choose between the`
		997	`three possible implementations of <code>MSB</code> provided by`
		998	`the programmer. This is done by either defining, or not defining,`
		999	`the macros <code>SLOW_SHIFT</code> and <code>BIG_ENDIAN</code>.`
		1000	`This could be done as command line options, but we have chosen to`
		1001	`reflect another commonly used device, the configuration file. For`
		1002	`each target machine, the programmer provides a version of the`
		1003	`file <code>config.h</code> which defines the appropriate`
		1004	`combination of the macros <code>SLOW_SHIFT</code> and`
		1005	`<code>BIG_ENDIAN</code>. The person performing the compilation`
		1006	`simply chooses the appropriate <code>config.h</code> for the`
		1007	`target machine.</para>`
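<para>Choosing the wrong <code>config.h</code> produces a program which compiles but prints the wrong byte, so it can be worth checking a configuration against the target. The sketch below is one possible self-check, not part of the program above: the function names are hypothetical, and endianness is detected at run time rather than through <code>BIG_ENDIAN</code>. It assumes, as the byte-addressing definitions of <code>MSB</code> do, that an <code>unsigned int</code> has no padding bits:

```c
#include <limits.h>

/* Portable reference: the most significant byte, obtained by shifting. */
unsigned char msb_by_shift ( unsigned int a )
{
    return ( ( unsigned char ) ( a >> ( ( sizeof ( a ) - 1 ) * CHAR_BIT ) ) ) ;
}

/* Returns 1 if the lowest-addressed byte of an unsigned int is its
   most significant byte, and 0 otherwise. */
int is_big_endian ( void )
{
    unsigned int probe = 1 ;
    return ( *( ( unsigned char * ) &probe ) == 0 ) ;
}

/* Byte-addressing variant, with the offset chosen at run time. */
unsigned char msb_by_address ( unsigned int a )
{
    unsigned char *p = ( unsigned char * ) &a ;
    return ( is_big_endian () ? p [ 0 ] : p [ sizeof ( a ) - 1 ] ) ;
}
```

On a machine where <code>unsigned int</code> has 32 bits, both functions return 5 for the value 100000000 (0x05F5E100) used above.</para>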
		1008
		1009	`<para>There are two possible ways of looking at what the`
		1010	`user-defined API of this program is. Possibly it is most natural`
		1011	`to say that it is <code>MSB</code>, but it could also be argued`
		1012	`that it is the macros <code>SLOW_SHIFT</code> and`
		1013	`<code>BIG_ENDIAN</code>. The former more accurately describes the`
		1014	`target dependent code, but is only implemented indirectly, via`
		1015	`the latter.</para>`
		1016	`</sect3>`
		1017
		1018	`<sect3>`
		1019	`<title id="S19">2.3.2. Making APIs Explicit</title>`
		1020	`<para>As`
		1021	`we have said, every program has an API even if it is implicit`
		1022	`rather than explicit. Every system header included, every type or`
		1023	`value used from it, and every library routine used, adds to the`
		1024	`system-defined component of the API, and every conditional`
		1025	`compilation adds to the user-defined component. What making the`
		1026	`API explicit does is to encapsulate the set of requirements that`
		1027	`the program has of the target machine (including requirements`
		1028	`like, I need to know whether or not the target machine is`
		1029	`big-endian, as well as, I need <code>fputs</code> to be`
		1030	`implemented as in the ANSI standard). By making these`
		1031	`requirements explicit it is made absolutely clear what is needed`
		1032	`on a target machine if a program is to be ported to it. If the`
		1033	`requirements are not explicit this can only be found by trial and`
		1034	`error. This is what we meant earlier by saying that a program`
		1035	`without an explicit API is only portable by accident.</para>`
		1036
		1037	`<para>Another advantage of specifying the requirements of a program`
		1038	`is that it may increase their chances of being implemented. We`
		1039	`have spoken as if porting is a one-way process; program writers`
		1040	`porting their programs to new machines. But there is also traffic`
		1041	`the other way. Machine vendors may wish certain programs to be`
		1042	`ported to their machines. If these programs come with a list of`
		1043	`requirements then the vendor knows precisely what to implement in`
		1044	`order to make such a port possible.</para>`
		1045	`</sect3>`
		1046
		1047	`<sect3>`
		1048	`<title id="S20">2.3.3. Choosing an API</title>`
		1049	`<para>So how`
		1050	`does one go about choosing an API? In a sense the user-defined`
		1051	`component is easier to specify than the system-defined component`
		1052	`because it is less tied to particular implementation models. What`
		1053	`is required is to abstract out what exactly needs to be done in a`
		1054	`target dependent manner and to decide how best to separate it`
		1055	`out. The most difficult problem is how to make the implementation`
		1056	`of this API as simple as possible for the person performing the`
		1057	`compilation, if necessary providing a number of alternative`
		1058	`implementations to choose between and a simple method of making`
		1059	`this choice (for example, the <code>config.h</code> file above).`
		1060	`With the system-defined component the question is more likely to`
		1061	`be, how do the various target machines I have in mind implement`
		1062	`what I want to do? The abstraction of this is usually to choose a`
		1063	`standard and widely implemented API, such as POSIX, which`
		1064	`provides all the necessary functionality.</para>`
		1065
		1066	`<para>The choice of "standard" API is of course influenced by the`
		1067	`type of target machines one has in mind. Within the Unix world,`
		1068	`the increasing adoption of Open Standards, such as POSIX, means`
		1069	`that choosing a standard API which is implemented on a wide`
		1070	`variety of Unix boxes is becoming easier. Similarly, choosing an API`
		1071	`which will work on most MSDOS machines should cause few problems.`
		1072	`The difficulty is that these are disjoint worlds; it is very`
		1073	`difficult to find a standard API which is implemented on both`
		1074	`Unix and MSDOS machines. At present not much can be done about`
		1075	`this; it reflects the disjoint nature of the computer market.</para>`
		1076
		1077	`<para>To develop a similar point : the drawback of choosing POSIX`
		1078	`(for example) as an API is that it restricts the range of`
		1079	`possible target machines to machines which implement POSIX. Other`
		1080	`machines, for example, BSD compliant machines, might offer the`
		1081	`same functionality (albeit using different methods), so they`
		1082	`should be potential target machines, but they have been excluded`
		1083	`by the choice of API. One approach to the problem is the`
		1084	`"alternative API" approach. Both the POSIX and the BSD variants`
		1085	`are built into the program, but only one is selected on any given`
		1086	`target machine by means of conditional compilation. Under our`
		1087	`"equivalent functionality" definition of portability, this is a`
		1088	`program which is portable to both POSIX and BSD compliant`
		1089	`machines. But viewed in the light of the discussion above, if we`
		1090	`regard a program as a program-API pair, it could be regarded as`
		1091	`two separate programs combined on a single source code tree. A`
		1092	`more interesting approach would be to try to abstract out what`
		1093	`exactly the functionality which both POSIX and BSD offer is and`
		1094	`use that as the API. Then instead of two separate APIs we would`
		1095	`have a single API with two broad classes of implementations. The`
		1096	`advantage of this latter approach becomes clear if we wished to port`
		1097	`the program to a machine which implements neither POSIX nor BSD,`
		1098	`but provides the equivalent functionality in a third way.</para>`
		1099
		1100	`<para>As a simple example, both POSIX and BSD provide very similar`
		1101	`methods for scanning the entries of a directory. The main`
		1102	`difference is that the POSIX version is defined in`
		1103	`<code>dirent.h</code> and uses a structure called <code>struct`
		1104	`dirent</code>, whereas the BSD version is defined in`
		1105	`<code>sys/dir.h</code> and calls the corresponding structure`
		1106	`<code>struct direct</code>. The actual routines for manipulating`
		1107	`directories are the same in both cases. So the only abstraction`
		1108	`required to unify these two APIs is to introduce an abstract`
		1109	`type, <code>dir_entry</code> say, which can be defined by:`
		1110	`<programlisting>`
		1111	`typedef struct dirent dir_entry ;`
		1112	`</programlisting>`
		1113	`on POSIX machines, and:`
		1114	`<programlisting>`
		1115	`typedef struct direct dir_entry ;`
		1116	`</programlisting>`
		1117	`on BSD machines. Note how this portion of the API crosses the`
		1118	`system-user boundary. The object <code>dir_entry</code> is defined`
		1119	`in terms of the objects in the system headers, but the precise`
		1120	`definition depends on a user-defined value (whether the target`
		1121	`machine implements POSIX or BSD).</para>`
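<para>A minimal sketch of how this abstraction might look in use follows. The macro <code>USE_BSD_DIR</code> and the function <code>count_entries</code> are hypothetical names chosen for illustration; the directory routines themselves are those common to POSIX and BSD:

```c
#include <stddef.h>
#include <sys/types.h>

#ifdef USE_BSD_DIR              /* hypothetical user-defined macro */
#include <sys/dir.h>
typedef struct direct dir_entry ;
#else
#include <dirent.h>
typedef struct dirent dir_entry ;
#endif

/* Count the entries of the directory path, or return -1 if it
   cannot be opened. */
long count_entries ( const char *path )
{
    DIR *dir = opendir ( path ) ;
    dir_entry *entry ;
    long n = 0 ;
    if ( dir == NULL ) return ( -1 ) ;
    while ( ( entry = readdir ( dir ) ) != NULL ) n++ ;
    closedir ( dir ) ;
    return ( n ) ;
}
```

Only the <code>typedef</code> differs between the two variants; the body of <code>count_entries</code> is written once, purely in terms of <code>dir_entry</code>.</para>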
		1122	`</sect3>`
		1123
		1124	`<sect3>`
		1125	`<title id="S21">2.3.4. Alternative Program Versions</title>`
		1126	`<para>Another reason for introducing conditional`
		1127	`compilation which relates to APIs is the desire to combine`
		1128	`several programs, or versions of programs, on a single source`
		1129	`tree. There are several cases to distinguish. The`
		1130	`reuse of code between genuinely different programs does not`
		1131	`really enter the argument : any given program will only use one`
		1132	`route through the source tree, so there is no real conditional`
		1133	`compilation per se in the program. What is more interesting is`
		1134	`the use of conditional compilation to combine several versions of`
		1135	`the same program on the same source tree to provide additional or`
		1136	`alternative features.</para>`
		1137
		1138	`<para>It could be argued that the macros (or whatever) used to`
		1139	`select between the various versions of the program are just part`
		1140	`of the user-defined API as before. But consider a simple program`
		1141	`which reads in some numerical input, say, processes it, and`
		1142	`prints the results. This might, for example, have POSIX as its`
		1143	`API. We may wish to optionally enhance this by displaying the`
		1144	`results graphically rather than textually on machines which have`
		1145	`X Windows, the compilation being conditional on some boolean`
		1146	`value, <code>HAVE_X_WINDOWS</code>, say. What is the API of the`
		1147	`resultant program? The answer from the point of view of the`
		1148	`program is the union of POSIX, X Windows and the user-defined`
		1149	`value <code>HAVE_X_WINDOWS</code>. But from the implementation`
		1150	`point of view we can either implement POSIX and set`
		1151	`<code>HAVE_X_WINDOWS</code> to false, or implement both POSIX and`
		1152	`X Windows and set <code>HAVE_X_WINDOWS</code> to true. So what`
		1153	`introducing <code>HAVE_X_WINDOWS</code> does is to allow`
		1154	`flexibility in the API implementation.</para>`
		1155
		1156	`<para>This is very similar to the alternative APIs discussed above.`
		1157	`However the approach outlined will really only work for optional`
		1158	`API extensions. To work in the alternative API case, we would`
		1159	`need to have the union of POSIX, BSD and a boolean value, say, as`
		1160	`the API. Although this is possible in theory, it is likely to`
		1161	`lead to namespace clashes between POSIX and BSD.</para>`
		1162	`</sect3>`
		1163	`</sect2>`
		1164	`</sect1>`
		1165
		1166	`<appendix>`
		1167	`<title>Appendix: Namespaces and APIs</title>`
		1168	`<para>Namespace problems are`
		1169	`amongst the most difficult faced by standard defining bodies (for`
		1170	`example, the ANSI and POSIX committees) and they often go to`
		1171	`great lengths to specify which names should, and should not,`
		1172	`appear when certain headers are included. (The position is set`
		1173	`out in D. F. Prosser, <emphasis>Header and name space rules for UNIX`
		1174	`systems</emphasis> (private communication), USL, 1993.)</para>`
		1175
		1176	`<para>For example, the intention, certainly in ANSI, is that each`
		1177	`header should operate as an independent sub-API. Thus`
		1178	`<code>va_list</code> is prohibited from appearing in the`
		1179	`namespace when <code>stdio.h</code> is included (it is defined`
		1180	`only in <code>stdarg.h</code>) despite the fact that it appears`
		1181	`in the prototype:`
		1182	`<programlisting>`
		1183	`int vprintf ( char *, va_list ) ;`
		1184	`</programlisting>`
		1185	`This seeming contradiction is worked round on most`
		1186	`implementations by defining a type <code>__va_list</code> in <code>`
		1187	`stdio.h</code> which has exactly the same definition as`
		1188	`<code>va_list</code>, and declaring <code>vprintf</code> as:`
		1189	`<programlisting>`
		1190	`int vprintf ( char *, __va_list ) ;`
		1191	`</programlisting>`
		1192	`This is only legal because <code>__va_list</code> is deemed`
		1193	`not to corrupt the namespace because of the convention that names`
		1194	`beginning with <code>__</code> are reserved for implementation use.</para>`
		1195
		1196	`<para>This particular namespace convention is well-known, but there`
		1197	`are others defined in these standards which are not generally`
		1198	`known (and since no compiler I know tests them, not widely`
		1199	`adhered to). For example, the ANSI header <code>errno.h</code>`
		1200	`reserves all names given by the regular expression:`
		1201	`<programlisting>`
		1202	`E[0-9A-Z][0-9a-z_A-Z]+`
		1203	`</programlisting>`
		1204	`against macros (i.e. in all namespaces). By prohibiting the`
		1205	`user from using names of this form, the intention is to protect`
		1206	`against namespace clashes with extensions of the ANSI API which`
		1207	`introduce new error numbers. It also protects against a particular`
		1208	`implementation of these extensions - namely that new error numbers`
		1209	`will be defined as macros.</para>`
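<para>Since such restrictions are rarely checked mechanically, a small checker is easy to sketch. The function below (a hypothetical helper, not part of any standard or tool) tests an identifier against the expression above exactly as printed, assuming the "C" locale:

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if name matches
   E[0-9A-Z][0-9a-z_A-Z]+ and 0 otherwise. */
int is_reserved_errno_name ( const char *name )
{
    const unsigned char *p = ( const unsigned char * ) name ;
    if ( p [ 0 ] != 'E' ) return ( 0 ) ;
    if ( !isdigit ( p [ 1 ] ) && !isupper ( p [ 1 ] ) ) return ( 0 ) ;
    /* The trailing + demands at least one further character. */
    if ( p [ 2 ] == '\0' ) return ( 0 ) ;
    for ( p += 2 ; *p != '\0' ; p++ ) {
        if ( !isalnum ( *p ) && *p != '_' ) return ( 0 ) ;
    }
    return ( 1 ) ;
}
```

It shows, for instance, that a user-defined name such as <code>ERROR_COUNT</code> falls in the reserved namespace, whereas <code>Error_count</code> does not.</para>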
		1210
		1211	`<para>A better example of protecting against particular`
		1212	`implementations comes from POSIX. If <code>sys/stat.h</code> is`
		1213	`included, names of the form:`
		1214	`<programlisting>`
		1215	`st_[0-9a-z_A-Z]+`
		1216	`</programlisting>`
		1217	`are reserved against macros (as member names). The intention`
		1218	`here is not only to reserve field selector names for future`
		1219	`extensions to <code>struct stat</code> (which would only affect API`
		1220	`implementors, not ordinary users), but also to reserve against the`
		1221	`possibility that these field selectors might be implemented by`
		1222	`macros. So our <code>st_atime</code> example in section 2.2.3 is`
		1223	`strictly illegal because the procedure name <code>st_atime</code>`
		1224	`lies in a restricted namespace. Indeed the namespace is restricted`
		1225	`precisely to disallow this program.</para>`
		1226
		1227	`<para>As an exercise to the reader, how many of your programs use`
		1228	`names from the following restricted namespaces (all drawn from`
		1229	`ANSI, all applying to all namespaces)?`
		1230	`<programlisting>`
		1231	`is[a-z][0-9a-z_A-Z]+ (ctype.h)`
		1232	`to[a-z][0-9a-z_A-Z]+ (ctype.h)`
		1233	`str[a-z][0-9a-z_A-Z]+ (stdlib.h)`
		1234	`</programlisting>`
		1235	`With the TDF approach of describing APIs in abstract terms`
		1236	`using the <code>#pragma token</code> syntax most of these namespace`
		1237	`restrictions are seen to be superfluous. When a target independent`
		1238	`header is included precisely the objects defined in that header in`
		1239	`that version of the API appear in the namespace. There are no`
		1240	`worries about what else might happen to be in the header, because`
		1241	`there is nothing else. Also implementation details are separated`
		1242	`off to the TDF library building, so possible namespace pollution`
		1243	`through particular implementations does not arise.</para>`
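<para>The restricted namespaces quoted in this section are simple enough to check mechanically. The following is a minimal sketch, not part of any TDF tool, of a C function that reports which header reserves a given identifier. The name <code>reserving_header</code> and its helpers are hypothetical, and only the five patterns listed above are covered.</para>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIGITS "0123456789"
#define LOWER  "abcdefghijklmnopqrstuvwxyz"
#define UPPER  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

/* True if c is one of the characters in set (the terminating null
   character never counts as a member). */
static int in_set(const char *set, char c)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Matches [0-9a-z_A-Z]+ : one or more word characters running to the
   end of the string. */
static int word_tail(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!in_set(DIGITS LOWER UPPER "_", *s))
            return 0;
    }
    return 1;
}

/* Returns the header whose reserved namespace contains name, or NULL
   if name matches none of the five patterns. */
const char *reserving_header(const char *name)
{
    if (name[0] == 'E' && in_set(DIGITS UPPER, name[1]) &&
        word_tail(name + 2))
        return "errno.h";
    if (strncmp(name, "st_", 3) == 0 && word_tail(name + 3))
        return "sys/stat.h";
    if ((strncmp(name, "is", 2) == 0 || strncmp(name, "to", 2) == 0) &&
        in_set(LOWER, name[2]) && word_tail(name + 3))
        return "ctype.h";
    if (strncmp(name, "str", 3) == 0 && in_set(LOWER, name[3]) &&
        word_tail(name + 4))
        return "stdlib.h";
    return NULL;
}
```

<para>Under these patterns such innocent-looking names as <code>total</code> and <code>strip</code> turn out to be reserved.</para>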
		1244
		1245	`<para>Currently TDF does not have a neat way of solving the`
		1246	`<code>va_list</code> problem. The present target independent`
		1247	`headers use a similar workaround to that described above`
		1248	`(exploiting a reserved namespace). (See the footnote in section`
		1249	`3.4.1.1.)</para>`
		1250
		1251	`<para>None of this is intended as criticism of the ANSI or POSIX`
		1252	`standards. It merely shows some of the problems that can arise`
		1253	`from the insufficient separation of code.</para>`
		1254	`</appendix>`
		1255
		1256	`<sect1>`
		1257	`<title>3. TDF</title>`
		1258	`<para>Having discussed many of the problems involved`
		1259	`with writing portable programs, we now finally turn to TDF.`
		1260	`Firstly a brief technical overview is given, indicating those`
		1261	`features of TDF which facilitate the separation of program.`
		1262	`Secondly the TDF compilation scheme is described. It is shown how`
		1263	`the features of TDF are exploited to aid in the separation of`
		1264	`target independent and target dependent code which we have`
		1265	`indicated as characterising portable programs. Finally, the`
		1266	`various constituents of this scheme are considered individually,`
		1267	`and their particular roles are described in more detail.</para>`
		1268
		1269	`<sect2 id="S23">`
		1270	`<title>3.1. Features of TDF</title>`
		1271	`<para>It is not the purpose of this paper`
		1272	`to explain the exact specification of TDF - this is described`
		1273	`elsewhere (see [6] and [4]) - but rather to show how its general`
		1274	`design features make it suitable as an aid to writing portable`
		1275	`programs.</para>`
		1276
		1277	`<para>TDF is an abstraction of high-level languages - it contains`
		1278	`such things as <code>exps</code> (abstractions of expressions and`
		1279	`statements), <code>shapes</code> (abstractions of types) and`
		1280	`<code>tags</code> (abstractions of variable identifiers). In`
		1281	`general form it is an abstract syntax tree which is flattened and`
		1282	`encoded as a series of bits, called a <code>capsule</code>. This`
		1283	`fairly high level of definition (for a compiler intermediate`
		1284	`language) means that TDF is architecture neutral in the sense`
		1285	`that it makes no assumptions about the underlying processor`
		1286	`architecture.</para>`
		1287
		1288	`<para>The translation of a capsule to and from the corresponding`
		1289	`syntax tree is totally unambiguous, and TDF has a "universal"`
		1290	`semantic interpretation as defined in the TDF specification.</para>`
		1291
		1292	`<sect3>`
		1293	`<title id="S24">3.1.1. Capsule Structure</title>`
		1294	`<para>A TDF`
		1295	`capsule consists of a number of units of various types. These are`
		1296	`embedded in a general linkage scheme (see Fig. 2). Each unit`
		1297	`contains a number of variable objects of various sorts (for`
		1298	`example, tags and tokens) which are potentially visible to other`
		1299	`units. Within the unit body each variable object is identified by`
		1300	`a unique number. The linking is via a set of variable objects`
		1301	`which are global to the entire capsule. These may in turn be`
		1302	`associated with external names. For example, in Fig. 2, the`
		1303	`fourth variable of the first unit is identified with the first`
		1304	`variable of the third unit, and both are associated with the`
		1305	`fourth external name.</para>`
		1306
		1307	`<para>FIGURE 2. TDF Capsule Structure</para>`
		1308
		1309	`<img src="../images/tdf_link.gif" />`
		1310	`<para>`
		1311	`This capsule structure means that the combination of a number of`
		1312	`capsules to form a single capsule is a very natural operation.`
		1313	`The actual units are copied unchanged into the resultant capsule`
		1314	`- it is only the surrounding linking information that needs`
		1315	`changing. Many criteria could be used to determine how this`
		1316	`linking is to be organised, but the simplest is to link two`
		1317	`objects if and only if they have the same external name. This is`
		1318	`the scheme that the current TDF linker has implemented.`
		1319	`Furthermore such operations as changing an external name or`
		1320	`removing it altogether ("hiding") are very simple under this`
		1321	`linking scheme.</para>`
		1322	`</sect3>`
		1323
		1324	`<sect3 id="S25">`
		1325	`<title>3.1.2. Tokens</title>`
		1326	`<para>So, the`
		1327	`combination of program at this high level is straightforward. But`
		1328	`TDF also provides another mechanism which allows for the`
		1329	`combination of program at the syntax tree level, namely`
		1330	`<code>tokens</code>. Virtually any node of the TDF tree may be a`
		1331	`token : a place holder which stands for a subtree. Before the TDF`
		1332	`can be decoded fully the definition of this token must be`
		1333	`provided. The token definition is then macro substituted for the`
		1334	`token in the decoding process to form the complete tree (see Fig.`
		1335	`3).</para>`
		1336
		1337	`<para>FIGURE 3. TDF Tokens</para>`
		1338
		1339	`<img src="../images/token.gif" />`
		1340	`<para>Tokens may also take arguments (see Fig. 4). The actual argument`
		1341	`values (from the main tree) are substituted for the formal`
		1342	`parameters in the token definition.</para>`
		1343
		1344	`<para>FIGURE 4. TDF Tokens (with Arguments)</para>`
		1345
		1346	`<img src="../images/token_args.gif" />`
		1347	`<para>As mentioned above, tokens are one of the types of variable`
		1348	`objects which are potentially visible to external units. This`
		1349	`means that a token does not have to be defined in the same unit`
		1350	`as it is used in. Nor do these units have to have come originally`
		1351	`from the same capsule, provided they have been linked before they`
		1352	`need to be fully decoded. Tokens therefore provide a mechanism`
		1353	`for the low-level separation and combination of code.</para>`
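<para>The token mechanism can be modelled in a few lines of C. The following is only a toy sketch under stated assumptions: the node kinds, the <code>struct node</code> layout and the function <code>eval_tree</code> are all hypothetical and bear no relation to the real TDF encoding. It shows a tree containing a place holder that cannot be evaluated until a definition for that place holder is supplied from elsewhere.</para>

```c
#include <assert.h>
#include <stddef.h>

enum kind { LEAF, PLUS, TOKEN };

struct node {
    enum kind kind;
    int value;                 /* LEAF: the constant value */
    const struct node *left;   /* PLUS: left operand */
    const struct node *right;  /* PLUS: right operand */
    size_t token_id;           /* TOKEN: index into the definition table */
};

/* Evaluates the tree rooted at n. defs[i] is the subtree defining
   token i, or NULL if that token is still undefined. Returns 0 and
   stores the result in *out on success, or -1 if an undefined token
   is encountered. */
int eval_tree(const struct node *n, const struct node *const *defs,
              size_t ndefs, int *out)
{
    int l, r;

    switch (n->kind) {
    case LEAF:
        *out = n->value;
        return 0;
    case PLUS:
        if (eval_tree(n->left, defs, ndefs, &l) != 0)
            return -1;
        if (eval_tree(n->right, defs, ndefs, &r) != 0)
            return -1;
        *out = l + r;
        return 0;
    case TOKEN:
        /* The definition is substituted for the place holder. */
        if (n->token_id >= ndefs || defs[n->token_id] == NULL)
            return -1;
        return eval_tree(defs[n->token_id], defs, ndefs, out);
    }
    return -1;
}
```

<para>The tree and the table of definitions are separate objects, which mirrors the point made above: the definition need not come from the same place as the use, provided the two are brought together before the tree has to be fully decoded.</para>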
		1354	`</sect3>`
		1355	`</sect2>`
		1356
		1357	`<sect2 id="S26">`
		1358	`<title>3.2. TDF Compilation Phases</title>`
		1359	`<para>We have seen how one of the`
		1360	`great strengths of TDF is the fact that it facilitates the`
		1361	`separation and combination of program. We now demonstrate how`
		1362	`this is applied in the TDF compilation strategy. This section is`
		1363	`designed only to give an outline of this scheme. The various`
		1364	`constituent phases are discussed in more detail later.</para>`
		1365
		1366	`<para>Again we start with the simplest case, where the program`
		1367	`contains no target dependent code. The strategy is illustrated in`
		1368	`Fig. 5, which should be compared with the traditional compilation`
		1369	`strategy shown in Fig. 1. The general layout of the diagrams is`
		1370	`the same. The left halves of the diagrams refer to the program`
		1371	`itself, and the right halves to the corresponding API. The top`
		1372	`halves refer to machine independent material, and the bottom`
		1373	`halves to what happens on each target machine. Thus, as before,`
		1374	`the portable program appears in the top left of the diagram, and`
		1375	`the corresponding API in the top right.</para>`
		1376
		1377	`<para>The first thing to note is that, whereas previously all the`
		1378	`compilation took place on the target machines, here the`
		1379	`compilation has been split into a target independent (C ->`
		1380	`TDF) part, called <code>production</code>, and a target dependent`
		1381	`(TDF -> target) part, called <code>installation</code>. One`
		1382	`of the synonyms for TDF is ANDF, Architecture Neutral`
		1383	`Distribution Format, and we require that the production is`
		1384	`precisely that - architecture neutral - so that precisely the`
		1385	`same TDF is installed on all the target machines.</para>`
		1386
		1387	`<para>This architecture neutrality necessitates a separation of`
		1388	`code. For example, in the "Hello world" example discussed in`
		1389	`sections 2.1.1 and 2.1.2, the API specifies that there shall be a`
		1390	`type <code>FILE</code> and an object <code>stdout</code> of type`
		1391	`<code>FILE *</code>, but the implementations of these may be`
		1392	`different on all the target machines. Thus we need to be able to`
		1393	`abstract out the code for <code>FILE</code> and`
		1394	`<code>stdout</code> from the TDF output by the producer, and`
		1395	`provide the appropriate (target dependent) definitions for these`
		1396	`objects in the installation phase.</para>`
		1397
		1398	`<para>FIGURE 5. TDF Compilation Phases</para>`
		1399
		1400	`<img src="../images/tdf_scheme.gif" />`
		1401
		1402	`<sect3 id="S27">`
		1403	`<title>3.2.1. API Description (Top Right)</title>`
		1404	`<para>The method used for this separation is the token`
		1405	`mechanism. Firstly the syntactic element of the API is described`
		1406	`in the form of a set of target independent headers. Whereas the`
		1407	`target dependent, system headers contain the actual`
		1408	`implementation of the API on a particular machine, the target`
		1409	`independent headers express to the producer what is actually in`
		1410	`the API, and which may therefore be assumed to be common to all`
		1411	`compliant target machines. For example, in the target independent`
		1412	`headers for the ANSI standard, there will be a file`
		1413	`<code>stdio.h</code> containing the lines:`
		1414	`<programlisting>`
		1415	`#pragma token TYPE FILE # ansi.stdio.FILE`
		1416	`#pragma token EXP rvalue : FILE * : stdout # ansi.stdio.stdout`
		1417	`#pragma token FUNC int ( const char *, FILE * ) : fputs # ansi.stdio.fputs`
		1418	`</programlisting>`
		1419	`These <code>#pragma token</code> directives are extensions to`
		1420	`the C syntax which enable the expression of abstract syntax`
		1421	`information to the producer. The directives above tell the producer`
		1422	`that there exists a type called <code>FILE</code>, an expression`
		1423	`<code>stdout</code> which is an rvalue (that is, a non-assignable`
		1424	`value) of type <code>FILE *</code>, and a procedure`
		1425	`<code>fputs</code> with prototype:`
		1426	`<programlisting>`
		1427	`int fputs ( const char *, FILE * ) ;`
		1428	`</programlisting>`
		1429	`and that it should leave their values unresolved by means of`
		1430	`tokens (for more details on the <code>#pragma token</code>`
		1431	`directive see [3]). Note how the information in the target`
		1432	`independent header precisely reflects the syntactic information in`
		1433	`the ANSI API.</para>`
		1434
		1435	`<para>The names <code>ansi.stdio.FILE</code> etc. give the external`
		1436	`names for these tokens, those which will be visible at the`
		1437	`outermost layer of the capsule; they are intended to be unique`
		1438	`(this is discussed below). It is worth making the distinction`
		1439	`between the internal names and these external token names. The`
		1440	`former are the names used to represent the objects within C, and`
		1441	`the latter the names used within TDF to represent the tokens`
		1442	`corresponding to these objects.</para>`
		1443	`</sect3>`
		1444
		1445	`<sect3 id="S28">`
		1446	`<title>3.2.2. Production (Top Left)</title>`
		1447	`<para>Now the producer can compile the program using`
		1448	`these target independent headers. As will be seen from the "Hello`
		1449	`world" example, these headers contain sufficient information to`
		1450	`check that the program is syntactically correct. The produced,`
		1451	`target independent, TDF will contain tokens corresponding to the`
		1452	`various uses of <code>stdout</code>, <code>fputs</code> and so`
		1453	`on, but these tokens will be left undefined. In fact there will`
		1454	`be other undefined tokens in the TDF. The basic C types,`
		1455	`<code>int</code> and <code>char</code> are used in the program,`
		1456	`and their implementations may vary between target machines. Thus`
		1457	`these types must also be represented by tokens. However these`
		1458	`tokens are implicit in the producer rather than explicit in the`
		1459	`target independent headers.</para>`
		1460
		1461	`<para>Note also that because the information in the target`
		1462	`independent headers describes abstractly the contents of the API`
		1463	`and not some particular implementation of it, the producer is in`
		1464	`effect checking the program against the API itself.</para>`
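<para>For reference, a program in the spirit of the "Hello world" example might look like the following sketch. The function name <code>say_hello</code> is hypothetical and the exact text of the earlier example is assumed rather than quoted. The point is that the only API objects it relies on are <code>FILE</code>, <code>stdout</code> and <code>fputs</code>.</para>

```c
#include <assert.h>
#include <stdio.h>

/* Writes the greeting to out using only the type FILE and the
   procedure fputs. Returns 0 on success, or -1 if fputs reports an
   error (fputs returns a negative value on failure). */
int say_hello(FILE *out)
{
    return fputs("Hello world\n", out) < 0 ? -1 : 0;
}
```

<para>A caller would invoke this as <code>say_hello ( stdout )</code>, which brings in the third API object.</para>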
		1465	`</sect3>`
		1466
		1467	`<sect3 id="S29">`
		1468	`<title>3.2.3. API Implementation (Bottom Right)</title>`
		1469	`<para>Before the TDF output by the producer can be`
		1470	`decoded fully, definitions must be provided for the tokens`
		1471	`it has left undefined. These definitions will be`
		1472	`potentially different on all target machines and reflect the`
		1473	`implementation of the API on that machine.</para>`
		1474
		1475	`<para>The syntactic details of the implementation are to be found in`
		1476	`the system headers. The process of defining the tokens describing`
		1477	`the API (called TDF library building) consists of comparing the`
		1478	`implementation of the API as given in the system headers with the`
		1479	`abstract description of the tokens comprising the API given in`
		1480	`the target independent headers. The token definitions thus`
		1481	`produced are stored as TDF libraries, which are just archives of`
		1482	`TDF capsules.</para>`
		1483
		1484	`<para>For instance, in the example implementation of`
		1485	`<code>stdio.h</code> given in section 2.1.2, the token`
		1486	`<code>ansi.stdio.FILE</code> will be defined as the TDF compound`
		1487	`shape corresponding to the structure defining the type`
		1488	`<code>FILE</code> (recall the distinction between internal and`
		1489	`external names). <code>__iob</code> will be an undefined tag`
		1490	`whose shape is an array of 60 copies of the shape given by the`
		1491	`token <code>ansi.stdio.FILE</code>, and the token`
		1492	`<code>ansi.stdio.stdout</code> will be defined to be the TDF`
		1493	`expression corresponding to a pointer to the second element of`
		1494	`this array. Finally the token <code>ansi.stdio.fputs</code> is`
		1495	`defined to be the effect of applying the procedure given by the`
		1496	`undefined tag <code>fputs</code>. (In fact, this picture has been`
		1497	`slightly simplified for the sake of clarity. See the section on C`
		1498	`-> TDF mappings in section 3.3.2.)</para>`
		1499
		1500	`<para>These token definitions are created using exactly the same C`
		1501	`-> TDF translation program as is used in the producer phase.`
		1502	`This program knows nothing about the distinction between target`
		1503	`independent and target dependent TDF; it merely translates the C`
		1504	`it is given (whether from a program or a system header) into TDF.`
		1505	`It is the compilation process itself which enables the separation`
		1506	`of target independent and target dependent TDF.</para>`
		1507
		1508	`<para>In addition to the tokens made explicit in the API, the`
		1509	`implicit tokens built into the producer must also have their`
		1510	`definitions inserted into the TDF libraries. The method of`
		1511	`definition of these tokens is slightly different. The definitions`
		1512	`are automatically deduced by, for example, looking in the target`
		1513	`machine's <code>limits.h</code> header to find the local values`
		1514	`of <code>CHAR_MIN</code> and <code>CHAR_MAX</code>, and deducing`
		1515	`the definition of the token corresponding to the C type`
		1516	`<code>char</code> from this. It will be the <code>variety</code>`
		1517	`(the TDF abstraction of integer types) consisting of all integers`
		1518	`between these values.</para>`
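<para>The kind of deduction described here can be sketched in C itself. In the following illustration the structure <code>int_range</code> merely stands in for a TDF variety, and both it and the function <code>char_range</code> are hypothetical names. The values come from the <code>limits.h</code> of whatever machine the code is compiled on.</para>

```c
#include <assert.h>
#include <limits.h>

/* An integer range, standing in for a TDF variety in this sketch. */
struct int_range {
    long min;
    long max;
};

/* The range of the C type char on the machine this is compiled for,
   taken from the local limits.h. */
struct int_range char_range(void)
{
    struct int_range r;
    r.min = CHAR_MIN;
    r.max = CHAR_MAX;
    return r;
}
```

<para>On a machine where <code>char</code> is signed the lower bound will be negative, and where it is unsigned the lower bound will be zero. The program text is the same in both cases.</para>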
		1519
		1520	`<para>Note that what we are doing in the main library build is`
		1521	`checking the actual implementation of the API against the`
		1522	`abstract syntactic description. Any variations of the syntactic`
		1523	`aspects of the implementation from the API will therefore show`
		1524	`up. Thus library building is an effective way of checking the`
		1525	`syntactic conformance of a system to an API. Checking the`
		1526	`semantic conformance is far more difficult - we shall return to`
		1527	`this issue later.</para>`
		1528	`</sect3>`
		1529
		1530	`<sect3 id="S30">`
		1531	`<title>3.2.4. Installation (Bottom Left)</title>`
		1532	`<para>The installation phase is now straightforward. The`
		1533	`target independent TDF representing the program contains various`
		1534	`undefined tokens (corresponding to objects in the API), and the`
		1535	`definitions for these tokens on the particular target machine`
		1536	`(reflecting the API implementation) are to be found in the local`
		1537	`TDF libraries. It is a natural matter to link these to form a`
		1538	`complete, target dependent, TDF capsule. The rest of the`
		1539	`installation consists of a straightforward translation phase (TDF`
		1540	`-> target) to produce a binary object file, and linking with`
		1541	`the system libraries to form a final executable. Linking with the`
		1542	`system libraries will resolve any tags left undefined in the TDF.</para>`
		1543	`</sect3>`
		1544
		1545	`<sect3 id="S31">`
		1546	`<title>3.2.5. Illustrated Example</title>`
		1547	`<para>In`
		1548	`order to help clarify exactly what is happening where, Fig. 6`
		1549	`shows a simple example superimposed on the TDF compilation`
		1550	`diagram.</para>`
		1551
		1552	`<para>FIGURE 6. Example Compilation</para>`
		1553
		1554	`<img src="../images/eg_scheme.gif" />`
		1555	`<para>The program to be translated is simply:`
		1556	`<programlisting>`
		1557	`FILE f ;`
		1558	`</programlisting>`
		1559	`and the API is as above, so that <code>FILE</code> is an`
		1560	`abstract type. This API is described as target independent headers`
		1561	`containing the <code>#pragma token</code> statements given above.`
		1562	`The producer combines the program with the target independent`
		1563	`headers to produce a target independent capsule which declares a`
		1564	`tag <code>f</code> whose shape is given by the token representing`
		1565	`<code>FILE</code>, but leaves this token undefined. In the API`
		1566	`implementation, the local definition of the type <code>FILE</code>`
		1567	`from the system headers is translated into the definition of this`
		1568	`token by the library building process. Finally in the installation,`
		1569	`the target independent capsule is combined with the local token`
		1570	`definition library to form a target dependent capsule in which all`
		1571	`the tokens used are also defined. This is then installed further as`
		1572	`described above.</para>`
		1573	`</sect3>`
		1574	`</sect2>`
		1575
		1576	`<sect2 id="S32">`
		1577	`<title>3.3. Aspects of the TDF System</title><para>Let us now consider in`
		1578	`more detail some of the components of the TDF system and how they`
		1579	`fit into the compilation scheme.</para>`
		1580
		1581	`<sect3 id="S33">`
		1582	`<title>3.3.1. The C to TDF Producer</title>`
		1583	`<para>Above it was emphasised how the design of the`
		1584	`compilation strategy aids the representation of program in a`
		1585	`target independent manner, but this is not enough in itself. The`
		1586	`C -> TDF producer must represent everything symbolically; it`
		1587	`cannot make assumptions about the target machine. For example,`
		1588	`the line of C containing the initialisation:`
		1589	`<programlisting>`
		1590	`int a = 1 + 1 ;`
		1591	`</programlisting>`
		1592	`is translated into TDF representing precisely that, 1 + 1,`
		1593	`not 2, because it does not know the representation of`
		1594	`<code>int</code> on the target machine. The installer does know`
		1595	`this, and so is able to replace 1 + 1 by 2 (provided this is`
		1596	`actually true).</para>`
		1597
		1598	`<para>As another example, in the structure:`
		1599	`<programlisting>`
		1600	`struct tag {`
		1601	`int a ;`
		1602	`double b ;`
		1603	`} ;`
		1604	`</programlisting>`
		1605	`the producer does not know the actual value in bits of the`
		1606	`offset of the second field from the start of the structure - it`
		1607	`depends on the sizes of <code>int</code> and <code>double</code>`
		1608	`and the alignment rules on the target machine. Instead it`
		1609	`represents it symbolically (it is the size of <code>int</code>`
		1610	`rounded up to a multiple of the alignment of <code>double</code>).`
		1611	`This level of abstraction makes the tokenisation required by the`
		1612	`target independent API headers very natural. If we only knew that`
		1613	`there existed a structure <code>struct tag</code> with a field`
		1614	`<code>b</code> of type <code>double</code> then it is perfectly`
		1615	`simple to use a token to represent the (unknown) offset of this`
		1616	`field from the start of the structure rather than using the`
		1617	`calculated (known) value. Similarly, when it comes to defining this`
		1618	`token in the library building phase (recall that this is done by`
		1619	`the same C -> TDF translation program as the production) it is a`
		1620	`simple matter to define the token to be the calculated value.</para>`
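<para>The symbolic description of the offset can be written down in C as a sketch. The probe structure <code>align_probe</code> and the function <code>symbolic_offset_of_b</code> are hypothetical names. The alignment of <code>double</code> is taken to be the offset of a <code>double</code> member following a single <code>char</code>, which holds on typical implementations although the C standard does not strictly guarantee it.</para>

```c
#include <assert.h>
#include <stddef.h>

struct tag {
    int a;
    double b;
};

/* Probe structure: the offset of d is the alignment the compiler
   gives to a double member. */
struct align_probe {
    char c;
    double d;
};

/* The size of int rounded up to a multiple of the alignment of
   double, computed without reference to struct tag itself. */
size_t symbolic_offset_of_b(void)
{
    size_t align = offsetof(struct align_probe, d);
    return (sizeof(int) + align - 1) / align * align;
}
```

<para>On common machines the result agrees with <code>offsetof ( struct tag, b )</code>, but its numerical value differs from one target to another. That is why the producer can only carry the formula and must leave the evaluation to the installer.</para>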
		1621
		1622	`<para>Furthermore, because all the producer's operations are`
		1623	`performed at this very abstract level, it is a simple matter to`
		1624	`put in extra portability checks. For example, it would be a`
		1625	`relatively simple task to put most of the functionality of`
		1626	`<code>lint</code> (excluding intermodular checking) or`
		1627	`<code>gcc</code>'s <b>-Wall</b> option into the producer, and`
		1628	`moreover have these checks applied to an abstract machine rather`
		1629	`than a particular target machine. Indeed a number of these checks`
		1630	`have already been implemented.</para>`
		1631
		1632	`<para>These extra checks are switched on and off by using`
		1633	`<code>#pragma</code> statements. (For more details on the`
		1634	`<code>#pragma</code> syntax and which portability checks are`
		1635	`currently supported by the producer see [3].) For example, ANSI C`
		1636	`states that any undeclared function is assumed to return`
		1637	`<code>int</code>, whereas for strict portability checking it is`
		1638	`more useful to have undeclared functions marked as an error`
		1639	`(indeed for strict API checking this is essential). This is done`
		1640	`by inserting the line:`
		1641	`<programlisting>`
		1642	`#pragma no implicit definitions`
		1643	`</programlisting>`
		1644	`either at the start of each file to be checked or, more`
		1645	`simply, in a start-up file - a file which can be`
		1646	`<code>#include</code>'d at the start of each source file by means`
		1647	`of a command line option.</para>`
		1648
		1649	`<para>Because these checks can be turned off as well as on it is`
		1650	`possible to relax as well as strengthen portability checking.`
		1651	`Thus if a program is only intended to work on 32-bit machines, it`
		1652	`is possible to switch off certain portability checks. The whole`
		1653	`ethos underlying the producer is that these portability`
		1654	`assumptions should be made explicit, so that the appropriate`
		1655	`level of checking can be done.</para>`
		1656
		1657	`<para>As has been previously mentioned, the use of a single`
		1658	`front-end to any compiler not only virtually eliminates the`
		1659	`problems of differing code interpretation and compiler quirks,`
		1660	`but also reduces the exposure to compiler bugs. Of course, this`
		1661	`also applies to the TDF compiler, which has a single front-end`
		1662	`(the producer) and multiple back-ends (the installers). As`
		1663	`regards the syntax and semantics of the C language, the producer`
		1664	`is by default a strictly ANSI C compliant compiler. (Addition to`
		1665	`the October 1993 revision : Alas, this is no longer true; however`
		1666	`strict ANSI can be specified by means of a simple command line`
		1667	`option (see [1]). The decision whether to make the default strict`
		1668	`and allow people to relax it, or to make the default lenient and`
		1669	`allow people to strengthen it, is essentially a political one. It`
		1670	`does not really matter in technical terms provided the user is`
		1671	`made aware of exactly what each compilation mode means in terms`
		1672	`of syntax, semantics and portability checking.) However it is`
		1673	`possible to change its behaviour (again by means of`
		1674	`<code>#pragma</code> statements) to implement many of the`
		1675	`features found in "traditional" or "K&amp;R" C. Hence it is`
		1676	`possible to determine precisely how the producer will interpret`
		1677	`the C code it is given by explicitly describing the dialect that`
		1678	`code is written in, using these <code>#pragma</code>`
		1679	`statements.</para>`
		1680	`</sect3>`
		1681
		1682	`<sect3 id="S34">`
		1683	`<title>3.3.2. C to TDF Mappings</title>`
		1684	`<para>The`
		1685	`nature of the C -> TDF transformation implemented by the`
		1686	`producer is worth considering, although not all the features`
		1687	`described in this section are fully implemented in the current`
		1688	`(October 1993) producer. Although it is only indirectly related`
		1689	`to questions of portability, this mapping does illustrate some of`
		1690	`the problems the producer has in trying to represent program in`
		1691	`an architecture neutral manner.</para>`
		1692
		1693	`<para>Once the initial difficulty posed by the syntactic and`
		1694	`semantic differences between the various C dialects is overcome,`
		1695	`the C -> TDF mapping is quite straightforward. In a hierarchy`
		1696	`from high level to low level languages C and TDF are not that`
		1697	`dissimilar - both come towards the bottom of what may`
		1698	`legitimately be regarded as high level languages. Thus the`
		1699	`constructs in C map easily onto the constructs of TDF (there are`
		1700	`a few exceptions, for example coercing integers to pointers,`
		1701	`which are discussed in [3]). Eccentricities of the C language`
		1702	`specification such as doing all integer arithmetic in the`
		1703	`promoted integer type are translated explicitly into TDF. So to`
		1704	`add two <code>char</code>'s, they are promoted to`
		1705	`<code>int</code>'s, added together as <code>int</code>'s, and the`
		1706	`result is converted back to a <code>char</code>. These rules are`
		1707	`not built directly into TDF because of the desire to support`
		1708	`languages other than C (and even other C dialects).</para>`
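<para>The promotion rule can be seen directly in C. In this small sketch the function names <code>add_chars</code> and <code>add_chars_narrow</code> are hypothetical. The first keeps the sum as the <code>int</code> in which it was computed, and the second applies the final conversion back to <code>char</code>.</para>

```c
#include <assert.h>

/* Adds two chars the way C specifies: both operands are promoted to
   int and the addition is carried out in int. */
int add_chars(char a, char b)
{
    return a + b;
}

/* The same sum converted back to char. If the sum does not fit and
   char is signed, the result is implementation-defined. */
char add_chars_narrow(char a, char b)
{
    return (char)(a + b);
}
```

<para>Thus <code>add_chars ( 100, 100 )</code> yields 200 on every conforming implementation, even where 200 is not representable as a <code>char</code>. It is the explicit conversion step in the second function whose outcome may vary between targets.</para>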
		1709
		1710	`<para>A number of issues arise when tokens are introduced. Consider`
		1711	`for example the type <code>size_t</code> from the ANSI standard.`
		1712	`This is a target dependent integer type, so bearing in mind what`
		1713	`was said above it is natural for the producer to use a tokenised`
		1714	`variety (the TDF representation of integer types) to stand for`
		1715	`<code>size_t</code>. This is done by a <code>#pragma token</code>`
		1716	`statement of the form:</para>`
		1717	`<programlisting>`
		1718	`#pragma token VARIETY size_t # ansi.stddef.size_t`
		1719	`</programlisting><para>But if we want to do arithmetic on <code>size_t</code>'s we`
		1720	`need to know the integer type corresponding to the integral`
		1721	`promotion of <code>size_t</code>. But this is again target`
		1722	`dependent, so it makes sense to have another tokenised variety`
		1723	`representing the integral promotion of <code>size_t</code>. Thus`
		1724	`the simple token directive above maps to (at least) two TDF tokens,`
		1725	`the type itself and its integral promotion.</para>`
		1726
		1727	`<para>As another example, suppose that we have a target dependent C`
		1728	`type, <code>type</code> say, and we define a procedure which`
		1729	`takes an argument of type <code>type</code>. In both the`
		1730	`procedure body and at any call of the procedure the TDF we need`
		1731	`to produce to describe how C passes this argument will depend on`
		1732	`<code>type</code>. This is because C does not treat all procedure`
		1733	`argument types uniformly. Most types are passed by value, but`
		1734	`array types are passed by address. But whether or not`
		1735	`<code>type</code> is an array type is target dependent, so we`
		1736	`need to use tokens to abstract out the argument passing`
		1737	`mechanism. For example, we could implement the mechanism using`
		1738	`four tokens : one for the type <code>type</code> (which will be a`
		1739	`tokenised shape), one for the type an argument of type`
		1740	`<code>type</code> is passed as, <code>arg_type</code> say, (which`
		1741	`will be another tokenised shape), and two for converting values`
		1742	`of type <code>type</code> to and from the corresponding values of`
		1743	`type <code>arg_type</code> (these will be tokens which take one`
		1744	`exp argument and give an exp). For most types,`
		1745	`<code>arg_type</code> will be the same as <code>type</code> and`
		1746	`the conversion tokens will be identities, but for array types,`
		1747	`<code>arg_type</code> will be a pointer to <code>type</code> and`
		1748	`the conversion tokens will be "address of" and "contents of".</para>`
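<para>The non-uniform treatment of argument types is easy to demonstrate. In the following sketch the two typedefs are hypothetical stand-ins for the two ways a target might define <code>type</code>: as a structure on one machine and as an array on another.</para>

```c
#include <assert.h>

typedef struct { int v; } struct_type;  /* passed by value */
typedef int array_type[1];              /* passed by address */

/* Modifies a private copy of the argument; the caller's object is
   left untouched. */
void set_struct(struct_type t)
{
    t.v = 42;
}

/* The parameter is really a pointer to the first element, so the
   caller's object is modified. */
void set_array(array_type t)
{
    t[0] = 42;
}
```

<para>Two procedures with bodies of the same shape therefore behave differently at the call site, and a producer which does not know which kind of type it is dealing with has to defer the choice, as the tokens described above do.</para>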
		1749
		1750	`<para>So there is not the simple one to one correspondence between`
		1751	`<code>#pragma token</code> directives and TDF tokens one might`
		1752	`expect. Each such directive maps onto a family of TDF tokens, and`
		1753	`this mapping in a sense encapsulates the C language`
		1754	`specification. Of course in the TDF library building process the`
		1755	`definitions of all these tokens are deduced automatically from`
		1756	`the local values.</para>`
		1757	`</sect3>`
		1758
		1759	`<sect3 id="S35">`
		1760	`<title>3.3.3. TDF Linking</title>`
		1761	`<para>We now move`
		1762	`from considering the components of the producer to those of the`
		1763	`installer. The first phase of the installation - linking in the`
		1764	`TDF libraries containing the token definitions describing the`
		1765	`local implementation of the API - is performed by a general`
		1766	`utility program, the TDF linker (or builder). This is a very`
		1767	`simple program which is used to combine a number of TDF capsules`
		1768	`and libraries into a single capsule. As has been emphasised`
		1769	`previously, the capsule structure means that this is a very`
		1770	`natural operation, but, as will be seen from the previous`
		1771	`discussion (particularly section 2.2.3), such combinatorial`
		1772	`phases are very prone to namespace problems.</para>`
		1773
		1774	`<para>In TDF tags, tokens and other externally named objects occupy`
		1775	`separate namespaces, and there are no constructs which can cut`
		1776	`across these namespaces in the way that the C macros do. There`
		1777	`still remains the problem that the only way to know that two`
		1778	`tokens, say, in different capsules are actually the same is if`
		1779	`they have the same name. This, as we have already seen in the`
		1780	`case of system linking, can cause objects to be identified`
		1781	`wrongly.</para>`
		1782
		1783	`<para>In the main TDF linking phase - linking in the token`
		1784	`definitions at the start of the installation - we are primarily`
		1785	`linking on token names, these tokens being those arising from the`
		1786	`use of the target independent headers. Potential namespace`
		1787	`problems are virtually eliminated by the use of unique external`
		1788	`names for the tokens in these headers (such as`
		1789	`<code>ansi.stdio.FILE</code> in the example above). This means`
		1790	`that there is a genuine one to one correspondence between tokens`
		1791	`and token names. Of course this relies on the external token`
		1792	`names given in the headers being genuinely unique. In fact, as is`
		1793	`explained below, these names are normally automatically`
		1794	`generated, and uniqueness of names within a given API is checked.`
		1795	`Also incorporating the API name into the token name helps to`
		1796	`ensure uniqueness across APIs. However the token namespace does`
		1797	`require careful management. (Note that the user does not normally`
		1798	`have access to the token namespace; all variable and procedure`
		1799	`names map into the tag namespace.)</para>`
		1800
		1801	`<para>We can illustrate the "clean" nature of TDF linking by`
		1802	`considering the <code>st_atime</code> example given in section`
		1803	`2.2.3. Recall that in the traditional compilation scheme the`
		1804	`problem arose, not because of the program or the API`
		1805	`implementation, but because of the way they were combined by the`
		1806	`pre-processor. In the TDF scheme the target independent version`
		1807	`of <code>sys/stat.h</code> will be included. Thus the procedure`
		1808	`name <code>st_atime</code> and the field selector`
		1809	`<code>st_atime</code> will be seen to belong to genuinely`
		1810	`different namespaces - there are no macros to disrupt this. The`
		1811	`former will be translated into a TDF tag with external name`
		1812	`<code>st_atime</code>, whereas the latter is translated into a`
		1813	`token with external name`
		1814	`<code>posix.stat.struct_stat.st_atime</code>, say. In the TDF`
		1815	`library reflecting the API implementation, the token`
		1816	`<code>posix.stat.struct_stat.st_atime</code> will be defined`
		1817	`precisely as the system header intended, as the offset`
		1818	`corresponding to the C field selector`
		1819	`<code>st_atim.st__sec</code>. The fact that this token is defined`
		1820	`using a macro rather than a conventional direct field selector is`
		1821	`not important to the library building process. Now the`
		1822	`combination of the program with the API implementation in this`
		1823	`case is straightforward - not only are the procedure name and the`
		1824	`field selector name in the TDF now different, but they also lie`
		1825	`in distinct namespaces. This shows how the separation of the API`
		1826	`implementation from the main program is cleaner in the TDF`
		1827	`compilation scheme than in the traditional scheme.</para>`
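<para>The separation of namespaces described above already exists in C itself, provided no macro intervenes. The following is a minimal sketch only: <code>struct demo_stat</code> is a hypothetical stand-in for <code>struct stat</code>, and no system header is included, so no <code>st_atime</code> macro is in scope.</para>

```c
#include <assert.h>

/* Hypothetical stand-in for struct stat (an assumption for illustration). */
struct demo_stat {
    long st_atime;              /* lives in the member namespace */
};

/* A user procedure with the same spelling: an ordinary identifier. */
long st_atime(const struct demo_stat *s)
{
    return s->st_atime + 1;     /* member access, not a recursive call */
}
```

<para>If a system header were to define <code>st_atime</code> as a macro expanding to <code>st_atim.st__sec</code>, the pre-processor would rewrite the procedure definition as well as the member access, and the definition would no longer compile. That is the clash which the target independent headers avoid.</para>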
		1828
		1829	`<para>TDF linking also opens up new ways of combining code which may`
		1830	`solve some other namespace problems. For example, in the`
		1831	`<code>open</code> example in section 2.2.3, the name`
		1832	`<code>open</code> is meant to be internal to the program. It is`
		1833	`the fact that it is not treated as such which leads to the`
		1834	`problem. If the program consisted of a single source file then we`
		1835	`could make <code>open</code> a <code>static</code> procedure, so`
		1836	`that its name does not appear in the external namespace. But if`
		1837	`the program consists of several source files the external name is`
		1838	`necessary for intra-program linking. The TDF linker allows this`
		1839	`intra-program linking to be separated from the main system`
		1840	`linking. In the TDF compilation scheme described above each`
		1841	`source file is translated into a separate TDF capsule, which is`
		1842	`installed separately to a binary object file. It is only the`
		1843	`system linking which finally combines the various components into`
		1844	`a single program. An alternative scheme would be to use the TDF`
		1845	`linker to combine all the TDF capsules into a single capsule in`
		1846	`the production phase and install that. Because all the`
		1847	`intra-program linking has already taken place, the external names`
		1848	`required for it can be "hidden" - that is to say, removed from`
		1849	`the tag namespace. Only tag names which are used but not defined`
		1850	`(and so are not internal to the program) and <code>main</code>`
		1851	`should not be hidden. In effect this linking phase has made all`
		1852	`the internal names in the program (except <code>main</code>)`
		1853	`<code>static</code>.</para>`
		1854
		1855	`<para>In fact this type of complete program linking is not always`
		1856	`feasible. For very large programs the resulting TDF capsule can`
		1857	`be too large for the installer to cope with (it is the system`
		1858	`assembler which tends to cause the most problems). Instead it may`
		1859	`be better to use a more judiciously chosen partial linking and`
		1860	`hiding scheme.</para>`
		1861	`</sect3>`
		1862
		1863	`<sect3 id="S36">`
		1864	`<title>3.3.4. The TDF Installers</title>`
		1865	`<para>The`
		1866	`TDF installer on a given machine typically consists of four`
		1867	`phases: TDF linking, which has already been discussed,`
		1868	`translating TDF to assembly source code, translating assembly`
		1869	`source code to a binary object file, and linking binary object`
		1870	`files with the system libraries to form the final executable. The`
		1871	`latter two phases are currently implemented by the system`
		1872	`assembler and linker, and so are identical to the traditional`
		1873	`compilation scheme.</para>`
		1874
		1875	`<para>It is the TDF to assembly code translator which is the main`
		1876	`part of the installer. Although not strictly related to the`
		1877	`question of portability, the nature of the translator is worth`
		1878	`considering. Like the producer (and the assembler), it is a`
		1879	`transformational, as opposed to a combinatorial, compilation`
		1880	`phase. But whereas the transformation from C to TDF is`
		1881	`"difficult" because of the syntax and semantics of C and the need`
		1882	`to represent everything in an architecture neutral manner, the`
		1883	`transformation from TDF to assembly code is much easier because`
		1884	`of the unambiguous syntax and uniform semantics of TDF, and`
		1885	`because now we know the details of the target machine, it is no`
		1886	`longer necessary to work at such an abstract level.</para>`
		1887
		1888	`<para>The whole construction of the current generation of TDF`
		1889	`translators is based on the concept of compilation as`
		1890	`transformation. They represent the TDF they read in as a syntax`
		1891	`tree, virtually identical to the syntax tree comprising the TDF.`
		1892	`The translation process then consists of continually applying`
		1893	`transformations to this tree - in effect TDF -> TDF`
		1894	`transformations - gradually optimising it and changing it to a`
		1895	`form where the translation into assembly source code is a simple`
		1896	`transcription process (see [7]).</para>`
		1897
		1898	`<para>Even such operations as constant evaluation - replacing 1 + 1`
		1899	`by 2 in the example above - may be regarded as TDF -> TDF`
		1900	`transformations. But so may more complex optimisations such as`
		1901	`taking constants out of a loop, common sub-expression`
		1902	`elimination, strength reduction and so on. Some of these`
		1903	`transformations are universally applicable, others can only be`
		1904	`applied on certain classes of machines. This transformational`
		1905	`approach results in high quality code generation (see [5]) while`
		1906	`minimising the risk of transformational errors. Moreover the`
		1907	`sharing of so much code - up to 70% - between all the TDF`
		1908	`translators, like the introduction of a common front-end, further`
		1909	`reduces the exposure to compiler bugs.</para>`
		1910
		1911	`<para>Much of the machine ABI information is built into the`
		1912	`translator in a very simple way. For example, to evaluate the`
		1913	`offset of the field <code>b</code> in the structure <code>struct`
		1914	`tag</code> above, the producer has already done all the hard`
		1915	`work, providing a formula for the offset in terms of the sizes`
		1916	`and alignments of the basic C types. The translator merely`
		1917	`provides these values and the offset is automatically evaluated`
		1918	`by the constant evaluation transformations. Other aspects of the`
		1919	`ABI, for example the procedure argument and result passing`
		1920	`conventions, require more detailed attention.</para>`
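<para>The idea of an offset expressed purely through sizes and alignments can be sketched in C. This is an illustration under stated assumptions, not the producer's actual formula: <code>struct demo_tag</code> is hypothetical (the fields of the earlier <code>struct tag</code> may differ), and the ABI is assumed to place each field at the first suitably aligned position after the previous one, as common ABIs do.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure used only for this sketch. */
struct demo_tag {
    char a;
    long b;
};

/* Round offset up to the next multiple of align (align must be > 0). */
static size_t demo_pad(size_t offset, size_t align)
{
    return ((offset + align - 1) / align) * align;
}

/* Offset of b written only in terms of basic sizes and alignments:
   b starts at the first position after a that is aligned for long. */
size_t demo_offset_of_b(void)
{
    return demo_pad(sizeof(char), _Alignof(long));
}
```

<para>On a given machine the values of <code>sizeof</code> and <code>_Alignof</code> are simply constants, so the whole expression folds to a single number, which can be compared with <code>offsetof ( struct demo_tag, b )</code>.</para>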
		1921
		1922	`<para>One interesting range of optimisations implemented by many of`
		1923	`the current translators consists of the inlining of certain`
		1924	`standard procedure calls. For example, <code>strlen ( "hello"`
		1925	`)</code> is replaced by 5. As it stands this optimisation appears`
		1926	`to run the risk of corrupting the programmer's namespace - what`
		1927	`if <code>strlen</code> was a user-defined procedure rather than`
		1928	`the standard library routine (cf. the <code>open</code> example`
		1929	`in section 2.2.3)? This risk only materialises however if we`
		1930	`actually use the procedure name to spot this optimisation. In`
		1931	`code compiled from the target independent headers all calls to`
		1932	`the library routine <code>strlen</code> will be implemented by`
		1933	`means of a uniquely named token, <code>ansi.string.strlen</code>`
		1934	`say. It is by recognising this token name as the token is`
		1935	`expanded that the translators are able to ensure that this is`
		1936	`really the library routine <code>strlen</code>.</para>`
		1937
		1938	`<para>Another example of an inlined procedure of this type is`
		1939	`<code>alloca</code>. Many other compilers inline`
		1940	`<code>alloca</code>, or rather they inline`
		1941	`<code>__builtin_alloca</code> and rely on the programmer to`
		1942	`identify <code>alloca</code> with <code>__builtin_alloca</code>.`
		1943	`This gets round the potential namespace problems by getting the`
		1944	`programmer to confirm that <code>alloca</code> in the program`
		1945	`really is the library routine <code>alloca</code>. By the use of`
		1946	`tokens this information is automatically provided to the TDF`
		1947	`translators.</para>`
		1948	`</sect3>`
		1949	`</sect2>`
		1950
		1951	`<sect2 id="S37">`
		1952	`<title>3.4. TDF and APIs</title>`
		1953	`<para>What the discussion above has`
		1954	`emphasised is that the ability to describe APIs abstractly as`
		1955	`target independent headers underpins the entire TDF approach to`
		1956	`portability. We now consider this in more detail.</para>`
		1957
		1958	`<sect3 id="S38">`
		1959	`<title>3.4.1. API Description</title>`
		1960	`<para>The`
		1961	`process of transforming an API specification into its description`
		1962	`in terms of <code>#pragma token</code> directives is a`
		1963	`time-consuming but often fascinating task. In this section we`
		1964	`discuss some of the issues arising from the process of describing`
		1965	`an API in this way.</para>`
		1966
		1967	`<sect4 id="S39">`
		1968	`<title>3.4.1.1. The Description Process</title>`
		1969	`<para>As may be observed from the example given in`
		1970	`section 3.2.1, the <code>#pragma token</code> syntax is not`
		1971	`necessarily intuitively obvious. It is designed to be a low-level`
		1972	`description of tokens which is capable of expressing many complex`
		1973	`token specifications. Most APIs are however specified in C-like`
		1974	`terms, so an alternative syntax, closer to C, has been developed`
		1975	`in order to facilitate their description. This is then`
		1976	`transformed into the corresponding <code>#pragma token</code>`
		1977	`directives by a specification tool called <code>tspec</code> (see`
		1978	`[2]), which also applies a number of checks to the input and`
		1979	`generates the unique token names. For example, the description`
		1980	`leading to the example above was:`
		1981	`<programlisting>`
		1982	`+TYPE FILE ;`
		1983	`+EXP FILE *stdout ;`
		1984	`+FUNC int fputs ( const char *, FILE * ) ;`
		1985	`</programlisting>`
		1986	`Note how close this is to the English language specification`
		1987	`of the API given previously. (There are a number of open issues`
		1988	`relating to <code>tspec</code> and the <code>#pragma token</code>`
		1989	`syntax, mainly concerned with determining the type of syntactic`
		1990	`statements that it is desired to make about the APIs being`
		1991	`described. The current scheme is adequate for those APIs so far`
		1992	`considered, but it may need to be extended in future.)</para>`
		1993
		1994	`<para><code>tspec</code> is not capable of expressing the full power`
		1995	`of the <code>#pragma token</code> syntax. Whereas this makes it`
		1996	`easier to use in most cases, for describing the normal C-like`
		1997	`objects such as types, expressions and procedures, it cannot`
		1998	`express complex token descriptions. Instead it is necessary to`
		1999	`express these directly in the <code>#pragma token</code> syntax.`
		2000	`However this is only rarely required: the constructs`
		2001	`<code>offsetof</code>, <code>va_start</code> and`
		2002	`<code>va_arg</code> from ANSI are the only examples so far`
		2003	`encountered during the API description programme at DRA. For`
		2004	`example, <code>va_arg</code> takes an assignable expression of`
		2005	`type <code>va_list</code> and a type <code>t</code> and returns`
		2006	`an expression of type <code>t</code>. Clearly, this cannot be`
		2007	`expressed abstractly in C-like terms; so the <code>#pragma`
		2008	`token</code> description:`
		2009	`<programlisting>`
		2010	`#pragma token PROC ( EXP lvalue : va_list : e, TYPE t )\`
		2011	`EXP rvalue : t : va_arg # ansi.stdarg.va_arg`
		2012	`</programlisting>`
		2013	`must be used instead.</para>`
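<para>A short use of <code>va_arg</code> shows why it cannot be described as an ordinary procedure: its second operand is a type name, not an expression. The helper <code>demo_sum_ints</code> below is hypothetical and exists only for this sketch.</para>

```c
#include <assert.h>
#include <stdarg.h>

/* Sum `count` int arguments passed after `count`. */
int demo_sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* The second operand is the type `int`; the result has that type. */
        total += va_arg(ap, int);
    }
    va_end(ap);
    return total;
}
```

<para>A call such as <code>demo_sum_ints ( 3, 1, 2, 3 )</code> adds the three trailing arguments.</para>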
		2014
		2015	`<para>Most of the process of describing an API consists of going`
		2016	`through its English language specification transcribing the`
		2017	`object specifications it gives into the <code>tspec</code> syntax`
		2018	`(if the specification is given in a machine readable form this`
		2019	`process can be partially automated). The interesting part`
		2020	`consists of trying to interpret what is written and reading`
		2021	`between the lines as to what is meant. It is important to try to`
		2022	`represent exactly what is in the specification rather than being`
		2023	`influenced by one's knowledge of a particular implementation,`
		2024	`otherwise the API checking phase of the compilation will not be`
		2025	`checking against what is actually in the API but against a`
		2026	`particular way of implementing it.</para>`
		2027
		2028	`<para>There is a continuing API description programme at DRA. The`
		2029	`current status (October 1993) is that ANSI (X3.159), POSIX`
		2030	`(1003.1), XPG3 (X/Open Portability Guide 3) and SVID (System V`
		2031	`Interface Definition, 3rd Edition) have been described and`
		2032	`extensively tested. POSIX2 (1003.2), XPG4, AES (Revision A), X11`
		2033	`(Release 5) and Motif (Version 1.1) have been described, but not`
		2034	`yet extensively tested.</para>`
		2035
		2036	`<para>There may be some syntactic information in the paper API`
		2037	`specifications which <code>tspec</code> (and the <code>#pragma`
		2038	`token</code> syntax) is not yet capable of expressing. In`
		2039	`particular, some APIs go into very careful management of`
		2040	`namespaces within the API, explicitly spelling out exactly what`
		2041	`should, and should not, appear in the namespaces as each header`
		2042	`is included (see the appendix on namespaces and APIs below). What`
		2043	`is actually being done here is to regard each header as an`
		2044	`independent sub-API. There is not however a sufficiently`
		2045	`developed "API calculus" to allow such relationships to be easily`
		2046	`expressed.</para>`
		2047	`</sect4>`
		2048
		2049	`<sect4 id="S40">`
		2050	`<title>3.4.1.2. Resolving Conflicts</title>`
		2051	`<para>Another consideration during the description`
		2052	`process is to try to integrate the various API descriptions. For`
		2053	`example, POSIX extends ANSI, so it makes sense to have the target`
		2054	`independent POSIX headers include the corresponding ANSI headers`
		2055	`and just add the new objects introduced by POSIX. This does`
		2056	`present problems with APIs which are basically compatible but`
		2057	`have a small number of incompatibilities, whether deliberate or`
		2058	`accidental. As an example of an "accidental" incompatibility,`
		2059	`XPG3 is an extension of POSIX, but whereas POSIX declares`
		2060	`<code>malloc</code> by means of the prototype:`
		2061	`<programlisting>`
		2062	`void *malloc(size_t);`
		2063	`</programlisting>`
		2064	`XPG3 declares it by means of the traditional procedure`
		2065	`declaration:`
		2066	`<programlisting>`
		2067	`void *malloc(s)`
		2068	`size_t s;`
		2069	`</programlisting>`
		2070	`These are surely intended to express the same thing, but in`
		2071	`the first case the argument is passed as a <code>size_t</code> and`
		2072	`in the second it is firstly promoted to the integer promotion of`
		2073	`<code>size_t</code>. On most machines these are compatible, either`
		2074	`because of the particular implementation of <code>size_t</code>, or`
		2075	`because the procedure calling conventions make them compatible.`
		2076	`However in general they are incompatible, so the target independent`
		2077	`headers either have to reflect this or have to read between the`
		2078	`lines and assume that the incompatibility was accidental and ignore`
		2079	`it.</para>`
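<para>The effect of the integer promotions on an argument type can be observed directly in C, since the unary <code>+</code> operator applies them. The macro names below are hypothetical; this is a rough sketch that compares sizes only, not a full test of type compatibility.</para>

```c
#include <assert.h>
#include <stddef.h>

/* Size of the type a value of type T has after the integer promotions. */
#define DEMO_PROMOTED_SIZE(T) sizeof(+(T)0)

/* Non-zero if promotion changes the size of T, that is, if an argument
   passed under a traditional declaration would differ in size from one
   passed under a prototype. */
#define DEMO_IS_PROMOTED(T) (DEMO_PROMOTED_SIZE(T) != sizeof(T))
```

<para>On machines where <code>size_t</code> is at least as wide as <code>int</code>, <code>DEMO_IS_PROMOTED ( size_t )</code> is zero, which is one way in which the two declarations of <code>malloc</code> turn out to be compatible in practice.</para>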
		2080
		2081	`<para>As an example of a deliberate incompatibility, both XPG3 and`
		2082	`SVID3 declare a structure <code>struct msqid_ds</code> in`
		2083	`<code>sys/msg.h</code> which has fields <code>msg_qnum</code> and`
		2084	`<code>msg_qbytes</code>. The difference is that whereas XPG3`
		2085	`declares these fields to have type <code>unsigned short</code>,`
		2086	`SVID3 declares them to have type <code>unsigned long</code>.`
		2087	`However, for most purposes the precise types of these fields are`
		2088	`not important, so the APIs can be unified by making the types of`
		2089	`these fields target dependent. That is to say, tokenised integer`
		2090	`types <code>__msg_q_t</code> and <code>__msg_l_t</code> are`
		2091	`introduced. On XPG3-compliant machines these will both be defined`
		2092	`to be <code>unsigned short</code>, and on SVID3-compliant`
		2093	`machines they will both be <code>unsigned long</code>. So,`
		2094	`although strict XPG3 and strict SVID3 are incompatible, the two`
		2095	`extension APIs created by adding these types are compatible. In`
		2096	`the rare case when the precise type of these fields is important,`
		2097	`the strict APIs can be recovered by defining the field types to`
		2098	`be <code>unsigned short</code> or <code>unsigned long</code> at`
		2099	`produce-time rather than at install-time. (XPG4 uses a similar`
		2100	`technique to resolve this incompatibility. But whereas the XPG4`
		2101	`types need to be defined explicitly, the tokenised types are`
		2102	`defined implicitly according to whatever the field types are on a`
		2103	`particular machine.)</para>`
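<para>The effect of the tokenised field types can be approximated in plain C by a pair of typedefs. This is only a sketch: the names prefixed <code>demo_</code> and the macro <code>DEMO_SVID3</code> are hypothetical, the assignment of one type to each field is an assumption, and the choice is made here at compile time, whereas in the TDF scheme it is deferred to install-time.</para>

```c
#include <assert.h>

/* Stand-ins for the tokenised types; DEMO_SVID3 is a hypothetical switch. */
#ifdef DEMO_SVID3
typedef unsigned long demo_msg_q_t;
typedef unsigned long demo_msg_l_t;
#else
typedef unsigned short demo_msg_q_t;
typedef unsigned short demo_msg_l_t;
#endif

struct demo_msqid_ds {
    demo_msg_q_t msg_qnum;      /* number of messages on the queue */
    demo_msg_l_t msg_qbytes;    /* maximum number of bytes on the queue */
};

/* Portable use: assumes only that the field is some unsigned integer type. */
unsigned long demo_queue_capacity(const struct demo_msqid_ds *q)
{
    return (unsigned long) q->msg_qbytes;
}
```

<para>Code written in this style compiles unchanged whichever definition is selected, which is the sense in which the two extension APIs are compatible.</para>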
		2104
		2105	`<para>This example shows how introducing extra abstractions can`
		2106	`resolve potential conflicts between APIs. But it may also be used`
		2107	`to resolve conflicts between the API specification and the API`
		2108	`implementations. For example, POSIX specifies that the structure`
		2109	`<code>struct flock</code> defined in <code>fcntl.h</code> shall`
		2110	`have a field <code>l_pid</code> of type <code>pid_t</code>.`
		2111	`However on at least two of the POSIX implementations examined at`
		2112	`DRA, <code>pid_t</code> was implemented as an <code>int</code>,`
		2113	`but the <code>l_pid</code> field of <code>struct flock</code> was`
		2114	`implemented as a <code>short</code> (this showed up in the TDF`
		2115	`library building process). The immediate reaction might be that`
		2116	`these systems have not implemented POSIX correctly, so they should`
		2117	`be cast into the outer darkness. However for the vast majority of`
		2118	`applications, even those which use the <code>l_pid</code> field,`
		2119	`its precise type is not important. So the decision was taken to`
		2120	`introduce a tokenised integer type, <code>__flock_pid_t</code>,`
		2121	`to stand for the type of the <code>l_pid</code> field. So`
		2122	`although the implementations do not conform to strict POSIX, they`
		2123	`do to this slightly more relaxed extension. Of course, one could`
		2124	`enforce strict POSIX by defining <code>__flock_pid_t</code> to be`
		2125	`<code>pid_t</code> at produce-time, but the given implementations`
		2126	`would not conform to this stricter API.</para>`
		2127
		2128	`<para>Both the previous two examples are really concerned with the`
		2129	`question of determining the correct level of abstraction in API`
		2130	`specification. Abstraction is inclusive and allows for API`
		2131	`evolution, whereas specialisation is exclusive and may lead to`
		2132	`dead-end APIs. The SVID3 method of allowing for longer messages`
		2133	`than XPG3 - changing the <code>msg_qnum</code> and`
		2134	`<code>msg_qbytes</code> fields of <code>struct msqid_ds</code>`
		2135	`from <code>unsigned short</code> to <code>unsigned long</code> -`
		2136	`is an over-specialisation which leads to an unnecessary conflict`
		2137	`with XPG3. The XPG4 method of achieving exactly the same end -`
		2138	`abstracting the types of these fields - is, by contrast, a smooth`
		2139	`evolutionary path.</para>`
		2140	`</sect4>`
		2141
		2142	`<sect4 id="S41">`
		2143	`<title>3.4.1.3. The Benefits of API Description</title>`
		2144	`<para>The description process is potentially of`
		2145	`great benefit to bodies involved in API specification. While the`
		2146	`specification itself stays on paper the only real existence of`
		2147	`the API is through its implementations. Giving the specification`
		2148	`a concrete form means not only does it start to be seen as an`
		2149	`object in its own right, rather than some fuzzy document`
		2150	`underlying the real implementations, but also any omissions,`
		2151	`insufficient specifications (where what is written down does not`
		2152	`reflect what the writer actually meant) or built-in assumptions`
		2153	`are more apparent. It may also be able to help show up the kind`
		2154	`of over-specialisation discussed above. The concrete`
		2155	`representation also becomes an object which both applications and`
		2156	`implementations can be automatically checked against. As has been`
		2157	`mentioned previously, the production phase of the compilation`
		2158	`involves checking the program against the abstract API`
		2159	`description, and the library building phase checks the syntactic`
		2160	`aspect of the implementation against it.</para>`
		2161
		2162	`<para>The implementation checking aspect is considered below. Let us`
		2163	`here consider the program checking aspect by re-examining the`
		2164	`examples given in section 2.2.4.1. The <code>SIGKILL</code>`
		2165	`example is straightforward; <code>SIGKILL</code> will appear in`
		2166	`the POSIX version of <code>signal.h</code> but not the ANSI`
		2167	`version, so if the program is compiled with the target`
		2168	`independent ANSI headers it will be reported as being undefined.`
		2169	`In a sense this is nothing to do with the <code>#pragma`
		2170	`token</code> syntax, but with the organisation of the target`
		2171	`independent headers. The other examples do however rely on the`
		2172	`fact that the <code>#pragma token</code> syntax can express`
		2173	`syntactic information in a way which is not possible directly`
		2174	`from C. Thus the target independent headers express exactly the`
		2175	`fact that <code>time_t</code> is an arithmetic type, about which`
		2176	`nothing else is known. Thus <code>( t &amp; 1 )</code> is not`
		2177	`type correct for a <code>time_t t</code> because the binary`
		2178	`<code>&amp;</code> operator does not apply to all arithmetic`
		2179	`types. Similarly, for the type <code>div_t</code> the target`
		2180	`independent headers express the information that there exists a`
		2181	`structure type <code>div_t</code> and field selectors`
		2182	`<code>quot</code> and <code>rem</code> of <code>div_t</code> of`
		2183	`type <code>int</code>, but nothing about the order of these`
		2184	`fields or the existence of other fields. Thus any attempt to`
		2185	`initialise a <code>div_t</code> will fail because the`
		2186	`correspondence between the values in the initialisation and the`
		2187	`fields of the structure is unknown. The <code>struct`
		2188	`dirent</code> example is entirely analogous, except that here the`
		2189	`declarations of the structure type <code>struct dirent</code> and`
		2190	`the field selector <code>d_name</code> appear in both the POSIX`
		2191	`and XPG3 versions of <code>dirent.h</code>, whereas the field`
		2192	`selector <code>d_ino</code> appears only in the XPG3 version.</para>`
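<para>Code that would pass these checks relies only on what the API guarantees. The following is a minimal sketch with hypothetical helper names: the <code>div_t</code> is filled in by naming its fields, so their order is irrelevant, and the <code>time_t</code> values are compared using <code>difftime</code>, which is valid for any arithmetic <code>time_t</code>.</para>

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Build a div_t by field name rather than by a positional initialiser. */
div_t demo_make_div(int quot, int rem)
{
    div_t d;
    d.quot = quot;
    d.rem = rem;
    return d;
}

/* Compare two times without assuming time_t is an integral type. */
int demo_times_differ(time_t a, time_t b)
{
    return difftime(a, b) != 0.0;
}
```

<para>By contrast, a positional initialiser for a <code>div_t</code> assumes a field order, and a bitwise operation on a <code>time_t</code> assumes an integral type; neither assumption is part of the API.</para>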
		2193	`</sect4>`
		2194	`</sect3>`
		2195
		2196	`<sect3 id="S42">`
		2197	`<title>3.4.2. TDF Library Building</title>`
		2198	`<para>As`
		2199	`we have said, two of the primary problems with writing portable`
		2200	`programs are dealing with API implementation errors on the target`
		2201	`machines - objects not being defined, or being defined in the`
		2202	`wrong place, or being implemented incorrectly - and namespace`
		2203	`problems - particularly those introduced by the system headers.`
		2204	`The most interesting contrast between the traditional compilation`
		2205	`scheme (Fig. 1) and the TDF scheme (Fig. 5) is that in the former`
		2206	`the program comes directly into contact with the "real world" of`
		2207	`messy system headers and incorrectly implemented APIs, whereas in`
		2208	`the latter there is an "ideal world" layer interposed. This`
		2209	`consists of the target independent headers, which describe all`
		2210	`the syntactic features of the API where they are meant to be, and`
		2211	`with no extraneous material to clutter up the namespaces (like`
		2212	`<code>index</code> and the macro <code>st_atime</code> in the`
		2213	`examples given in section 2.2.3), and the TDF libraries, which`
		2214	`can be combined "cleanly" with the program without any namespace`
		2215	`problems. All the unpleasantness has been shifted to the`
		2216	`interface between this "ideal world" and the "real world"; that`
		2217	`is to say, the TDF library building.</para>`
		2218
		2219	`<para>The importance of this change may be summarised by observing`
		2220	`that previously all the unpleasantnesses happened in the left`
		2221	`hand side of the diagram (the program half), whereas in the TDF`
		2222	`scheme they are in the right hand side (the API half). So API`
		2223	`implementation problems are seen to be a genuinely separate issue`
		2224	`from the main business of writing programs; the ball is firmly in`
		2225	`the API implementor's court rather than the programmer's. Also`
		2226	`the problems need to be solved once per API rather than once per`
		2227	`program.</para>`
		2228
		2229	`<para>It might be said that this has not advanced us very far`
		2230	`towards actually dealing with the implementation errors. The API`
		2231	`implementation still contains errors whoever's responsibility it`
		2232	`is. But the TDF library building process gives the API`
		2233	`implementor a second chance. Many of the syntactic implementation`
		2234	`problems will be shown up as the library builder compares the`
		2235	`implementation against the abstract API description, and it may`
		2236	`be possible to build corrections into the TDF libraries so that`
		2237	`the libraries reflect, not the actual implementation, but some`
		2238	`improved version of it.</para>`
		2239
		2240	`<para>To show how this might be done, we reconsider the examples of`
		2241	`API implementation errors given in section 2.2.4.2. As before we`
		2242	`may divide our discussion between system header problems and`
		2243	`system library problems. Recall however the important`
		2244	`distinction, that whereas previously the programmer was trying to`
		2245	`deal with these problems in a way which would work on all`
		2246	`machines (top left of the compilation diagrams), now the person`
		2247	`building the TDF libraries is trying to deal with implementation`
		2248	`problems for a particular API on a particular machine (bottom`
		2249	`right).</para>`
		2250
		2251	`<sect4 id="S43">`
		2252	`<title>3.4.2.1. System Header Problems</title>`
		2253	`<para>Values which are defined in the wrong place,`
		2254	`such as <code>SEEK_SET</code> in the example given, present no`
		2255	`difficulties. The library builder will look where it expects to`
		2256	`find them and report that they are undefined. To define these`
		2257	`values it is merely a matter of telling the library builder where`
		2258	`they are actually defined (in <code>unistd.h</code> rather than`
		2259	`<code>stdio.h</code>).</para>`
		2260
		2261	`<para>Similarly, values which are undefined are also reported. If`
		2262	`these values can be deduced from other information, then it is a`
		2263	`simple matter to tell the library builder to use these deduced`
		2264	`values. For example, if <code>EXIT_SUCCESS</code> and`
		2265	`<code>EXIT_FAILURE</code> are undefined, it is probably possible`
		2266	`to deduce their values from experimentation or experience (or`
		2267	`guesswork).</para>`
		2268
		2269	`<para>Wrongly defined values are more difficult. Firstly they are`
		2270	`not necessarily detected by the library builder because they are`
		2271	`semantic rather than syntactic errors. Secondly, whereas it is`
		2272	`easy to tell the library builder to use a corrected value rather`
		2273	`than the value given in the implementation, this mechanism needs`
		2274	`to be used with circumspection. The system libraries are provided`
		2275	`pre-compiled, and they have been compiled using the system`
		2276	`headers. If we define these values differently in the TDF`
		2277	`libraries we are effectively changing the system headers, and`
		2278	`there is a risk of destroying the interface with the system`
		2279	`libraries. For example, changing a structure is not a good idea,`
		2280	`because different parts of the program - the main body and the`
		2281	`parts linked in from the system libraries - will have different`
		2282	`ideas of the size and layout of this structure. (See the`
		2283	`<code>struct flock</code> example in section 3.4.1.2 for a`
		2284	`potential method of resolving such implementation problems.)</para>`
		2285
		2286	`<para>In the two cases given above - <code>DBL_MAX</code> and`
		2287	`<code>size_t</code> - the necessary changes are probably "safe".`
		2288	`<code>DBL_MAX</code> is not a special value in any library`
		2289	`routines, and changing <code>size_t</code> from <code>int</code>`
		2290	`to <code>unsigned int</code> does not affect its size, alignment`
		2291	`or procedure passing rules (at least not on the target machines`
		2292	`we have in mind) and so should not disrupt the interface with the`
		2293	`system library.</para>`
		2294	`</sect4>`
		2295
		2296	`<sect4 id="S44">`
		2297	`<title>3.4.2.2. System Library Problems</title>`
		2298	`<para>Errors in the system libraries will not be`
		2299	`detected by the TDF library builder because they are semantic`
		2300	`errors, whereas the library building process is only checking`
		2301	`syntax. The only realistic ways of detecting semantic problems are`
		2302	`by means of test suites, such as the Plum-Hall or CVSA library`
		2303	`tests for ANSI and VSX for XPG3, or by detailed knowledge of`
		2304	`particular API implementations born of personal experience.`
		2305	`However it may be possible to build workarounds for problems`
		2306	`identified in these tests into the TDF libraries.</para>`
		2307
		2308	`<para>For example, the problem with <code>realloc</code> discussed`
		2309	`in section 2.2.4.4 could be worked around by defining the token`
		2310	`representing <code>realloc</code> to be the equivalent of:`
		2311	`<programlisting>`
		2312	`#define realloc( p, s ) ( void *q = ( p ) ? ( realloc ) ( q, s ) : malloc ( s ) )`
		2313	`</programlisting>`
		2314	`(where the C syntax has been extended to allow variables to`
		2315	`be introduced inside expressions) or:`
		2316	`<programlisting>`
		2317	`static void *__realloc ( void *p, size_t s )`
		2318	`{`
		2319	`if ( p == NULL ) return ( malloc ( s ) ) ;`
		2320	`return ( ( realloc ) ( p, s ) ) ;`
		2321	`}`
		2322
		2323	`#define realloc( p, s ) __realloc ( p, s )`
		2324	`</programlisting>`
		2325	`Alternatively, the token definition could be encoded directly`
		2326	`into TDF (not via C), using the TDF notation compiler (see [9]).</para>`
		2327	`</sect4>`
		2328
		2329	`<sect4 id="S45">`
		2330	`<title>3.4.2.3. TDF Library Builders</title>`
		2331	`<para>The discussion above shows how the TDF libraries`
		2332	`are an extra layer which lies on top of the existing system API`
		2333	`implementation, and how this extra layer can be exploited to`
		2334	`provide corrections and workarounds to various implementation`
		2335	`problems. The expertise of particular API implementation problems`
		2336	`on particular machines can be captured once and for all in the`
		2337	`TDF libraries, rather than being spread piecemeal over all the`
		2338	`programs which use that API implementation. But being able to`
		2339	`encapsulate this expertise in this way makes it a marketable`
		2340	`quantity. One could envisage a market in TDF libraries: ranging`
		2341	`from libraries closely reflecting the actual API implementation`
		2342	`to top of the range libraries with many corrections and`
		2343	`workarounds built in.</para>`
		2344
		2345	`<para>All of this has tended to paint the system vendors as the`
		2346	`villains of the piece for not providing correct API`
		2347	`implementations, but this is not entirely fair. The reason why`
		2348	`API implementation errors may persist over many operating system`
		2349	`releases is that system vendors have as many porting problems as`
		2350	`anyone else - preparing a new operating system release is in`
		2351	`effect a huge porting exercise - and are understandably reluctant`
		2352	`to change anything which basically works. The use of TDF`
		2353	`libraries could be a low-risk strategy for system vendors to`
		2354	`allow users the benefits of API conformance without changing the`
		2355	`underlying operating system.</para>`
		2356
		2357	`<para>Of course, if the system vendor's porting problems could be`
		2358	`reduced, they would have more confidence to make their underlying`
		2359	`systems more API conformant, and thereby help reduce the normal`
		2360	`programmer's porting problems. So whereas using the TDF libraries`
		2361	`might be a short-term workaround for API implementation problems,`
		2362	`the rest of the TDF porting system might help towards a long-term`
		2363	`solution.</para>`
		2364
		2365	`<para>Another interesting possibility arises. As we said above, many`
		2366	`APIs, for example POSIX and BSD, offer equivalent functionality`
		2367	`by different methods. It may be possible to use the TDF library`
		2368	`building process to express one in terms of the other. For`
		2369	`example, in the <code>struct dirent</code> example in section`
		2370	`2.3.3, the only differences between POSIX and BSD were that the`
		2371	`BSD version was defined in a different header and that the`
		2372	`structure was called <code>struct direct</code>. But this`
		2373	`presents no problems to the TDF library builder: it is perfectly`
		2374	`simple to tell it to look in <code>sys/dir.h</code> instead of`
		2375	`<code>dirent.h</code>, and to identify <code>struct`
		2376	`direct</code> with <code>struct dirent</code>. So it may be`
		2377	`possible to build a partial POSIX lookalike on BSD systems by`
		2378	`using the TDF library mechanism.</para>`
		2379	`</sect4>`
		2380	`</sect3>`
		2381	`</sect2>`
		2382
		2383	`<sect2 id="S46">`
		2384	`<title>3.5. TDF and Conditional Compilation</title>`
		2385	`<para>So far our`
		2386	`discussion of the TDF approach to portability has been confined`
		2387	`to the simplest case, where the program itself contains no target`
		2388	`dependent code. We now turn to programs which contain conditional`
		2389	`compilation. As we have seen, many of the reasons why it is`
		2390	`necessary to introduce conditional compilation into the`
		2391	`traditional compilation process either do not arise or are seen`
		2392	`to be distinct phases in the TDF compilation process. The use of`
		2393	`a single front-end (the producer) virtually eliminates problems`
		2394	`of compiler limitations and differing interpretations and reduces`
		2395	`compiler bug problems, so it is not necessary to introduce`
		2396	`conditionally compiled workarounds for these. Also API`
		2397	`implementation problems, another prime reason for introducing`
		2398	`conditional compilation in the traditional scheme, are seen to be`
		2399	`isolated in the TDF library building process, thereby allowing`
		2400	`the programmer to work in an idealised world one step removed`
		2401	`from the real API implementations. However the most important`
		2402	`reason for introducing conditional compilation is where things,`
		2403	`for reasons of efficiency or whatever, are genuinely different on`
		2404	`different machines. It is this we now consider.</para>`
		2405
		2406	`<sect3 id="S47">`
		2407	`<title>3.5.1. User-Defined APIs</title><para>The`
		2408	`things which are done genuinely differently on different machines`
		2409	`have previously been characterised as comprising the user-defined`
		2410	`component of the API. So the real issue in this case is how to`
		2411	`use the TDF API description and representation methods within`
		2412	`one's own programs. A very simple worked example is given below`
		2413	`(in section 3.5.2); for more detailed examples see [8].</para>`
		2414
		2415	`<para>For the <code>MSB</code> example given in section 2.3 we`
		2416	`firstly have to decide what the user-defined API is. To fully`
		2417	`reflect exactly what the target dependent code is, we could`
		2418	`define the API, in <code>tspec</code> terms, to be:`
		2419	`<programlisting>`
		2420	`+MACRO unsigned char MSB ( unsigned int a ) ;`
		2421	`</programlisting>`
		2422	`where the macro <code>MSB</code> gives the most significant`
		2423	`byte of its argument, <code>a</code>. Let us say that the`
		2424	`corresponding <code>#pragma token</code> statement is put into the`
		2425	`header <code>msb.h</code>. Then the program can be recast into the`
		2426	`form:`
		2427	`<programlisting>`
		2428	`#include <stdio.h>`
		2429	`#include "msb.h"`
		2430
		2431	`unsigned int x = 100000000 ;`
		2432
		2433	`int main ()`
		2434	`{`
		2435	`printf ( "%u\n", MSB ( x ) ) ;`
		2436	`return ( 0 ) ;`
		2437	`}`
		2438	`</programlisting>`
		2439	`The producer will compile this into a target independent TDF`
		2440	`capsule which uses a token to represent the use of`
		2441	`<code>MSB</code>, but leaves this token undefined. The only`
		2442	`question that remains is how this token is defined on the target`
		2443	`machine; that is, how the user-defined API is implemented. On each`
		2444	`target machine a TDF library containing the local definition of the`
		2445	`token representing <code>MSB</code> needs to be built. There are`
		2446	`two basic possibilities. Firstly the person performing the`
		2447	`installation could build the library directly, by compiling a`
		2448	`program of the form:`
		2449	`<programlisting>`
		2450	`#pragma implement interface "msb.h"`
		2451	`#include "config.h"`
		2452
		2453	`#ifndef SLOW_SHIFT`
		2454	`#define MSB( a ) ( ( unsigned char ) ( a >> 24 ) )`
		2455	`#else`
		2456	`#ifdef BIG_ENDIAN`
		2457	`#define MSB( a ) *( ( unsigned char * ) &( a ) )`
		2458	`#else`
		2459	`#define MSB( a ) *( ( unsigned char * ) &( a ) + 3 )`
		2460	`#endif`
		2461	`#endif`
		2462	`</programlisting>`
		2463	`with the appropriate <code>config.h</code> to choose the`
		2464	`correct local implementation of the interface described in`
		2465	`<code>msb.h</code>. Alternatively the programmer could provide`
		2466	`three alternative TDF libraries corresponding to the three`
		2467	`implementations, and let the person installing the program choose`
		2468	`between these. The two approaches are essentially equivalent; they`
		2469	`just provide for making the choice of the implementation of the`
		2470	`user-defined component of the API in different ways. An interesting`
		2471	`alternative approach would be to provide a short program which does`
		2472	`the selection between the provided API implementations`
		2473	`automatically. This approach might be particularly effective in`
		2474	`deciding which implementation offers the best performance on a`
		2475	`particular target machine.</para>`
		2476	`</sect3>`
		2477
		2478	`<sect3>`
		2479	`<title id="S48">3.5.2. User Defined Tokens - Example</title>`
		2480	`<para>As an example of how to define a simple token,`
		2481	`consider the following. We have a simple program which`
		2482	`prints "hello" in some language, the language being target`
		2483	`dependent. Our first task is to choose an API. We choose ANSI C`
		2484	`extended by a tokenised object <code>hello</code> of type`
		2485	`<code>char *</code> which gives the message to be printed. This`
		2486	`object will be an rvalue (i.e. it cannot be assigned to). For`
		2487	`convenience this token is declared in a header file,`
		2488	`<code>tokens.h</code> say. This particular case is simple enough`
		2489	`to encode by hand; it takes the form:`
		2490	`<programlisting>`
		2491	`#pragma token EXP rvalue : char * : hello #`
		2492	`#pragma interface hello`
		2493	`</programlisting>consisting of a <code>#pragma token</code> directive`
		2494	`describing the object to be tokenised, and a <code>#pragma`
		2495	`interface</code> directive to show that this is the only object in`
		2496	`the API. An alternative would be to generate <code>tokens.h</code>`
		2497	`from a <code>tspec</code> specification of the form:`
		2498	`<programlisting>`
		2499	`+EXP char *hello ;`
		2500	`</programlisting>The next task is to write the program conforming to this API.`
		2501	`This may take the form of a single source file,`
		2502	`<code>hello.c</code>, containing the lines:`
		2503	`<programlisting>`
		2504	`#include <stdio.h>`
		2505	`#include "tokens.h"`
		2506
		2507	`int main ()`
		2508	`{`
		2509	`printf ( "%s\n", hello ) ;`
		2510	`return ( 0 ) ;`
		2511	`}`
		2512	`</programlisting>The production process may be specified by means of a <code>`
		2513	`Makefile</code>. This uses the TDF C compiler, <code>tcc</code>,`
		2514	`an interface to the TDF system designed to be`
		2515	`like <code>cc</code>, but with extra options to handle the extra`
		2516	`functionality offered by the TDF system (see [1]).`
		2517	`<programlisting>`
		2518	`produce : hello.j`
		2519	`echo "PRODUCTION COMPLETE"`
		2520
		2521	`hello.j : hello.c tokens.h`
		2522	`echo "PRODUCTION : C->TDF"`
		2523	`tcc -Fj hello.c`
		2524	`</programlisting>The production is run by typing <code>make produce</code>.`
		2525	`The ANSI API is the default, and so does not need to be specified`
		2526	`to <code>tcc</code>. The program <code>hello.c</code> is compiled`
		2527	`to a target independent capsule, <code>hello.j</code>. This will`
		2528	`use a token to represent <code>hello</code>, but it will be left`
		2529	`undefined.`
		2530
		2531	`<para>On each target machine we need to create a token library`
		2532	`giving the local definitions of the objects in the API. We shall`
		2533	`assume that the library corresponding to the ANSI C API has`
		2534	`already been constructed, so that we only need to define the`
		2535	`token representing <code>hello</code>. This is done by means of a`
		2536	`short C program, <code>tokens.c</code>, which implements the`
		2537	`tokens declared in <code>tokens.h</code>. This might take the`
		2538	`form:</para>`
		2539	`<programlisting>`
		2540	`#pragma implement interface "tokens.h"`
		2541	`#define hello "bonjour"`
		2542	`</programlisting>to define <code>hello</code> to be "bonjour". On a different`
		2543	`machine, the definition of <code>hello</code> could be given as`
		2544	`"hello", "guten Tag", "zdrastvetye" (excuse my transliteration) or`
		2545	`whatever (including complex expressions as well as simple strings).`
		2546	`Note the use of <code>#pragma implement interface</code> to`
		2547	`indicate that we are now implementing the API described in`
		2548	`<code>tokens.h</code>, as opposed to the use of`
		2549	`<code>#include</code> earlier when we were just using the API.`
		2550
		2551	`<para>The installation process may be specified by adding the`
		2552	`following lines to the <code>Makefile</code>:</para>`
		2553	`<programlisting>`
		2554	`install : hello`
		2555	`echo "INSTALLATION COMPLETE"`
		2556
		2557	`hello : hello.j tokens.tl`
		2558	`echo "INSTALLATION : TDF->TARGET"`
		2559	`tcc -o hello -J. -jtokens hello.j`
		2560
		2561	`tokens.tl : tokens.j`
		2562	`echo "LIBRARY BUILDING : LINKING LIBRARY"`
		2563	`tcc -Ymakelib -o tokens.tl tokens.j`
		2564
		2565	`tokens.j : tokens.c tokens.h`
		2566	`echo "LIBRARY BUILDING : DEFINING TOKENS"`
		2567	`tcc -Fj -not_ansi tokens.c`
		2568	`</programlisting>The complete installation process is run by typing <code>make`
		2569	`install</code>. Firstly the file <code>tokens.c</code> is compiled`
		2570	`to give the TDF capsule <code>tokens.j</code> containing the`
		2571	`definition of <code>hello</code>. The <b>-not_ansi</b> flag is`
		2572	`needed because <code>tokens.c</code> does not contain any real C`
		2573	`(declarations or definitions), which is not allowed in ANSI C. The`
		2574	`next step is to turn the capsule <code>tokens.j</code> into a TDF`
		2575	`library, <code>tokens.tl</code>, using the <b>-Ymakelib</b> option`
		2576	`to <code>tcc</code> (with older versions of <code>tcc</code> it may`
		2577	`be necessary to change this option to <b>-Ymakelib -M -Fj</b>).`
		2578	`This completes the API implementation.</para>`
		2579
		2580	`<para>The final step is installation. The target independent TDF,`
		2581	`<code>hello.j</code>, is linked with the TDF libraries`
		2582	`<code>tokens.tl</code> and <code>ansi.tl</code> (which is built`
		2583	`into <code>tcc</code> as default) to form a target dependent TDF`
		2584	`capsule with all the necessary token definitions, which is then`
		2585	`translated to a binary object file and linked with the system`
		2586	`libraries. All of this is under the control of`
		2587	`<code>tcc</code>.</para>`
		2588
		2589	`<para>Note the four stages of the compilation: API specification,`
		2590	`production, API implementation and installation, corresponding to`
		2591	`the four regions of the compilation diagram (Fig. 5).</para>`
		2592	`</sect3>`
		2593
		2594	`<sect3>`
		2595	`<title id="S49">3.5.3. Conditional Compilation within TDF</title>`
		2596	`<para>Although tokens are the main method used to deal with`
		2597	`target dependencies, TDF does have built-in conditional`
		2598	`compilation constructs. For most TDF sorts <code>X</code> (for`
		2599	`example, exp, shape or variety) there is a construct`
		2600	`<code>X_cond</code> which takes an exp and two <code>X</code>'s`
		2601	`and gives an <code>X</code>. The exp argument will evaluate to an`
		2602	`integer constant at install time. If this is true (nonzero), the`
		2603	`result of the construct is the first <code>X</code> argument and`
		2604	`the second is ignored; otherwise the result is the second`
		2605	`<code>X</code> argument and the first is ignored. By ignored we`
		2606	`mean completely ignored - the argument is stepped over and not`
		2607	`decoded. In particular any tokens in the definition of this`
		2608	`argument are not expanded, so it does not matter if they are`
		2609	`undefined.</para>`
		2610
		2611	`<para>These conditional compilation constructs are used by the C`
		2612	`-> TDF producer to translate certain statements`
		2613	`containing:`
		2614	`<programlisting>`
		2615	`#if condition`
		2616	`</programlisting>`
		2617	`where <code>condition</code> is a target dependent value.`
		2618	`Thus, because it is not known which branch will be taken at produce`
		2619	`time, the decision is postponed to install time. If`
		2620	`<code>condition</code> is a target independent value then the`
		2621	`branch to be taken is known at produce time, so the producer only`
		2622	`translates this branch. Thus, for example, code surrounded by`
		2623	`<code>#if 0</code> ... <code>#endif</code> will be ignored by the`
		2624	`producer.</para>`
		2625
		2626	`<para>Not all such <code>#if</code> statements can be translated`
		2627	`into TDF <code>X_cond</code> constructs. The two branches of the`
		2628	`<code>#if</code> statement are translated into the two`
		2629	`<code>X</code> arguments of the <code>X_cond</code> construct;`
		2630	`that is, into sub-trees of the TDF syntax tree. This can only be`
		2631	`done if each of the two branches is syntactically complete.</para>`
		2632
		2633	`<para>The producer interprets <code>#ifdef</code> (and`
		2634	`<code>#ifndef</code>) constructs to mean, is this macro`
		2635	`defined (or undefined) at produce time? Given the nature of`
		2636	`pre-processing in C this is in fact the only sensible`
		2637	`interpretation. But if such constructs are being used to control`
		2638	`conditional compilation, what is actually intended is, is this`
		2639	`macro defined at install time? This distinction is necessitated`
		2640	`by the splitting of the TDF compilation into production and`
		2641	`installation - it does not exist in the traditional compilation`
		2642	`scheme. For example, in the mips example in section 2.3, whether`
		2643	`or not <code>mips</code> is defined is intended to be an`
		2644	`installer property, rather than what it is interpreted as, a`
		2645	`producer property. The choice of the conditional compilation path`
		2646	`may be put off to install time by, for example, changing`
		2647	`<code>#ifdef mips</code> to <code>#if is_mips</code> where`
		2648	`<code>is_mips</code> is a tokenised integer which is either 1 (on`
		2649	`those machines on which <code>mips</code> would be defined) or 0`
		2650	`(otherwise). In fact in view of what was said above about`
		2651	`syntactic completeness, it might be better to recast the program`
		2652	`as:`
		2653	`<programlisting>`
		2654	`#include <stdio.h>`
		2655	`#include "user_api.h" /* For the spec of is_mips */`
		2656
		2657	`int main ()`
		2658	`{`
		2659	`if ( is_mips ) {`
		2660	`fputs ( "This machine is a mips\n", stdout ) ;`
		2661	`}`
		2662	`return ( 0 ) ;`
		2663	`}`
		2664	`</programlisting>because the branches of an <code>if</code> statement, unlike`
		2665	`those of an <code>#if</code> statement, have to be syntactically`
		2666	`complete in any case. The installer will optimise out the`
		2667	`unnecessary test and any unreached code, so the use of <code>if (`
		2668	`condition )</code> is guaranteed to produce as efficient code as`
		2669	`<code>#if condition</code>.</para>`
		2670
		2671	`<para>In order to help detect such "installer macro" problems the`
		2672	`producer has a mode for detecting them. All <code>#ifdef</code>`
		2673	`and <code>#ifndef</code> constructs in which the compilation path`
		2674	`to be taken is potentially target dependent are reported (see [3]`
		2675	`and [8]).</para>`
		2676
		2677	`<para>The existence of conditional compilation within TDF also gives`
		2678	`flexibility in how to approach expressing target dependent code.`
		2679	`Instead of a "full" abstraction of the user-defined API as target`
		2680	`dependent types, values and functions, it can be abstracted as a`
		2681	`set of binary tokens (like <code>is_mips</code> in the example`
		2682	`above) which are used to control conditional compilation. This`
		2683	`latter approach can be used to quickly adapt existing programs to`
		2684	`a TDF-portable form since it is closer to the "traditional"`
		2685	`approach of scattering the program with <code>#ifdef</code>'s and`
		2686	`<code>#ifndef</code>'s to implement target dependent code.`
		2687	`However the definition of a user-defined API gives a better`
		2688	`separation of target independent and target dependent code, and`
		2689	`the effort to define such an API may often be justified. When`
		2690	`writing a new program from scratch the API rather than the`
		2691	`conditional compilation approach is recommended.</para>`
		2692
		2693	`<para>The approach of a fully abstracted user-defined API may`
		2694	`be more time consuming in the short run, but this may well be`
		2695	`offset by the increased ease of porting. Also there is no reason`
		2696	`why a user-defined API, once specified, should not serve more`
		2697	`than one program. Similar programs are likely to require the same`
		2698	`abstractions of target dependent constructs. Because the API is a`
		2699	`concrete object, it can be reused in this way in a very simple`
		2700	`fashion. One could envisage libraries of private APIs being built`
		2701	`up in this way.</para>`
		2702	`</sect3>`
		2703
		2704	`<sect3 id="S50">`
		2705	`<title>3.5.4. Alternative Program Versions</title>`
		2706	`<para>Consider again the program described in section`
		2707	`2.3.4 which has optional features for displaying its output`
		2708	`graphically depending on the boolean value`
		2709	`<code>HAVE_X_WINDOWS</code>. By making`
		2710	`<code>HAVE_X_WINDOWS</code> part of the user-defined API as a`
		2711	`tokenised integer and using:`
		2712	`<programlisting>`
		2713	`#if HAVE_X_WINDOWS`
		2714	`</programlisting>to conditionally compile the X Windows code, the choice of`
		2715	`whether or not to use this version of the program is postponed to`
		2716	`install time. If both POSIX and X Windows are implemented on the`
		2717	`target machine the installation is straightforward.`
		2718	`<code>HAVE_X_WINDOWS</code> is defined to be true, and the`
		2719	`installation proceeds as normal. The case where only POSIX is`
		2720	`implemented appears to present problems. The TDF representing the`
		2721	`program will contain undefined tokens representing objects from`
		2722	`both the POSIX and X Windows APIs. Surely it is necessary to define`
		2723	`these tokens (i.e. implement both APIs) in order to install the`
		2724	`TDF. But because of the use of conditional compilation, all the`
		2725	`applications of X Windows tokens will be inside <code>X_cond</code>`
		2726	`constructs on the branch corresponding to`
		2727	`<code>HAVE_X_WINDOWS</code> being true. If it is actually false`
		2728	`then these branches are stepped over and completely ignored. Thus`
		2729	`it does not matter that these tokens are undefined. Hence the`
		2730	`conditional compilation constructs within TDF give the same`
		2731	`flexibility in the API implementation in this case as do those in`
		2732	`C.</para>`
		2733	`</sect3>`
		2734	`</sect2>`
		2735	`</sect1>`
		2736
		2737	`<sect1>`
		2738	`<title>4. Conclusions</title>`
		2739	`<para>The philosophy underlying the whole TDF`
		2740	`approach to portability is that of separation or isolation. This`
		2741	`separation of the various components of the compilation system`
		2742	`means that to a large extent they can be considered`
		2743	`independently. The separation is only possible because the`
		2744	`definition of TDF has mechanisms which facilitate it - primarily`
		2745	`the token mechanism, but also the capsule linkage scheme.</para>`
		2746
		2747	`<para>The most important separation is that of the abstract`
		2748	`description of the syntactic aspects of the API, in the form of`
		2749	`the target independent headers, from the API implementation. It`
		2750	`is this which enables the separation of target independent from`
		2751	`target dependent code which is necessary for any Architecture`
		2752	`Neutral Distribution Format. It also means that programs can be`
		2753	`checked against the abstract API description, instead of against`
		2754	`a particular implementation, allowing for effective API`
		2755	`conformance testing of applications. Furthermore, it isolates the`
		2756	`actual program from the API implementation, thereby allowing the`
		2757	`programmer to work in the idealised world envisaged by the API`
		2758	`description, rather than the real world of API implementations`
		2759	`and all their faults.</para>`
		2760
		2761	`<para>This isolation also means that these API implementation`
		2762	`problems are seen to be genuinely separate from the main program`
		2763	`development. They are isolated into a single process, TDF library`
		2764	`building, which needs to be done only once per API`
		2765	`implementation. Because of the separation of the API description`
		2766	`from the implementation, this library building process also`
		2767	`serves as a conformance check for the syntactic aspects of the`
		2768	`API implementation. However the approach is evolutionary in that`
		2769	`it can handle the current situation while pointing the way`
		2770	`forward. Absolute API conformance is not necessary; the TDF`
		2771	`libraries can be used as a medium for workarounds for minor`
		2772	`implementation errors.</para>`
		2773
		2774	`<para>The same mechanism which is used to separate the API`
		2775	`description and implementation can also be used within an`
		2776	`application to separate the target dependent code from the main`
		2777	`body of target independent code. This use of user-defined APIs`
		2778	`also enables a separation of the portability requirements of the`
		2779	`program from the particular ways these requirements are`
		2780	`implemented on the various target machines. Again, the approach`
		2781	`is evolutionary, and not prescriptive. Programs can be made more`
		2782	`portable in incremental steps, with the degree of portability to`
		2783	`be used being made a conscious decision.</para>`
		2784
		2785	`<para>In a sense the most important contribution TDF makes to portability is`
		2786	`in enabling the various tasks of API description, API implementation and`
		2787	`program writing to be considered independently, while showing up the`
		2788	`relationships between them. It is often said that well specified APIs`
		2789	`are the solution to the world's portability and interoperability`
		2790	`problems; but by themselves they can never be. Without methods of`
		2791	`checking the conformance of programs which use the API and of API`
		2792	`implementations, the APIs themselves will remain toothless. TDF, by`
		2793	`providing syntactic API checking for both programs and implementations,`
		2794	`is a significant first step towards solving this problem.</para>`
		2795	`</sect1>`
		2796
		2797	`<para>`
		2798	`[1] tcc User's Guide, DRA, 1993.`
		2799	`[2] tspec - An API Specification Tool, DRA, 1993.`
		2800	`[3] The C to TDF Producer, DRA, 1993.`
		2801	`[4] A Guide to the TDF Specification, DRA, 1993.`
		2802	`[5] TDF Facts and Figures, DRA, 1993.`
		2803	`[6] TDF Specification, DRA, 1993.`
		2804	`[7] The 80386/80486 TDF Installer, DRA, 1992.`
		2805	`[8] A Guide to Porting using TDF, DRA, 1993.`
		2806	`[9] The TDF Notation Compiler, DRA, 1993.`
		2807	`</para>`
		2808	`</chapter>`
		2809	`</book>`

(root)/trunk/doc/papers/porting/porting.xml – Rev 7