WebSVN – tendra.SVN – Blame – /trunk/doc/port/port3.html

Rev	Author	Line No.	Line
2	7u83	1	`<!-- Crown Copyright (c) 1998 -->`
		2	`<HTML>`
		3	`<HEAD>`
		4	`<TITLE>TDF and Portability: Portability</TITLE>`
		5	`</HEAD>`
		6	`<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">`
		7	`<A NAME=S2>`
		8	`<H1>TDF and Portability</H1>`
		9	`<H3>January 1998</H3>`
		10	`<A HREF="port4.html"><IMG SRC="../images/next.gif" ALT="next section"></A>`
		11	`<A HREF="port1.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>`
		12	`<A HREF="port1.html"><IMG SRC="../images/top.gif" ALT="current document"></A>`
		13	`<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">`
		14	`</A>`
		15	`<IMG SRC="../images/no_index.gif" ALT="document index"><P>`
		16	`<HR>`
		17	`<DL>`
		18	`<DT><A HREF="#S3"><B>2.1</B> - Portable Programs</A><DD>`
		19	`<DL>`
		20	`<DT><A HREF="#S4"><B>2.1.1</B> - Definitions and Preliminary Discussion</A><DD>`
		21	`<DT><A HREF="#S5"><B>2.1.2</B> - Separation and Combination of Code</A><DD>`
		22	`<DT><A HREF="#S6"><B>2.1.3</B> - Application Programming Interfaces</A><DD>`
		23	`<DT><A HREF="#S7"><B>2.1.4</B> - Compilation Phases</A><DD>`
		24	`</DL>`
		25	`<DT><A HREF="#S8"><B>2.2</B> - Portability Problems</A><DD>`
		26	`<DL>`
		27	`<DT><A HREF="#S9"><B>2.2.1</B> - Programming Problems</A><DD>`
		28	`<DT><A HREF="#S10"><B>2.2.2</B> - Code Transformation Problems</A><DD>`
		29	`<DT><A HREF="#S11"><B>2.2.3</B> - Code Combination Problems</A><DD>`
		30	`<DT><A HREF="#S12"><B>2.2.4</B> - API Problems</A><DD>`
		31	`<DL>`
		32	`<DT><A HREF="#S13"><B>2.2.4.1</B> - API Checking</A><DD>`
		33	`<DT><A HREF="#S14"><B>2.2.4.2</B> - API Implementation Errors</A><DD>`
		34	`<DT><A HREF="#S15"><B>2.2.4.3</B> - System Header Problems</A><DD>`
		35	`<DT><A HREF="#S16"><B>2.2.4.4</B> - System Library Problems</A><DD>`
		36	`</DL>`
		37	`</DL>`
		38	`<DT><A HREF="#S17"><B>2.3</B> - APIs and Portability</A><DD>`
		39	`<DL>`
		40	`<DT><A HREF="#S18"><B>2.3.1</B> - Target Dependent Code</A><DD>`
		41	`<DT><A HREF="#S19"><B>2.3.2</B> - Making APIs Explicit</A><DD>`
		42	`<DT><A HREF="#S20"><B>2.3.3</B> - Choosing an API</A><DD>`
		43	`<DT><A HREF="#S21"><B>2.3.4</B> - Alternative Program Versions</A><DD>`
		44	`</DL>`
		45	`</DL>`
		46	`<HR>`
		47
		48	`<H1>2. Portability</H1>`
		49	`We start by examining some of the problems involved in the writing`
		50	`of portable programs. Although the discussion is very general, and`
		51	`makes no mention of TDF, many of the ideas introduced are of importance`
		52	`in the second half of the paper, which deals with TDF.<P>`
		53	`<A NAME=S3>`
		54	`<HR><H2>2.1. Portable Programs</H2>`
		55	`<A NAME=S4>`
		56	`<H3>2.1.1. Definitions and Preliminary Discussion</H3>`
		57	`Let us first say what we mean by a portable program. A program is`
		58	`portable to a number of machines if it can be compiled to give the`
		59	`same functionality on all those machines. Note that this does not`
		60	`mean that exactly the same source code is used on all the machines.`
		61	`One could envisage a program written in, say, 68020 assembly code`
		62	`for a certain machine which has been translated into 80386 assembly`
		63	`code for some other machine to give a program with exactly equivalent`
		64	`functionality. This would, under our definition, be a program which`
		65	`is portable to these two machines. At the other end of the scale,`
		66	`the C program:<P>`
		67	`<PRE>`
		68	`#include <stdio.h>`
		69
		70	`int main ()`
		71	`{`
		72	`    fputs ( "Hello world\n", stdout ) ;`
		73	`    return ( 0 ) ;`
		74	`}`
		75	`</PRE>`
		76	`which prints the message, "Hello world", onto the standard`
		77	`output stream, will be portable to a vast range of machines without`
		78	`any need for rewriting. Most of the portable programs we shall be`
		79	`considering fall closer to the latter end of the spectrum - they will`
		80	`largely consist of target independent source with small sections of`
		81	`target dependent source for those constructs for which target independent`
		82	`expression is either impossible or of inadequate efficiency.<P>`
		83	`Note that we are defining portability in terms of a set of target`
		84	`machines and not as some universal property. The act of modifying`
		85	`an existing program to make it portable to a new target machine is`
		86	`called porting. Clearly in the examples above, porting the first program`
		87	`would be a highly complex task involving almost an entire rewrite,`
		88	`whereas in the second case it should be trivial.<P>`
		89	`<A NAME=S5>`
		90	`<H3>2.1.2. Separation and Combination of Code</H3>`
		91	`So why is the second example above more portable (in the sense of`
		92	`more easily ported to a new machine) than the first? The first, obvious,`
		93	`point to be made is that it is written in a high-level language, C,`
		94	`rather than the low-level languages, 68020 and 80386 assembly codes,`
		95	`used in the first example. By using a high-level language we have`
		96	`abstracted out the details of the processor to be used and expressed`
		97	`the program in an architecture neutral form. It is one of the jobs`
		98	`of the compiler on the target machine to transform this high-level`
		99	`representation into the appropriate machine dependent low-level representation.`
		100	`<P>`
		101	`The second point is that the second example program is not in itself`
		102	`complete. The objects <CODE>fputs</CODE> and <CODE>stdout</CODE>,`
		103	`representing the procedure to output a string and the standard output`
		104	`stream respectively, are left undefined. Instead the header <CODE>stdio.h</CODE>`
		105	`is included on the understanding that it contains the specification`
		106	`of these objects.<P>`
		107	`A version of this file is to be found on each target machine. On a`
		108	`particular machine it might contain something like:<P>`
		109	`<PRE>`
		110	`typedef struct {`
		111	`    int __cnt ;`
		112	`    unsigned char *__ptr ;`
		113	`    unsigned char *__base ;`
		114	`    short __flag ;`
		115	`    char __file ;`
		116	`} FILE ;`
		117
		118	`extern FILE __iob [60] ;`
		119	`#define stdout ( &__iob [1] )`
		120
		121	`extern int fputs ( const char *, FILE * ) ;`
		122	`</PRE>`
		123	`meaning that the type <CODE>FILE</CODE> is defined by the given structure,`
		124	`<CODE>__iob</CODE> is an external array of 60 <CODE>FILE</CODE>'s,`
		125	`<CODE>stdout</CODE> is a pointer to the second element of this array,`
		126	`and that <CODE>fputs</CODE> is an external procedure which takes a`
		127	`<CODE>const char *</CODE> and a <CODE>FILE *</CODE> and returns an`
		128	`<CODE>int</CODE>. On a different machine, the details may be different`
		129	`(exactly what we can, or cannot, assume is the same on all target`
		130	`machines is discussed below).<P>`
		131	`These details are fed into the program by the pre-processing phase`
		132	`of the compiler. (The various compilation phases are discussed in`
		133	`more detail later - see Fig. 1.) This is a simple, preliminary textual`
		134	`substitution. It provides the definitions of the type <CODE>FILE</CODE>`
		135	`and the value <CODE>stdout</CODE> (in terms of <CODE>__iob</CODE>),`
		136	`but leaves the precise definitions of <CODE>__iob</CODE> and`
		137	`<CODE>fputs</CODE> still unresolved (although we do know their types).`
		138	`The definitions of these values are not provided until the final phase`
		139	`of the compilation - linking - where they are linked in from the precompiled`
		140	`system libraries.<P>`
		141	`Note that, after the pre-processing phase alone, our portable program`
		142	`has already been transformed into a target dependent form, because of the`
		143	`substitution of the target dependent values from <CODE>stdio.h</CODE>.`
		144	`If we had also included the definitions of <CODE>__iob</CODE> and,`
		145	`more particularly, <CODE>fputs</CODE>, things would have been even`
		146	`worse - the procedure for outputting a string to the screen is likely`
		147	`to be highly target dependent.<P>`
		148	`To conclude, we have, by including <CODE>stdio.h</CODE>, been able`
		149	`to effectively separate the target independent part of our program`
		150	`(the main program) from the target dependent part (the details of`
		151	`<CODE>stdout</CODE> and <CODE>fputs</CODE>). It is one of the jobs`
		152	`of the compiler to recombine these parts to produce a complete program.<P>`
		153	`<A NAME=S6>`
		154	`<H3>2.1.3. Application Programming Interfaces</H3>`
		155	`As we have seen, the separation of the target dependent sections of`
		156	`a program into the system headers and system libraries greatly facilitates`
		157	`the construction of portable programs. What has been done is to define`
		158	`an interface between the main program and the existing operating system`
		159	`on the target machine in abstract terms. The program should then be`
		160	`portable to any machine which implements this interface correctly.<P>`
		161	`The interface for the "Hello world" program above might`
		162	`be described as follows : defined in the header <CODE>stdio.h</CODE>`
		163	`are a type <CODE>FILE</CODE> representing a file, an object <CODE>stdout</CODE>`
		164	`of type <CODE>FILE *</CODE> representing the standard output file,`
		165	`and a procedure <CODE>fputs</CODE> with prototype:<P>`
		166	`<PRE>`
		167	`int fputs ( const char *s, FILE *f ) ;`
		168	`</PRE>`
		169	`which prints the string <CODE>s</CODE> to the file <CODE>f</CODE>.`
		170	`This is an example of an Application Programming Interface (API).`
		171	`Note that it can be split into two aspects, the syntactic (what the`
		172	`objects are) and the semantic (what they mean). On any machine which implements`
		173	`this API our program is both syntactically correct and does what we`
		174	`expect it to.<P>`
		175	`The benefit of describing the API at this fairly high level is that`
		176	`it leaves scope for a range of implementations (and thus more machines`
		177	`which implement it) while still encapsulating the main program's requirements.`
		178	`<P>`
		179	`In the example implementation of <CODE>stdio.h</CODE> above we see`
		180	`that this machine implements this API correctly syntactically, but`
		181	`not necessarily semantically. One would have to read the documentation`
		182	`provided on the system to be sure of the semantics.<P>`
		183	`Another way of defining an API for this program would be to note that`
		184	`the given API is a subset of the ANSI C standard. Thus we could take`
		185	`ANSI C as an "off the shelf" API. It is then clear that`
		186	`our program should be portable to any ANSI-compliant machine.<P>`
		187	`It is worth emphasising that all programs have an API, even if it`
		188	`is implicit rather than explicit. However it is probably fair to say`
		189	`that programs without an explicit API are only portable by accident.`
		190	`We shall have more to say on this subject later.<P>`
		191	`<A NAME=S7>`
		192	`<H3>2.1.4. Compilation Phases</H3>`
		193	`The general plan for how to write the extreme example of a portable`
		194	`program, namely one which contains no target dependent code, is now`
		195	`clear. It is shown in the compilation diagram in Fig. 1 which represents`
		196	`the traditional compilation process. This diagram is divided into`
		197	`four sections. The left half of the diagram represents the actual`
		198	`program and the right half the associated API. The top half of the`
		199	`diagram represents target independent material - things which only`
		200	`need to be done once - and the bottom half target dependent material`
		201	`- things which need to be done on every target machine.<P>`
		202	`FIGURE 1. Traditional Compilation Phases`
		203	`<BR>`
		204	`<CENTER>`
		205	`<IMG SRC="../images/trad_scheme.gif">`
		206	`</CENTER>`
		207	`<BR>`
		208	`So, we write our target independent program (top left), conforming`
		209	`to the target independent API specification (top right). All the compilation`
		210	`actually takes place on the target machine. This machine must have`
		211	`the API correctly implemented (bottom right). This implementation`
		212	`will in general be in two parts - the system headers, providing type`
		213	`definitions, macros, procedure prototypes and so on, and the system`
		214	`libraries, providing the actual procedure definitions. Another way`
		215	`of characterising this division is between syntax (the system headers)`
		216	`and semantics (the system libraries).<P>`
		217	`The compilation is divided into three main phases. Firstly the system`
		218	`headers are inserted into the program by the pre-processor. This produces,`
		219	`in effect, a target dependent version of the original program. This`
		220	`is then compiled into a binary object file. During the compilation`
		221	`process the compiler inserts all the information it has about the`
		222	`machine - including the Application Binary Interface (ABI) - the sizes`
		223	`of the basic C types, how they are combined into compound types, the`
		224	`system procedure calling conventions and so on. This ensures that`
		225	`in the final linking phase the binary object file and the system libraries`
		226	`are obeying the same ABI, thereby producing a valid executable. (On`
		227	`a dynamically linked system this final linking phase takes place partially`
		228	`at run time rather than at compile time, but this does not really`
		229	`affect the general scheme.)<P>`
		230	`The compilation scheme just described consists of a series of phases`
		231	`of two types ; code combination (the pre-processing and system linking`
		232	`phases) and code transformation (the actual compilation phases). The`
		233	`existence of the combination phases allows for the effective separation`
		234	`of the target independent code (in this case, the whole program) from`
		235	`the target dependent code (in this case, the API implementation),`
		236	`thereby aiding the construction of portable programs. These ideas`
		237	`on the separation, combination and transformation of code underlie`
		238	`the TDF approach to portability.<P>`
		239	`<A NAME=S8>`
		240	`<HR><H2>2.2. Portability Problems</H2>`
		241	`We have set out a scheme whereby it should be possible to write portable`
		242	`programs with a minimum of difficulties. So why, in reality, does`
		243	`it cause so many problems? Recall that we are still primarily concerned`
		244	`with programs which contain no target dependent code, although most`
		245	`of the points raised apply by extension to all programs.<P>`
		246	`<A NAME=S9>`
		247	`<H3>2.2.1. Programming Problems</H3>`
		248	`A first, obvious class of problems concerns the program itself. It`
		249	`is to be assumed that as many bugs as possible have been eliminated`
		250	`by testing and debugging on at least one platform before a program`
		251	`is considered as a candidate for being a portable program. But for`
		252	`even the most self-contained program, working on one platform is no`
		253	`guarantee of working on another. The program may use undefined behaviour`
		254	`- using uninitialised values or dereferencing null pointers, for example`
		255	`- or have built-in assumptions about the target machine - whether`
		256	`it is big-endian or little-endian, or what the sizes of the basic`
		257	`integer types are, for example. This latter point is going to become`
		258	`increasingly important over the next couple of years as 64-bit architectures`
		259	`begin to be introduced. How many existing programs implicitly assume`
		260	`a 32-bit architecture?<P>`
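As a minimal sketch of the alternative to building such assumptions into a program, the two properties mentioned above can be measured rather than assumed. The helper names is_little_endian and ulong_bits are hypothetical and are not part of any API discussed in this paper.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if the least significant byte of an
   unsigned int is stored at the lowest address, and 0 otherwise. */
int is_little_endian ( void )
{
    unsigned int probe = 1 ;
    /* Look at the first byte of the object representation. */
    return ( *( ( unsigned char * ) &probe ) == 1 ) ;
}

/* Hypothetical helper: counts the value bits of unsigned long instead
   of assuming that the type is 32 bits wide. */
int ulong_bits ( void )
{
    int n = 0 ;
    unsigned long v = ULONG_MAX ;
    while ( v != 0 ) {
        v >>= 1 ;
        n++ ;
    }
    return ( n ) ;
}
```

A program which branches on these results, rather than hard-coding an answer, does not need editing when it meets a 64-bit or opposite-endian target, although it is usually better still to avoid depending on either property.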
		261	`Many of these built-in assumptions may arise because of the conventional`
		262	`porting process. A program is written on one machine, modified slightly`
		263	`to make it work on a second machine, and so on. This means that the`
		264	`program is "biased" towards the existing set of target machines,`
		265	`and most particularly to the original machine it was written on. This`
		266	`applies not only to assumptions about endianness, say, but also to`
		267	`the questions of API conformance which we will be discussing below.<P>`
		268	`Most compilers will pick up some of the grosser programming errors,`
		269	`particularly by type checking (including procedure arguments if prototypes`
		270	`are used). Some of the subtler errors can be detected using the <B>-Wall</B>`
		271	`option to the Free Software Foundation's GNU C Compiler (<CODE>gcc</CODE>)`
		272	`or separate program checking tools such as <CODE>lint</CODE>, for`
		273	`example, but this remains a very difficult area.<P>`
		274	`<A NAME=S10>`
		275	`<H3>2.2.2. Code Transformation Problems</H3>`
		276	`We now move on from programming problems to compilation problems.`
		277	`As we mentioned above, compilation may be regarded as a series of`
		278	`phases of two types : combination and transformation. Transformation`
		279	`of code - translating a program in one form into an equivalent program`
		280	`in another form - may lead to a variety of problems. The code may`
		281	`be transformed wrongly, so that the equivalence is broken (a compiler`
		282	`bug), or in an unexpected manner (differing compiler interpretations),`
		283	`or not at all, because it is not recognised as legitimate code (a`
		284	`compiler limitation). The latter two problems are most likely when`
		285	`the input is a high level language, with complex syntax and semantics.<P>`
		286	`Note that in Fig. 1 all the actual compilation takes place on the`
		287	`target machine. So, to port the program to <I>n</I> machines, we need`
		288	`to deal with the bugs and limitations of <I>n</I>, potentially different,`
		289	`compilers. For example, if you have written your program using prototypes,`
		290	`it is going to be a large and rather tedious job porting it to a compiler`
		291	`which does not have prototypes (this particular example can be automated;`
		292	`not all such jobs can). Other compiler limitations can be surprising`
		293	`- not understanding the <CODE>L</CODE> suffix for long numeric literals`
		294	`and not allowing members of enumeration types as array indexes are`
		295	`among the problems drawn from my personal experience.<P>`
		296	`The differing compiler interpretations may be more subtle. For example,`
		297	`there are differences between ANSI and "traditional" C which`
		298	`may trap the unwary. Examples are the promotion of integral types`
		299	`and the resolution of the linkage of static objects.<P>`
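The integral promotion difference can be illustrated by the following sketch. It assumes that int can represent every value of unsigned char, which holds on most machines; the function name is hypothetical. ANSI promotion is value preserving, so the operand becomes an int. Many traditional compilers were unsigned preserving, so the operand became an unsigned int and the subtraction wrapped round to a large positive value.

```c
/* Hypothetical probe: returns 1 under value preserving (ANSI)
   promotion and 0 under unsigned preserving promotion. */
int promotion_is_value_preserving ( void )
{
    unsigned char uc = 1 ;
    /* ANSI: uc is promoted to int, so uc - 2 is -1.
       Unsigned preserving: uc is promoted to unsigned int, so
       uc - 2 is a large positive number. */
    return ( ( uc - 2 ) < 0 ) ;
}
```

The same source text therefore gives different answers under the two interpretations, without either compiler reporting an error.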
		300	`Many of these problems may be reduced by using the "same"`
		301	`compiler on all the target machines. For example, <CODE>gcc</CODE>`
		302	`has a single front end (C -> RTL) which may be combined with an`
		303	`appropriate back end (RTL -> target) to form a suitable compiler`
		304	`for a wide range of target machines. The existence of a single front`
		305	`end virtually eliminates the problems of differing interpretation`
		306	`of code and compiler quirks. It also reduces the exposure to bugs.`
		307	`Instead of being exposed to the bugs in <I>n</I> separate compilers,`
		308	`we are now only exposed to bugs in one half-compiler (the front end)`
		309	`plus <I>n</I> half-compilers (the back ends) - a total of <I>( n +`
		310	`1 ) / 2</I>. (This calculation is not meant totally seriously, but`
		311	`it is true in principle.) Front end bugs, when tracked down, also`
		312	`only require a single workaround.<P>`
		313	`<A NAME=S11>`
		314	`<H3>2.2.3. Code Combination Problems</H3>`
		315	`If code transformation problems may be regarded as a time consuming`
		316	`irritation, involving the rewriting of sections of code or using a`
		317	`different compiler, the second class of problems, those concerned`
		318	`with the combination of code, are far more serious.<P>`
		319	`The first code combination phase is the pre-processor pulling in the`
		320	`system headers. These can contain some nasty surprises. For example,`
		321	`consider a simple ANSI compliant program which contains a linked list`
		322	`of strings arranged in alphabetical order. This might also contain`
		323	`a routine:<P>`
		324	`<PRE>`
		325	`void index ( char * ) ;`
		326	`</PRE>`
		327	`which adds a string to this list in the appropriate position, using`
		328	`<CODE>strcmp</CODE> from <CODE>string.h</CODE> to find it. This works`
		329	`fine on most machines, but on some it gives the error:<P>`
		330	`<PRE>`
		331	`Only 1 argument to macro 'index'`
		332	`</PRE>`
		333	`The reason for this is that the system version of <CODE>string.h</CODE>`
		334	`contains the line:<P>`
		335	`<PRE>`
		336	`#define index( s, c ) strchr ( s, c )`
		337	`</PRE>`
		338	`But this has nothing to do with ANSI; the macro is defined for compatibility`
		339	`with BSD.<P>`
		340	`In reality the system headers on any given machine are a hodgepodge`
		341	`of implementations of different APIs, and it is often virtually impossible`
		342	`to separate them (feature test macros such as <CODE>_POSIX_SOURCE</CODE>`
		343	`are of some use, but are not always implemented and do not always`
		344	`produce a complete separation; they are only provided for "standard"`
		345	`APIs anyway). The problem above arose because there is no transitivity`
		346	`rule of the form : if program <I>P</I> conforms to API <I>A</I>, and`
		347	`API <I>B</I> extends <I>A</I>, then <I>P</I> conforms to <I>B</I>.`
		348	`The only reason this is not true is these namespace problems.<P>`
		349	`A second example demonstrates a slightly different point. The POSIX`
		350	`standard states that <CODE>sys/stat.h</CODE> contains the definition`
		351	`of the structure <CODE>struct stat</CODE>, which includes several`
		352	`members, amongst them:<P>`
		353	`<PRE>`
		354	`time_t st_atime ;`
		355	`</PRE>`
		356	`representing the access time for the corresponding file. So the program:<P>`
		357	`<PRE>`
		358	`#include <sys/types.h>`
		359	`#include <sys/stat.h>`
		360
		361	`time_t st_atime ( struct stat *p )`
		362	`{`
		363	`    return ( p->st_atime ) ;`
		364	`}`
		365	`</PRE>`
		366	`should be perfectly valid - the procedure name <CODE>st_atime</CODE>`
		367	`and the field selector <CODE>st_atime</CODE> occupy different namespaces`
		368	`(see however the appendix on namespaces and APIs below). However at`
		369	`least one popular operating system has the implementation:<P>`
		370	`<PRE>`
		371	`struct stat {`
		372	`    ....`
		373	`    union {`
		374	`        time_t st__sec ;`
		375	`        timestruc_t st__tim ;`
		376	`    } st_atim ;`
		377	`    ....`
		378	`} ;`
		379	`#define st_atime st_atim.st__sec`
		380	`</PRE>`
		381	`This seems like a perfectly legitimate implementation. In the program`
		382	`above the field selector <CODE>st_atime</CODE> is replaced by <CODE>st_atim.st__sec`
		383	`</CODE> by the pre-processor, as intended, but unfortunately so is`
		384	`the procedure name <CODE>st_atime</CODE>, leading to a syntax error.<P>`
		385	`The problem here is not with the program or the implementation, but`
		386	`in the way they were combined. C does not allow individual field selectors`
		387	`to be defined. Instead the indiscriminate sledgehammer of macro substitution`
		388	`was used, leading to the problem described.<P>`
		389	`Problems can also occur in the other combination phase of the traditional`
		390	`compilation scheme, the system linking. Consider the ANSI compliant`
		391	`routine:<P>`
		392	`<PRE>`
		393	`#include <stdio.h>`
		394
		395	`int open ( char *nm )`
		396	`{`
		397	`    int c, n = 0 ;`
		398	`    FILE *f = fopen ( nm, "r" ) ;`
		399	`    if ( f == NULL ) return ( -1 ) ;`
		400	`    while ( c = getc ( f ), c != EOF ) n++ ;`
		401	`    ( void ) fclose ( f ) ;`
		402	`    return ( n ) ;`
		403	`}`
		404	`</PRE>`
		405	`which opens the file <CODE>nm</CODE>, returning its size in bytes`
		406	`if it exists and -1 otherwise. As a quick porting exercise, I compiled`
		407	`it under six different operating systems. On three it worked correctly;`
		408	`on one it returned -1 even when the file existed; and on two it crashed`
		409	`with a segmentation error.<P>`
		410	`The reason for this lies in the system linking. On those machines`
		411	`which failed the library routine <CODE>fopen</CODE> calls (either`
		412	`directly or indirectly) the library routine <CODE>open</CODE> (which`
		413	`is in POSIX, but not ANSI). The system linker, however, linked my`
		414	`routine <CODE>open</CODE> instead of the system version, so the call`
		415	`to <CODE>fopen</CODE> did not work correctly.<P>`
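One defensive measure, sketched below, is to give such a routine internal linkage and a name which is less likely to clash. A static procedure is never offered to the system linker, so the library's own call to open cannot be redirected to it. The name count_bytes is hypothetical, and this is only one possible workaround; it does nothing for routines which must be visible from other source files.

```c
#include <stdio.h>

/* Same routine as above, but with internal linkage: the name is
   private to this source file and invisible to the system linker. */
static long count_bytes ( const char *nm )
{
    int c ;
    long n = 0 ;
    FILE *f = fopen ( nm, "r" ) ;
    if ( f == NULL ) return ( -1 ) ;
    /* Count the characters up to end of file. */
    while ( c = getc ( f ), c != EOF ) n++ ;
    ( void ) fclose ( f ) ;
    return ( n ) ;
}
```

Called on a file containing the three characters abc this should return 3, and on a file which cannot be opened it returns -1.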
		416	`So code combination problems are primarily namespace problems. The`
		417	`task of combining the program with the API implementation on a given`
		418	`platform is complicated by the fact that, because the system headers`
		419	`and system libraries contain things other than the API implementation,`
		420	`or even because of the particular implementation chosen, the various`
		421	`namespaces in which the program is expected to operate become "polluted".`
		422	`<P>`
		423	`<A NAME=S12>`
		424	`<H3>2.2.4. API Problems</H3>`
		425	`We have said that the API defines the interface between the program`
		426	`and the standard library provided with the operating system on the`
		427	`target machine. There are three main problems concerned with APIs.`
		428	`The first, how to choose the API in the first place, is discussed`
		429	`separately. Here we deal with the compilation aspects : how to check`
		430	`that the program conforms to its API, and what to do about incorrect`
		431	`API implementations on the target machine(s).<P>`
		432	`<A NAME=S13>`
		433	`<H4>2.2.4.1. API Checking</H4>`
		434	`The problem of whether or not a program conforms to its API - not`
		435	`using any objects from the operating system other than those specified`
		436	`in the API, and not making any unwarranted assumptions about these`
		437	`objects - is one which does not always receive sufficient attention,`
		438	`mostly because the necessary checking tools do not exist (or at least`
		439	`are not widely available). Compiling the program on a number of API`
		440	`compliant machines merely checks the program against the system headers`
		441	`for these machines. For a genuine portability check we need to check`
		442	`against the abstract API description, thereby in effect checking against`
		443	`all possible implementations.<P>`
		444	`Recall from above that the system headers on a given machine are an`
		445	`amalgam of all the APIs it implements. This can cause programs which`
		446	`should compile not to, because of namespace clashes; but it may also`
		447	`cause programs to compile which should not, because they have used`
		448	`objects which are not in their API, but which are in the system headers.`
		449	`For example, the supposedly ANSI compliant program:<P>`
		450	`<PRE>`
		451	`#include <signal.h>`
		452	`int sig = SIGKILL ;`
		453	`</PRE>`
		454	`will compile on most systems, despite the fact that <CODE>SIGKILL</CODE>`
		455	`is not an ANSI signal, because <CODE>SIGKILL</CODE> is in POSIX, which`
		456	`is also implemented in the system <CODE>signal.h</CODE>. Again, feature`
		457	`test macros are of some use in trying to isolate the implementation`
		458	`of a single API from the rest of the system headers. However they`
		459	`are highly unlikely to detect the error in the following supposedly`
		460	`POSIX compliant program which prints the entries of the directory`
		461	`<CODE>nm</CODE>, together with their inode numbers:<P>`
		462	`<PRE>`
		463	`#include <stdio.h>`
		464	`#include <sys/types.h>`
		465	`#include <dirent.h>`
		466
		467	`void listdir ( char *nm )`
		468	`{`
		469	`    struct dirent *entry ;`
		470	`    DIR *dir = opendir ( nm ) ;`
		471	`    if ( dir == NULL ) return ;`
		472	`    while ( entry = readdir ( dir ), entry != NULL ) {`
		473	`        printf ( "%s : %d\n", entry->d_name, ( int ) entry->d_ino ) ;`
		474	`    }`
		475	`    ( void ) closedir ( dir ) ;`
		476	`    return ;`
		477	`}`
		478	`</PRE>`
		479	`This is not POSIX compliant because, whereas the <CODE>d_name</CODE>`
		480	`field of <CODE>struct dirent</CODE> is in POSIX, the <CODE>d_ino</CODE>`
		481	`field is not. It is however in XPG3, so it is likely to be in many`
		482	`system implementations.<P>`
		483	`The previous examples have been concerned with simply telling whether`
		484	`or not a particular object is in an API. A more difficult, and in`
		485	`a way more important, problem is that of assuming too much about the`
		486	`objects which are in the API. For example, in the program:<P>`
		487	`<PRE>`
		488	`#include <stdio.h>`
		489	`#include <stdlib.h>`
		490
		491	`div_t d = { 3, 4 } ;`
		492
		493	`int main ()`
		494	`{`
		495	`    printf ( "%d,%d\n", d.quot, d.rem ) ;`
		496	`    return ( 0 ) ;`
		497	`}`
		498	`</PRE>`
		499	`the ANSI standard specifies that the type <CODE>div_t</CODE> is a`
		500	`structure containing two fields, <CODE>quot</CODE> and <CODE>rem</CODE>,`
		501	`of type <CODE>int</CODE>, but it does not specify which order these`
		502	`fields appear in, or indeed if there are other fields. Therefore the`
		503	`initialisation of <CODE>d</CODE> is not portable. Again, the type`
		504	`<CODE>time_t</CODE> is used to represent times in seconds since a`
		505	`certain fixed date. On most systems this is implemented as <CODE>long</CODE>,`
		506	`so it is tempting to use <CODE>( t & 1 )</CODE> to determine for`
		507	`a <CODE>time_t</CODE> <CODE>t</CODE> whether this number of seconds`
		508	`is odd or even. But ANSI actually says that <CODE>time_t</CODE> is`
		509	`an arithmetic, not an integer, type, so it would be possible for it`
		510	`to be implemented as <CODE>double</CODE>. But in this case <CODE>(`
		511	`t & 1 )</CODE> is not even type correct, so it is not a portable`
		512	`way of finding out whether <CODE>t</CODE> is odd or even.<P>`
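For the div_t case, a minimal sketch of a portable alternative is to set the two fields by name, which relies only on what the standard guarantees, namely that the fields quot and rem exist and have type int. The helper name make_div is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: builds a div_t without assuming the order of
   its fields or the absence of other fields. */
div_t make_div ( int quot, int rem )
{
    div_t d ;
    d.quot = quot ;    /* named access is independent of field order */
    d.rem = rem ;
    return ( d ) ;
}
```

The cost is that the value can no longer be given as a static initialiser; it has to be assigned at run time, for example at the start of main.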
		513	`<A NAME=S14>`
		514	`<H4>2.2.4.2. API Implementation Errors</H4>`
		515	`Undoubtedly the problem which causes the writer of portable programs`
		516	`the greatest headache (and heartache) is that of incorrect API implementations.`
		517	`However carefully you have chosen your API and checked that your program`
		518	`conforms to it, you are still reliant on someone (usually the system`
		519	`vendor) having implemented this API correctly on the target machine.`
		520	`Machines which do not implement the API at all do not enter the equation`
		521	`(they are not suitable target machines); what causes problems is incorrect`
		522	`implementations. As the implementation may be divided into two parts`
		523	`- system headers and system libraries - we shall similarly divide`
		524	`our discussion. Inevitably the choice of examples is personal; anyone`
		525	`who has ever attempted to port a program to a new machine is likely`
		526	`to have their own favourite examples.<P>`
		527	`<A NAME=S15>`
		528	`<H4>2.2.4.3. System Header Problems</H4>`
		529	`Some header problems are immediately apparent because they are syntactic`
		530	`and cause the program to fail to compile. For example, values may`
		531	`not be defined or be defined in the wrong place (not in the header`
		532	`prescribed by the API).<P>`
		533	`A common example (one which I have to include a workaround for in`
		534	`virtually every program I write) is that <CODE>EXIT_SUCCESS</CODE>`
		535	`and <CODE>EXIT_FAILURE</CODE> are not always defined (ANSI specifies`
		536	`that they should be in <CODE>stdlib.h</CODE>). It is tempting to change`
		537	`<CODE>exit (EXIT_FAILURE)</CODE> to <CODE>exit (1)</CODE> because`
		538	`"everyone knows" that <CODE>EXIT_FAILURE</CODE> is 1. But`
		539	`this is to decrease the portability of the program because it ties`
		540	`it to a particular class of implementations. A better workaround would`
		541	`be:<P>`
		542	`<PRE>`
		543	`#include <stdlib.h>`
		544	`#ifndef EXIT_FAILURE`
		545	`#define EXIT_FAILURE 1`
		546	`#endif`
		547	`</PRE>`
		548	`which assumes that anyone choosing a non-standard value for <CODE>EXIT_FAILURE`
		549	`</CODE> is more likely to put it in <CODE>stdlib.h</CODE>. Of course,`
		550	`if one subsequently came across a machine on which not only is <CODE>EXIT_FAILURE`
		551	`</CODE> not defined, but also the value it should have is not 1, then`
		552	`it would be necessary to resort to <CODE>#ifdef machine_name</CODE>`
		553	`statements. The same is true of all the API implementation problems`
		554	`we shall be discussing : non-conformant machines require workarounds`
		555	`involving conditional compilation. As more machines are considered,`
		556	`so these conditional compilations multiply.<P>`
		557	`As an example of things being defined in the wrong place, ANSI specifies`
		558	`that <CODE>SEEK_SET</CODE>, <CODE>SEEK_CUR</CODE> and <CODE>SEEK_END</CODE>`
		559	`should be defined in <CODE>stdio.h</CODE>, whereas POSIX specifies`
		560	`that they should also be defined in <CODE>unistd.h</CODE>. It is not`
		561	`uncommon to find machines on which they are defined in the latter`
		562	`but not in the former. A possible workaround in this case would be:<P>`
		563	`<PRE>`
		564	`#include <stdio.h>`
		565	`#ifndef SEEK_SET`
		566	`#include <unistd.h>`
		567	`#endif`
		568	`</PRE>`
		569	`Of course, by including "unnecessary" headers like <CODE>unistd.h`
		570	`</CODE> the risk of namespace clashes such as those discussed above`
		571	`is increased.<P>`
		572	`A final syntactic problem, which perhaps should belong with the system`
		573	`header problems above, concerns dependencies between the headers themselves.`
		574	`For example, the POSIX header <CODE>unistd.h</CODE> declares functions`
		575	`involving some of the types <CODE>pid_t</CODE>, <CODE>uid_t</CODE>`
		576	`etc., defined in <CODE>sys/types.h</CODE>. Is it necessary to include`
		577	`<CODE>sys/types.h</CODE> before including <CODE>unistd.h</CODE>, or`
		578	`does <CODE>unistd.h</CODE> automatically include <CODE>sys/types.h</CODE>?`
		579	`The approach of playing safe and including everything will normally`
		580	`work, but this can lead to multiple inclusions of a header. This will`
		581	`normally cause no problems because the system headers are protected`
		582	`against multiple inclusions by means of macros, but it is not unknown`
		583	`for certain headers to be left unprotected. Also not all header dependencies`
		584	`are as clear cut as the one given, so that what headers need to be`
		585	`included, and in what order, is in fact target dependent.<P>`
		586	`There can also be semantic errors in the system headers : namely wrongly`
		587	`defined values. The following two examples are taken from real operating`
		588	`systems. Firstly the definition:<P>`
		589	`<PRE>`
		590	`#define DBL_MAX 1.797693134862316E+308`
		591	`</PRE>`
		592	`in <CODE>float.h</CODE> on an IEEE-compliant machine is subtly wrong`
		593	`- the given value does not fit into a <CODE>double</CODE> - the correct`
		594	`value is:<P>`
		595	`<PRE>`
		596	`#define DBL_MAX 1.7976931348623157E+308`
		597	`</PRE>`
		598	`Again, the type definition:<P>`
		599	`<PRE>`
		600	`typedef int size_t ; /* ??? */`
		601	`</PRE>`
		602	`(sic) is not compliant with ANSI, which says that <CODE>size_t</CODE>`
		603	`is an unsigned integer type. (I'm not sure if this is better or worse`
		604	`than another system which defines <CODE>ptrdiff_t</CODE> to be <CODE>unsigned`
		605	`int</CODE> when it is meant to be signed. This would mean that the`
		606	`difference between any two pointers is always positive.) These particular`
		607	`examples are irritating because it would have cost nothing to get`
		608	`things right, correcting the value of <CODE>DBL_MAX</CODE> and changing`
		609	`the definition of <CODE>size_t</CODE> to <CODE>unsigned int</CODE>.`
		610	`These corrections are so minor that the modified system headers would`
		611	`still be a valid interface for the existing system libraries (we shall`
		612	`have more to say about this later). However it is not possible to`
		613	`change the system headers, so it is necessary to build workarounds`
		614	`into the program. Whereas in the first case it is possible to devise`
		615	`such a workaround:<P>`
		616	`<PRE>`
		617	`#include <float.h>`
		618	`#ifdef machine_name`
		619	`#undef DBL_MAX`
		620	`#define DBL_MAX 1.7976931348623157E+308`
		621	`#endif`
		622	`</PRE>`
		623	`for example, in the second, because <CODE>size_t</CODE> is defined`
		624	`by a <CODE>typedef</CODE> it is virtually impossible to correct in`
		625	`a simple fashion. Thus any program which relies on the fact that <CODE>size_t`
		626	`</CODE> is unsigned will require considerable rewriting before it`
		627	`can be ported to this machine.<P>`
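A wrongly defined <CODE>DBL_MAX</CODE> of the kind shown above can at least be detected at run time. The following is a minimal sketch, not part of the original text: the function name <CODE>dbl_max_is_consistent</CODE> is hypothetical, and the check assumes a binary floating-point format (<CODE>FLT_RADIX</CODE> equal to 2), as on IEEE machines. It rebuilds the largest finite <CODE>double</CODE> from <CODE>DBL_EPSILON</CODE> and <CODE>DBL_MAX_EXP</CODE> and compares it with the value the header supplies.<P>

```c
#include <float.h>

/*
 * Hypothetical helper: returns 1 if DBL_MAX from <float.h> equals the
 * largest finite double rebuilt from DBL_EPSILON and DBL_MAX_EXP,
 * 0 if it does not, and -1 if the check does not apply.
 * Assumption: FLT_RADIX == 2, as on IEEE machines.
 */
int dbl_max_is_consistent(void)
{
    /* Largest significand below 1.0, i.e. 1 - 2^(-DBL_MANT_DIG). */
    volatile double x = 1.0 - DBL_EPSILON / 2.0;
    int i;

    if (FLT_RADIX != 2)
        return -1;

    /* Scale by 2^DBL_MAX_EXP; each doubling is exact and stays finite. */
    for (i = 0; i < DBL_MAX_EXP; i++)
        x *= 2.0;

    return x == DBL_MAX;
}
```

A program could call <CODE>dbl_max_is_consistent ()</CODE> once at start-up, or from a configuration test, and fall back on its own definition when the result is 0. The sketch says nothing about the other values in <CODE>float.h</CODE>, and <CODE>DBL_EPSILON</CODE> and <CODE>DBL_MAX_EXP</CODE> might of course be wrong as well.<P>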
		628	`<A NAME=S16>`
		629	`<H4>2.2.4.4. System Library Problems</H4>`
		630	`The system header problems just discussed are primarily syntactic`
		631	`problems. By contrast, system library problems are primarily semantic`
		632	`- the provided library routines do not behave in the way specified`
		633	`by the API. This makes them harder to detect. For example, consider`
		634	`the routine:<P>`
		635	`<PRE>`
		636	`void *realloc ( void *p, size_t s ) ;`
		637	`</PRE>`
		638	`which reallocates the block of memory <CODE>p</CODE> to have size`
		639	`<CODE>s</CODE> bytes, returning the new block of memory. The ANSI`
		640	`standard says that if <CODE>p</CODE> is the null pointer, then the`
		641	`effect of <CODE>realloc ( p, s )</CODE> is the same as <CODE>malloc`
		642	`( s )</CODE>, that is, to allocate a new block of memory of size <CODE>s</CODE>.`
		643	`This behaviour is exploited in the following program, in which the`
		644	`routine <CODE>add_char</CODE> adds a character to the expanding array,`
		645	`<CODE>buffer</CODE>:<P>`
		646	`<PRE>`
		647	`#include <stdio.h>`
		648	`#include <stdlib.h>`
		649
		650	`char *buffer = NULL ;`
		651	`int buff_sz = 0, buff_posn = 0 ;`
		652
		653	`void add_char ( char c )`
		654	`{`
		655	`if ( buff_posn >= buff_sz ) {`
		656	`buff_sz += 100 ;`
		657	`buffer = ( char * ) realloc ( ( void * ) buffer, buff_sz * sizeof ( char ) ) ;`
		658	`if ( buffer == NULL ) {`
		659	`fprintf ( stderr, "Memory allocation error\n" ) ;`
		660	`exit ( EXIT_FAILURE ) ;`
		661	`}`
		662	`}`
		663	`buffer [ buff_posn++ ] = c ;`
		664	`return ;`
		665	`}`
		666	`</PRE>`
		667	`On the first call of <CODE>add_char</CODE>, <CODE>buffer</CODE> is`
		668	`set to a real block of memory (as opposed to <CODE>NULL</CODE>) by`
		669	`a call of the form <CODE>realloc ( NULL, s )</CODE>. This is extremely`
		670	`convenient and efficient - if it were not for this behaviour we would`
		671	`have to have an explicit initialisation of <CODE>buffer</CODE>, either`
		672	`as a special case in <CODE>add_char</CODE> or in a separate initialisation`
		673	`routine.<P>`
		674	`Of course this all depends on the behaviour of <CODE>realloc ( NULL,`
		675	`s )</CODE> having been implemented precisely as described in the ANSI`
		676	`standard. The first indication that this is not so on a particular`
		677	`target machine might be when the program is compiled and run on that`
		678	`machine for the first time and does not perform as expected. To track`
		679	`the problem down will demand time debugging the program.<P>`
		680	`Once the problem has been identified as being with <CODE>realloc</CODE>`
		681	`a number of workarounds are possible. Perhaps the most interesting`
		682	`is to replace the inclusion of <CODE>stdlib.h</CODE> by the following:<P>`
		683	`<PRE>`
		684	`#include <stdlib.h>`
		685	`#ifdef machine_name`
		686	`#define realloc( p, s )\`
		687	`( ( p ) ? ( realloc ) ( p, s ) : malloc ( s ) )`
		688	`#endif`
		689	`</PRE>`
		690	`where <CODE>realloc ( p, s )</CODE> is redefined as a macro which`
		691	`is the result of the procedure <CODE>realloc</CODE> if <CODE>p</CODE>`
		692	`is not null, and <CODE>malloc ( s )</CODE> otherwise. (In fact this`
		693	`macro will not always have the desired effect, although it does in`
		694	`this case. Why (exercise)?)<P>`
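One reason a macro of this kind is fragile is that it evaluates <CODE>p</CODE> more than once, which matters if the argument has side effects. A wrapper function avoids this. The following is a minimal sketch under that assumption; the name <CODE>portable_realloc</CODE> is hypothetical and does not come from any standard or from the original text.<P>

```c
#include <stdlib.h>

/*
 * Hypothetical wrapper: gives the ANSI behaviour for a null pointer even
 * where the system realloc mishandles that case. As this is a function,
 * the argument p is evaluated exactly once.
 */
void *portable_realloc(void *p, size_t s)
{
    if (p == NULL)
        return malloc(s);   /* ANSI: realloc(NULL, s) acts like malloc(s) */
    return realloc(p, s);
}
```

In <CODE>add_char</CODE> the call would then read <CODE>portable_realloc ( buffer, buff_sz )</CODE>. The cost is one extra function call per reallocation on machines which implement <CODE>realloc</CODE> correctly.<P>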
		695	`The only alternative to this trial and error approach to finding API`
		696	`implementation problems is the application of personal experience,`
		697	`either of the particular target machine or of things that are implemented`
		698	`wrongly by many machines and as such should be avoided. This sort`
		699	`of detailed knowledge is not easily acquired. Nor can it ever be complete:`
		700	`new operating system releases are becoming increasingly regular and`
		701	`are on occasions quite as likely to introduce new implementation errors`
		702	`as to solve existing ones. It is in short a "black art".<P>`
		703	`<A NAME=S17>`
		704	`<HR><H2>2.3. APIs and Portability</H2>`
		705	`We now return to our discussion of the general issues involved in`
		706	`portability to more closely examine the role of the API.<P>`
		707	`<A NAME=S18>`
		708	`<H3>2.3.1. Target Dependent Code</H3>`
		709	`So far we have been considering programs which contain no conditional`
		710	`compilation, in which the API forms the basis of the separation of`
		711	`the target independent code (the whole program) and the target dependent`
		712	`code (the API implementation). But a glance at most large C programs`
		713	`will reveal that they do contain conditional compilation. The code`
		714	`is scattered with <CODE>#if</CODE>'s and <CODE>#ifdef</CODE>'s which,`
		715	`in effect, cause the pre-processor to construct slightly different`
		716	`programs on different target machines. So here we do not have a clean`
		717	`division between the target independent and the target dependent code`
		718	`- there are small sections of target dependent code spread throughout`
		719	`the program.<P>`
		720	`Let us briefly consider some of the reasons why it is necessary to`
		721	`introduce this conditional compilation. Some have already been mentioned`
		722	`- workarounds for compiler bugs, compiler limitations, and API implementation`
		723	`errors; others will be considered later. However the most interesting`
		724	`and important cases concern things which need to be done genuinely`
		725	`differently on different machines. This can be because they really`
		726	`cannot be expressed in a target independent manner, or because the`
		727	`target independent way of doing them is unacceptably inefficient.<P>`
		728	`Efficiency (either in terms of time or space) is a key issue in many`
		729	`programs. The argument is often advanced that writing a program portably`
		730	`means using the, often inefficient, lowest common denominator approach.`
		731	`But under our definition of portability it is the functionality that`
		732	`matters, not the actual source code. There is nothing to stop different`
		733	`code being used on different machines for reasons of efficiency.<P>`
		734	`To examine the relationship between target dependent code and APIs,`
		735	`consider the simple program:<P>`
		736	`<PRE>`
		737	`#include <stdio.h>`
		738
		739	`int main ()`
		740	`{`
		741	`#ifdef mips`
		742	`fputs ( "This machine is a mips\n", stdout ) ;`
		743	`#endif`
		744	`return ( 0 ) ;`
		745	`}`
		746	`</PRE>`
		747	`which prints a message if the target machine is a mips. What is the`
		748	`API of this program? Basically it is the same as in the "Hello`
		749	`world" example discussed in sections <A HREF="#S4">2.1.1</A> and <A HREF="#S5">2.1.2</A>,`
		750	`but if we wish the API to fully describe the interface between the`
		751	`program and the target machine, we must also say that whether or not`
		752	`the macro <CODE>mips</CODE> is defined is part of the API. Like the`
		753	`rest of the API, this has a semantic aspect as well as a syntactic`
		754	`- in this case that <CODE>mips</CODE> is only defined on mips machines.`
		755	`Where it differs is in its implementation. Whereas the main part of`
		756	`the API is implemented in the system headers and the system libraries,`
		757	`the implementation of either defining, or not defining, <CODE>mips</CODE>`
		758	`ultimately rests with the person performing the compilation. (In this`
		759	`particular example, the macro <CODE>mips</CODE> is normally built`
		760	`into the compiler on mips machines, but this is only a convention.)<P>`
		761	`So the API in this case has two components : a system-defined part`
		762	`which is implemented in the system headers and system libraries, and`
		763	`a user-defined part which ultimately relies on the person performing`
		764	`the compilation to provide an implementation. The main point to be`
		765	`made in this section is that introducing target dependent code is`
		766	`equivalent to introducing a user-defined component to the API. The`
		767	`actual compilation process in the case of programs containing target`
		768	`dependent code is basically the same as that shown in Fig. 1. But`
		769	`whereas previously the vertical division of the diagram also reflected`
		770	`a division of responsibility - the left hand side is the responsibility`
		771	`of the programmer (the person writing the program), and the right`
		772	`hand side of the API specifier (for example, a standards defining`
		773	`body) and the API implementor (the system vendor) - now the right`
		774	`hand side is partially the responsibility of the programmer and the`
		775	`person performing the compilation. The programmer specifies the user-defined`
		776	`component of the API, and the person compiling the program either`
		777	`implements this API (as in the mips example above) or chooses between`
		778	`a number of alternative implementations provided by the programmer`
		779	`(as in the example below).<P>`
		780	`Let us consider a more complex example. Consider the following program`
		781	`which assumes, for simplicity, that an <CODE>unsigned int</CODE> contains`
		782	`32 bits:<P>`
		783	`<PRE>`
		784	`#include <stdio.h>`
		785	`#include "config.h"`
		786
		787	`#ifndef SLOW_SHIFT`
		788	`#define MSB( a ) ( ( unsigned char ) ( ( a ) >> 24 ) )`
		789	`#else`
		790	`#ifdef BIG_ENDIAN`
		791	`#define MSB( a ) ( *( ( unsigned char * ) &( a ) ) )`
		792	`#else`
		793	`#define MSB( a ) ( *( ( unsigned char * ) &( a ) + 3 ) )`
		794	`#endif`
		795	`#endif`
		796
		797	`unsigned int x = 100000000 ;`
		798
		799	`int main ()`
		800	`{`
		801	`printf ( "%u\n", MSB ( x ) ) ;`
		802	`return ( 0 ) ;`
		803	`}`
		804	`</PRE>`
		805	`The intention is to print the most significant byte of <CODE>x</CODE>.`
		806	`Three alternative definitions of the macro <CODE>MSB</CODE> used to`
		807	`extract this value are provided. The first, if <CODE>SLOW_SHIFT</CODE>`
		808	`is not defined, is simply to shift the value right by 24 bits. This`
		809	`will work on all 32-bit machines, but may be inefficient (depending`
		810	`on the nature of the machine's shift instruction). So two alternatives`
		811	`are provided. An <CODE>unsigned int</CODE> is assumed to consist of`
		812	`four <CODE>unsigned char</CODE>'s. On a big-endian machine, the most`
		813	`significant byte is the first of these <CODE>unsigned char</CODE>'s;`
		814	`on a little-endian machine it is the fourth. The second definition`
		815	`of <CODE>MSB</CODE> is intended to reflect the former case, and the`
		816	`third the latter.<P>`
		817	`The person compiling the program has to choose between the three possible`
		818	`implementations of <CODE>MSB</CODE> provided by the programmer. This`
		819	`is done by either defining, or not defining, the macros <CODE>SLOW_SHIFT</CODE>`
		820	`and <CODE>BIG_ENDIAN</CODE>. This could be done as command line options,`
		821	`but we have chosen to reflect another commonly used device, the configuration`
		822	`file. For each target machine, the programmer provides a version of`
		823	`the file <CODE>config.h</CODE> which defines the appropriate combination`
		824	`of the macros <CODE>SLOW_SHIFT</CODE> and <CODE>BIG_ENDIAN</CODE>.`
		825	`The person performing the compilation simply chooses the appropriate`
		826	`<CODE>config.h</CODE> for the target machine.<P>`
		827	`There are two possible ways of looking at what the user-defined API`
		828	`of this program is. Possibly it is most natural to say that it is`
		829	`<CODE>MSB</CODE>, but it could also be argued that it is the macros`
		830	`<CODE>SLOW_SHIFT</CODE> and <CODE>BIG_ENDIAN</CODE>. The former more`
		831	`accurately describes the target dependent code, but is only implemented`
		832	`indirectly, via the latter.<P>`
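The three definitions of <CODE>MSB</CODE> are meant to give the same answer on any one machine, and this can be checked. The following is a minimal sketch, not taken from the original text: the function names are hypothetical, it uses <CODE>uint32_t</CODE> from <CODE>stdint.h</CODE> (a later addition to C) in place of the "32-bit <CODE>unsigned int</CODE>" assumption, it assumes 8-bit bytes, and it detects byte order at run time rather than through <CODE>BIG_ENDIAN</CODE>.<P>

```c
#include <stdint.h>

/* Shift-based definition: independent of the machine's byte order. */
unsigned char msb_shift(uint32_t a)
{
    return (unsigned char)(a >> 24);
}

/*
 * Byte-access definition. The byte order is probed at run time here,
 * whereas the program in the text selects it with BIG_ENDIAN.
 * Assumption: the machine is either big-endian or little-endian.
 */
unsigned char msb_bytes(uint32_t a)
{
    const uint32_t probe = 1;
    const unsigned char *bytes = (const unsigned char *)&a;
    /* On a little-endian machine the first byte of probe holds the 1. */
    int little_endian = (*(const unsigned char *)&probe == 1);

    return little_endian ? bytes[3] : bytes[0];
}
```

For the value used in the text, 100000000 is 0x05F5E100, so both functions should return 5. A disagreement between them would suggest that <CODE>config.h</CODE> had been chosen wrongly for the target machine.<P>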
		833	`<A NAME=S19>`
		834	`<H3>2.3.2. Making APIs Explicit</H3>`
		835	`As we have said, every program has an API even if it is implicit rather`
		836	`than explicit. Every system header included, every type or value used`
		837	`from it, and every library routine used, adds to the system-defined`
		838	`component of the API, and every conditional compilation adds to the`
		839	`user-defined component. What making the API explicit does is to encapsulate`
		840	`the set of requirements that the program has of the target machine`
		841	`(including requirements like, I need to know whether or not the target`
		842	`machine is big-endian, as well as, I need <CODE>fputs</CODE> to be`
		843	`implemented as in the ANSI standard). By making these requirements`
		844	`explicit it is made absolutely clear what is needed on a target machine`
		845	`if a program is to be ported to it. If the requirements are not explicit`
		846	`this can only be found by trial and error. This is what we meant earlier`
		847	`by saying that a program without an explicit API is only portable`
		848	`by accident.<P>`
		849	`Another advantage of specifying the requirements of a program is that`
		850	`it may increase their chances of being implemented. We have spoken`
		851	`as if porting is a one-way process; program writers porting their`
		852	`programs to new machines. But there is also traffic the other way.`
		853	`Machine vendors may wish certain programs to be ported to their machines.`
		854	`If these programs come with a list of requirements then the vendor`
		855	`knows precisely what to implement in order to make such a port possible.<P>`
		856	`<A NAME=S20>`
		857	`<H3>2.3.3. Choosing an API</H3>`
		858	`So how does one go about choosing an API? In a sense the user-defined`
		859	`component is easier to specify than the system-defined component because`
		860	`it is less tied to particular implementation models. What is required`
		861	`is to abstract out what exactly needs to be done in a target dependent`
		862	`manner and to decide how best to separate it out. The most difficult`
		863	`problem is how to make the implementation of this API as simple as`
		864	`possible for the person performing the compilation, if necessary providing`
		865	`a number of alternative implementations to choose between and a simple`
		866	`method of making this choice (for example, the <CODE>config.h</CODE>`
		867	`file above). With the system-defined component the question is more`
		868	`likely to be, how do the various target machines I have in mind implement`
		869	`what I want to do? The abstraction of this is usually to choose a`
		870	`standard and widely implemented API, such as POSIX, which provides`
		871	`all the necessary functionality.<P>`
		872	`The choice of "standard" API is of course influenced by`
		873	`the type of target machines one has in mind. Within the Unix world,`
		874	`the increasing adoption of Open Standards, such as POSIX, means that`
		875	`choosing a standard API which is implemented on a wide variety of Unix`
		876	`boxes is becoming easier. Similarly, choosing an API which will work`
		877	`on most MSDOS machines should cause few problems. The difficulty is`
		878	`that these are disjoint worlds; it is very difficult to find a standard`
		879	`API which is implemented on both Unix and MSDOS machines. At present`
		880	`not much can be done about this; it reflects the disjoint nature of`
		881	`the computer market.<P>`
		882	`To develop a similar point : the drawback of choosing POSIX (for example)`
		883	`as an API is that it restricts the range of possible target machines`
		884	`to machines which implement POSIX. Other machines, for example, BSD`
		885	`compliant machines, might offer the same functionality (albeit using`
		886	`different methods), so they should be potential target machines, but`
		887	`they have been excluded by the choice of API. One approach to the`
		888	`problem is the "alternative API" approach. Both the POSIX`
		889	`and the BSD variants are built into the program, but only one is selected`
		890	`on any given target machine by means of conditional compilation. Under`
		891	`our "equivalent functionality" definition of portability,`
		892	`this is a program which is portable to both POSIX and BSD compliant`
		893	`machines. But viewed in the light of the discussion above, if we regard`
		894	`a program as a program-API pair, it could be regarded as two separate`
		895	`programs combined on a single source code tree. A more interesting`
		896	`approach would be to try to abstract out what exactly the functionality`
		897	`which both POSIX and BSD offer is and use that as the API. Then instead`
		898	`of two separate APIs we would have a single API with two broad classes`
		899	`of implementations. The advantage of this latter approach becomes`
		900	`clear if we wished to port the program to a machine which implements`
		901	`neither POSIX nor BSD, but provides the equivalent functionality in`
		902	`a third way.<P>`
		903	`As a simple example, both POSIX and BSD provide very similar methods`
		904	`for scanning the entries of a directory. The main difference is that`
		905	`the POSIX version is defined in <CODE>dirent.h</CODE> and uses a structure`
		906	`called <CODE>struct dirent</CODE>, whereas the BSD version is defined`
		907	`in <CODE>sys/dir.h</CODE> and calls the corresponding structure <CODE>struct`
		908	`direct</CODE>. The actual routines for manipulating directories are`
		909	`the same in both cases. So the only abstraction required to unify`
		910	`these two APIs is to introduce an abstract type, <CODE>dir_entry</CODE>`
		911	`say, which can be defined by:<P>`
		912	`<PRE>`
		913	`typedef struct dirent dir_entry ;`
		914	`</PRE>`
		915	`on POSIX machines, and:<P>`
		916	`<PRE>`
		917	`typedef struct direct dir_entry ;`
		918	`</PRE>`
		919	`on BSD machines. Note how this portion of the API crosses the system-user`
		920	`boundary. The object <CODE>dir_entry</CODE> is defined in terms of`
		921	`the objects in the system headers, but the precise definition depends`
		922	`on a user-defined value (whether the target machine implements POSIX`
		923	`or BSD).<P>`
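The two <CODE>typedef</CODE>s can be combined in one place and selected by a user-defined macro. The following is a minimal sketch under stated assumptions: the macro <CODE>USE_BSD_DIR</CODE> and the function <CODE>count_entries</CODE> are hypothetical names introduced for illustration, and the directory routines are assumed to behave identically in both variants, as the text states.<P>

```c
#include <stddef.h>
#include <sys/types.h>

#ifdef USE_BSD_DIR                  /* hypothetical user-defined macro */
#include <sys/dir.h>
typedef struct direct dir_entry;    /* BSD variant */
#else
#include <dirent.h>
typedef struct dirent dir_entry;    /* POSIX variant */
#endif

/*
 * Counts the entries in a directory using only the abstract type
 * dir_entry. Returns -1 if the directory cannot be opened.
 */
long count_entries(const char *path)
{
    DIR *dir = opendir(path);
    dir_entry *entry;
    long n = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL)
        n++;
    closedir(dir);
    return n;
}
```

The body of <CODE>count_entries</CODE> mentions neither <CODE>struct dirent</CODE> nor <CODE>struct direct</CODE>, so only the few lines at the top are target dependent. Whether <CODE>USE_BSD_DIR</CODE> is defined would be decided by the person performing the compilation, for example in <CODE>config.h</CODE>.<P>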
		924	`<A NAME=S21>`
		925	`<H3>2.3.4. Alternative Program Versions</H3>`
		926	`Another reason for introducing conditional compilation which relates`
		927	`to APIs is the desire to combine several programs, or versions of`
		928	`programs, on a single source tree. There are several cases to distinguish`
		929	`here. The reuse of code between genuinely different programs does`
		930	`not really enter the argument : any given program will only use one`
		931	`route through the source tree, so there is no real conditional compilation`
		932	`per se in the program. What is more interesting is the use of conditional`
		933	`compilation to combine several versions of the same program on the`
		934	`same source tree to provide additional or alternative features.<P>`
		935	`It could be argued that the macros (or whatever) used to select between`
		936	`the various versions of the program are just part of the user-defined`
		937	`API as before. But consider a simple program which reads in some numerical`
		938	`input, say, processes it, and prints the results. This might, for`
		939	`example, have POSIX as its API. We may wish to optionally enhance`
		940	`this by displaying the results graphically rather than textually on`
		941	`machines which have X Windows, the compilation being conditional on`
		942	`some boolean value, <CODE>HAVE_X_WINDOWS</CODE>, say. What is the`
		943	`API of the resultant program? The answer from the point of view of`
		944	`the program is the union of POSIX, X Windows and the user-defined`
		945	`value <CODE>HAVE_X_WINDOWS</CODE>. But from the implementation point`
		946	`of view we can either implement POSIX and set <CODE>HAVE_X_WINDOWS</CODE>`
		947	`to false, or implement both POSIX and X Windows and set <CODE>HAVE_X_WINDOWS`
		948	`</CODE> to true. So what introducing <CODE>HAVE_X_WINDOWS</CODE> does`
		949	`is to allow flexibility in the API implementation.<P>`
		950	`This is very similar to the alternative APIs discussed above. However`
		951	`the approach outlined will really only work for optional API extensions.`
		952	`To work in the alternative API case, we would need to have the union`
		953	`of POSIX, BSD and a boolean value, say, as the API. Although this`
		954	`is possible in theory, it is likely to lead to namespace clashes between`
		955	`POSIX and BSD.<P>`
		956	`<HR><H2>Appendix: Namespaces and APIs</H2>`
		957	`Namespace problems are amongst the most difficult faced by standard`
		958	`defining bodies (for example, the ANSI and POSIX committees) and they`
		959	`often go to great lengths to specify which names should, and should`
		960	`not, appear when certain headers are included. (The position is set`
		961	`out in D. F. Prosser, <I>Header and name space rules for UNIX systems</I>`
		962	`(private communication), USL, 1993.)<P>`
		963	`For example, the intention, certainly in ANSI, is that each header`
		964	`should operate as an independent sub-API. Thus <CODE>va_list</CODE>`
		965	`is prohibited from appearing in the namespace when <CODE>stdio.h</CODE>`
		966	`is included (it is defined only in <CODE>stdarg.h</CODE>) despite`
		967	`the fact that it appears in the prototype:<P>`
		968	`<PRE>`
		969	`int vprintf ( const char *, va_list ) ;`
		970	`</PRE>`
		971	`This seeming contradiction is worked round on most implementations`
		972	`by defining a type <CODE>__va_list</CODE> in <CODE>stdio.h</CODE>`
		973	`which has exactly the same definition as <CODE>va_list</CODE>, and`
		974	`declaring <CODE>vprintf</CODE> as:<P>`
		975	`<PRE>`
		976	`int vprintf ( const char *, __va_list ) ;`
		977	`</PRE>`
		978	`This is only legal because <CODE>__va_list</CODE> is deemed not to`
		979	`corrupt the namespace because of the convention that names beginning`
		980	`with <CODE>__</CODE> are reserved for implementation use.<P>`
		981	`This particular namespace convention is well-known, but there are`
		982	`others defined in these standards which are not generally known (and`
		983	`since no compiler I know tests them, not widely adhered to). For example,`
		984	`the ANSI header <CODE>errno.h</CODE> reserves all names given by the`
		985	`regular expression:<P>`
		986	`<PRE>`
		987	`E[0-9A-Z][0-9a-z_A-Z]+`
		988	`</PRE>`
		989	`against macros (i.e. in all namespaces). By prohibiting the user from`
		990	`using names of this form, the intention is to protect against namespace`
		991	`clashes with extensions of the ANSI API which introduce new error`
		992	`numbers. It also protects against a particular implementation of these`
		993	`extensions - namely that new error numbers will be defined as macros.<P>`
		994	`A better example of protecting against particular implementations`
		995	`comes from POSIX. If <CODE>sys/stat.h</CODE> is included names of`
		996	`the form:<P>`
		997	`<PRE>`
		998	`st_[0-9a-z_A-Z]+`
		999	`</PRE>`
		1000	`are reserved against macros (as member names). The intention here`
		1001	`is not only to reserve field selector names for future extensions`
		1002	`to <CODE>struct stat</CODE> (which would only affect API implementors,`
		1003	`not ordinary users), but also to reserve against the possibility that`
		1004	`these field selectors might be implemented by macros. So our <CODE>st_atime`
		1005	`</CODE> example in section <A HREF="#S11">2.2.3</A> is strictly illegal because the`
		1006	`procedure name <CODE>st_atime</CODE> lies in a restricted namespace.`
		1007	`Indeed the namespace is restricted precisely to disallow this program.<P>`
		1008	`As an exercise to the reader, how many of your programs use names`
		1009	`from the following restricted namespaces (all drawn from ANSI, all`
		1010	`applying to all namespaces)?<P>`
		1011	`<PRE>`
		1012	`is[a-z][0-9a-z_A-Z]+ (ctype.h)`
		1013	`to[a-z][0-9a-z_A-Z]+ (ctype.h)`
		1014	`str[a-z][0-9a-z_A-Z]+ (stdlib.h)`
		1015	`</PRE>`
		1016	`With the TDF approach of describing APIs in abstract terms using the`
		1017	`<CODE>#pragma token</CODE> syntax most of these namespace restrictions`
		1018	`are seen to be superfluous. When a target independent header is included`
		1019	`precisely the objects defined in that header in that version of the`
		1020	`API appear in the namespace. There are no worries about what else`
		1021	`might happen to be in the header, because there is nothing else. Also`
		1022	`implementation details are separated off to the TDF library building,`
		1023	`so possible namespace pollution through particular implementations`
		1024	`does not arise.<P>`
		1025	`Currently TDF does not have a neat way of solving the <CODE>va_list</CODE>`
		1026	`problem. The present target independent headers use a similar workaround`
		1027	`to that described above (exploiting a reserved namespace). (See the`
		1028	`footnote in section 3.4.1.1.)<P>`
		1029	`None of this is intended as criticism of the ANSI or POSIX standards.`
		1030	`It merely shows some of the problems that can arise from the insufficient`
		1031	`separation of code.<P>`
		1032	`<HR>`
		1033	`<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown`
		1034	`Copyright © 1998.</I></P>`
		1035	`</BODY>`
		1036	`</HTML>`
