<?xml version="1.0" standalone="no"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

<!--
$Id$
-->

<book>
<bookinfo>
<title>TDF and Portability</title>

<corpauthor>The TenDRA Project</corpauthor>

<author>
<firstname>Jeroen</firstname>
<surname>Ruigrok van der Werven</surname>
</author>
<authorinitials>JRvdW</authorinitials>
<pubdate>2005</pubdate>

<copyright>
<year>2004</year>
<year>2005</year>

<holder>The TenDRA Project</holder>
</copyright>

<copyright>
<year>1998</year>

<holder>DERA</holder>
</copyright>
</bookinfo>

<chapter id="introduction">
<title>Introduction</title>

<para>TDF is the name of the technology developed at DRA which has been
adopted by the Open Software Foundation (OSF), Unix System Laboratories
(USL), the European Community's Esprit Programme and others as their
Architecture Neutral Distribution Format (ANDF). To date much of the
discussion surrounding it has centred on the question, "How do you
distribute portable software?". This paper concentrates on the more
difficult question, "How do you write portable software in the first
place?" and shows how TDF can be a valuable tool to aid the writing of
portable software. Most of the discussion centres on programs written in
C and is Unix specific. This is because most of the experience of TDF to
date has been in connection with C in a Unix environment, and not
because of any inbuilt bias in TDF.</para>

<para>It is assumed that the reader is familiar with the ANDF concept
(although not necessarily with the details of TDF), and with the
problems involved in writing portable C code.</para>

<para>The discussion is divided into two sections. Firstly, some of the
problems involved in writing portable programs are considered. The
intention is not only to catalogue what these problems are, but to
introduce ways of looking at them which will be important in the second
section, which deals with the TDF approach to portability.</para>
</chapter>

<chapter>
<sect1 id="portability">
<title>Portability</title>

<para>We start by examining some of the problems
involved in the writing of portable programs. Although the
discussion is very general, and makes no mention of TDF, many of
the ideas introduced are of importance in the second half of the
paper, which deals with TDF.</para>

<sect2 id="S3">
<title>2.1. Portable Programs</title>

<sect3 id="S4">
<title>2.1.1. Definitions and Preliminary Discussion</title>

<para>Let us first say what we mean by a portable program. A
program is portable to a number of machines if it can be compiled
to give the same functionality on all those machines. Note that
this does not mean that exactly the same source code is used on
all the machines. One could envisage a program written in, say,
68020 assembly code for a certain machine which has been
translated into 80386 assembly code for some other machine to give
a program with exactly equivalent functionality. This would, under
our definition, be a program which is portable to these two
machines. At the other end of the scale, the C program:

<programlisting>
#include &lt;stdio.h&gt;

int
main()
{
    fputs("Hello world\n", stdout);
    return(0);
}
</programlisting>

which prints the message, "Hello world", onto the standard output
stream, will be portable to a vast range of machines without any
need for rewriting. Most of the portable programs we shall be
considering fall closer to the latter end of the spectrum - they
will largely consist of target independent source with small
sections of target dependent source for those constructs for which
target independent expression is either impossible or of
inadequate efficiency.</para>

<para>Note that we are defining portability in terms of a set of
target machines and not as some universal property. The act of
modifying an existing program to make it portable to a new target
machine is called porting. Clearly in the examples above, porting
the first program would be a highly complex task involving almost
an entire rewrite, whereas in the second case it should be
trivial.</para>
</sect3>

<sect3 id="S5">
<title>2.1.2. Separation and Combination of Code</title>

<para>So why is the second example above more portable (in the sense
of more easily ported to a new machine) than the first? The
first, obvious, point to be made is that it is written in a
high-level language, C, rather than the low-level languages, 68020
and 80386 assembly codes, used in the first example. By using a
high-level language we have abstracted out the details of the
processor to be used and expressed the program in an architecture
neutral form. It is one of the jobs of the compiler on the target
machine to transform this high-level representation into the
appropriate machine dependent low-level representation.</para>

<para>The second point is that the second example program is not in
itself complete. The objects <code>fputs</code> and
<code>stdout</code>, representing the procedure to output a string
and the standard output stream respectively, are left undefined.
Instead the header <code>stdio.h</code> is included on the
understanding that it contains the specification of these
objects.</para>

<para>A version of this file is to be found on each target machine.
On a particular machine it might contain something like:

<programlisting>
typedef struct {
    int __cnt ;
    unsigned char *__ptr ;
    unsigned char *__base ;
    short __flag ;
    char __file ;
} FILE ;

extern FILE __iob[60];
#define stdout (&amp;__iob[1])

extern int fputs(const char *, FILE *);
</programlisting>

meaning that the type <code>FILE</code> is defined by the given
structure, <code>__iob</code> is an external array of 60
<code>FILE</code>s, <code>stdout</code> is a pointer to the
second element of this array, and that <code>fputs</code> is an
external procedure which takes a <code>const char *</code> and a
<code>FILE *</code> and returns an <code>int</code>. On a
different machine, the details may be different (exactly what we
can, or cannot, assume is the same on all target machines is
discussed below).</para>

<para>These details are fed into the program by the pre-processing
phase of the compiler. (The various compilation phases are
discussed in more detail later - see Fig. 1.) This is a simple,
preliminary textual substitution. It provides the definitions of
the type <code>FILE</code> and the value <code>stdout</code> (in
terms of <code>__iob</code>), but leaves the precise
definitions of <code>__iob</code> and <code>fputs</code>
unresolved (although we do know their types). The definitions of
these values are not provided until the final phase of the
compilation - linking - where they are linked in from the
precompiled system libraries.</para>

<para>Note that, after the pre-processing phase alone, our portable
program has already been transformed into a target dependent form,
because of the substitution of the target dependent values from
<code>stdio.h</code>. If we had also included the definitions of
<code>__iob</code> and, more particularly, <code>fputs</code>,
things would have been even worse - the procedure for outputting a
string to the screen is likely to be highly target
dependent.</para>

<para>To conclude, we have, by including <code>stdio.h</code>, been
able to effectively separate the target independent part of our
program (the main program) from the target dependent part (the
details of <code>stdout</code> and <code>fputs</code>). It is one
of the jobs of the compiler to recombine these parts to produce a
complete program.</para>
</sect3>

<sect3 id="S6">
<title>2.1.3. Application Programming Interfaces</title>

<para>As we have seen, the separation of the target dependent
sections of a program into the system headers and system libraries
greatly facilitates the construction of portable programs. What
has been done is to define an interface between the main program
and the existing operating system on the target machine in
abstract terms. The program should then be portable to any machine
which implements this interface correctly.</para>

<para>The interface for the "Hello world" program above might be
described as follows: defined in the header <code>stdio.h</code>
are a type <code>FILE</code> representing a file, an object
<code>stdout</code> of type <code>FILE *</code> representing the
standard output file, and a procedure <code>fputs</code> with
prototype:

<programlisting>
int fputs(const char *s, FILE *f);
</programlisting>

which prints the string <code>s</code> to the file <code>f</code>.
This is an example of an Application Programming Interface (API).
Note that it can be split into two aspects, the syntactic (what
the objects are) and the semantic (what they mean). On any machine
which implements this API our program is both syntactically correct
and does what we expect it to.</para>

<para>The benefit of describing the API at this fairly high level is
that it leaves scope for a range of implementations (and thus more
machines which implement it) while still encapsulating the main
program's requirements.</para>

<para>In the example implementation of <code>stdio.h</code> above we
see that this machine implements this API correctly syntactically,
but not necessarily semantically. One would have to read the
documentation provided on the system to be sure of the
semantics.</para>

<para>Another way of defining an API for this program would be to
note that the given API is a subset of the ANSI C standard. Thus
we could take ANSI C as an "off the shelf" API. It is then clear
that our program should be portable to any ANSI-compliant
machine.</para>

<para>It is worth emphasising that all programs have an API, even if
it is implicit rather than explicit. However it is probably fair
to say that programs without an explicit API are only portable by
accident. We shall have more to say on this subject later.</para>
</sect3>

<sect3 id="S7">
<title>2.1.4. Compilation Phases</title>

<para>The general plan for how to write the extreme example of a
portable program, namely one which contains no target dependent
code, is now clear. It is shown in the compilation diagram in Fig.
1 which represents the traditional compilation process. This
diagram is divided into four sections. The left half of the
diagram represents the actual program and the right half the
associated API. The top half of the diagram represents target
independent material - things which only need to be done once -
and the bottom half target dependent material - things which need
to be done on every target machine.</para>

<para>FIGURE 1. Traditional Compilation Phases</para>

<mediaobject>
<imageobject>
<imagedata fileref="../images/trad_scheme.gif"/>
</imageobject>
</mediaobject>

<para>So, we write our target independent program (top left),
conforming to the target independent API specification (top
right). All the compilation actually takes place on the target
machine. This machine must have the API correctly implemented
(bottom right). This implementation will in general be in two
parts - the system headers, providing type definitions, macros,
procedure prototypes and so on, and the system libraries,
providing the actual procedure definitions. Another way of
characterising this division is between syntax (the system
headers) and semantics (the system libraries).</para>

<para>The compilation is divided into three main phases. Firstly the
system headers are inserted into the program by the pre-processor.
This produces, in effect, a target dependent version of the
original program. This is then compiled into a binary object file.
During the compilation process the compiler inserts all the
information it has about the machine - including the Application
Binary Interface (ABI) - the sizes of the basic C types, how they
are combined into compound types, the system procedure calling
conventions and so on. This ensures that in the final linking
phase the binary object file and the system libraries are obeying
the same ABI, thereby producing a valid executable. (On a
dynamically linked system this final linking phase takes place
partially at run time rather than at compile time, but this does
not really affect the general scheme.)</para>

<para>The compilation scheme just described consists of a series of
phases of two types: code combination (the pre-processing and
system linking phases) and code transformation (the actual
compilation phases). The existence of the combination phases
allows for the effective separation of the target independent code
(in this case, the whole program) from the target dependent code
(in this case, the API implementation), thereby aiding the
construction of portable programs. These ideas on the separation,
combination and transformation of code underlie the TDF approach
to portability.</para>
</sect3>
</sect2>

<sect2 id="S8">
<title>2.2. Portability Problems</title>

<para>We have set out a scheme whereby it should be possible to write
portable programs with a minimum of difficulties. So why, in
reality, does it cause so many problems? Recall that we are still
primarily concerned with programs which contain no target dependent
code, although most of the points raised apply by extension to all
programs.</para>

<sect3 id="S9">
<title>2.2.1. Programming Problems</title>

<para>A first, obvious class of problems concerns the program itself.
It is to be assumed that as many bugs as possible have been
eliminated by testing and debugging on at least one platform
before a program is considered as a candidate for being a portable
program. But for even the most self-contained program, working on
one platform is no guarantee of working on another. The program
may use undefined behaviour - using uninitialised values or
dereferencing null pointers, for example - or have built-in
assumptions about the target machine - whether it is big-endian or
little-endian, or what the sizes of the basic integer types are,
for example. This latter point is going to become increasingly
important over the next couple of years as 64-bit architectures
begin to be introduced. How many existing programs implicitly
assume a 32-bit architecture?</para>
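
<para>As an illustration of the built-in assumptions mentioned above,
the following minimal sketch asks the target machine for its byte order
and integer sizes rather than assuming them. The function names
<code>is_little_endian</code> and <code>long_bits</code> are
illustrative and do not come from any API discussed in this paper.</para>

<programlisting><![CDATA[
#include <limits.h>
#include <stdio.h>

/* Returns 1 if the lowest-addressed byte of an unsigned int holds its
   least significant byte (little-endian layout), 0 otherwise. */
static int is_little_endian(void)
{
    unsigned int probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Storage width of a long in bits, computed rather than assumed. */
static int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

int main(void)
{
    printf("byte order : %s-endian\n", is_little_endian() ? "little" : "big");
    printf("int        : %d bits\n", (int)(sizeof(int) * CHAR_BIT));
    printf("long       : %d bits\n", long_bits());
    return 0;
}
]]></programlisting>

<para>A program which needs, say, a 32-bit quantity can test these
values once at start-up instead of silently relying on the properties of
the machine it was first written on.</para>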

<para>Many of these built-in assumptions may arise because of the
conventional porting process. A program is written on one machine,
modified slightly to make it work on a second machine, and so on.
This means that the program is "biased" towards the existing set
of target machines, and most particularly to the original machine
it was written on. This applies not only to assumptions about
endianness, say, but also to the questions of API conformance
which we will be discussing below.</para>

<para>Most compilers will pick up some of the grosser programming
errors, particularly by type checking (including procedure
arguments if prototypes are used). Some of the subtler errors can
be detected using the <option>-Wall</option> option to the Free Software
Foundation's GNU C Compiler (<code>gcc</code>) or separate program
checking tools such as <code>lint</code>, for example, but this
remains a very difficult area.</para>
</sect3>

<sect3 id="S10">
<title>2.2.2. Code Transformation Problems</title>

<para>We now move on from programming problems to compilation
problems. As we mentioned above, compilation may be regarded as a
series of phases of two types: combination and transformation.
Transformation of code - translating a program in one form into an
equivalent program in another form - may lead to a variety of
problems. The code may be transformed wrongly, so that the
equivalence is broken (a compiler bug), or in an unexpected manner
(differing compiler interpretations), or not at all, because it is
not recognised as legitimate code (a compiler limitation). The
latter two problems are most likely when the input is a high level
language, with complex syntax and semantics.</para>

<para>Note that in Fig. 1 all the actual compilation takes place on
the target machine. So, to port the program to
<varname>n</varname> machines, we need to deal with the bugs and
limitations of <varname>n</varname>, potentially different,
compilers. For example, if you have written your program using
prototypes, it is going to be a large and rather tedious job
porting it to a compiler which does not have prototypes (this
particular example can be automated; not all such jobs can). Other
compiler limitations can be surprising - not understanding the
<code>L</code> suffix for long numeric
literals and not allowing members of enumeration types as array
indexes are among the problems drawn from my personal
experience.</para>

<para>The differing compiler interpretations may be more subtle. For
example, there are differences between ANSI and "traditional" C
which may trap the unwary. Examples are the promotion of integral
types and the resolution of the linkage of static objects.</para>
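
<para>The promotion difference can be seen in a very small program. The
sketch below assumes an ANSI compiler on which <code>int</code> can
represent every <code>unsigned char</code> value. The function name
<code>difference_is_negative</code> is purely illustrative.</para>

<programlisting><![CDATA[
#include <stdio.h>

/* Under the ANSI value-preserving rule the unsigned char operand is
   promoted to int, so 1 - 2 is -1 and the comparison is true. Under the
   unsigned-preserving rule used by many "traditional" compilers the
   operand would become unsigned int and the result could never be
   negative. */
static int difference_is_negative(void)
{
    unsigned char small = 1;
    return (small - 2) < 0;
}

int main(void)
{
    printf("small - 2 < 0 : %s\n", difference_is_negative() ? "yes" : "no");
    return 0;
}
]]></programlisting>

<para>Code whose behaviour depends on such a comparison gives different
answers under the two sets of rules without any diagnostic from either
compiler.</para>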

<para>Many of these problems may be reduced by using the "same"
compiler on all the target machines. For example, <code>gcc</code>
has a single front end (C -> RTL) which may be combined with an
appropriate back end (RTL -> target) to form a suitable
compiler for a wide range of target machines. The existence of a
single front end virtually eliminates the problems of differing
interpretation of code and compiler quirks. It also reduces the
exposure to bugs. Instead of being exposed to the bugs in
<varname>n</varname> separate compilers, we are now only exposed
to bugs in one half-compiler (the front end) plus
<varname>n</varname> half-compilers (the back ends) - a total of
<varname>(n + 1) / 2</varname>. (This calculation is not meant
totally seriously, but it is true in principle.) Front end bugs,
when tracked down, also only require a single workaround.</para>
</sect3>

<sect3>
<title id="S11">2.2.3. Code Combination Problems</title>

<para>If code transformation problems may be regarded as a time
consuming irritation, involving the rewriting of sections of code
or using a different compiler, the second class of problems, those
concerned with the combination of code, are far more
serious.</para>

<para>The first code combination phase is the pre-processor pulling
in the system headers. These can contain some nasty surprises.
For example, consider a simple ANSI compliant program which
contains a linked list of strings arranged in alphabetical order.
This might also contain a routine:</para>

<programlisting>
void index(char *);
</programlisting>

<para>which adds a string to this list in the appropriate position,
using <code>strcmp</code> from <code>string.h</code> to find it.
This works fine on most machines, but on some it gives the
error:</para>

<programlisting>
Only 1 argument to macro 'index'
</programlisting>

<para>The reason for this is that the system version of
<code>string.h</code> contains the line:</para>

<programlisting>
#define index(s, c) strchr(s, c)
</programlisting>

<para>But this has nothing to do with ANSI; the macro is defined for
compatibility with BSD.</para>

<para>In reality the system headers on any given machine are a
hodgepodge of implementations of different APIs, and it is often
virtually impossible to separate them (feature test macros such as
<code>_POSIX_SOURCE</code> are of some use, but are not always
implemented and do not always produce a complete separation; they
are only provided for "standard" APIs anyway). The problem above
arose because there is no transitivity rule of the form: if
program <varname>P</varname> conforms to API <varname>A</varname>,
and API <varname>B</varname> extends <varname>A</varname>, then
<varname>P</varname> conforms to <varname>B</varname>. The only
reason this is not true is these namespace problems.</para>
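
<para>One defensive idiom against the kind of clash seen with
<code>index</code> is to write the routine's name in parentheses, which
prevents a function-like macro of the same name from being expanded. The
sketch below is self-contained and uses the hypothetical names
<code>lookup</code>, <code>find_char</code> and <code>offset_of</code>
in place of <code>index</code> and <code>strchr</code>, because some
systems also declare <code>index</code> as a function, which would
cause a different clash.</para>

<programlisting><![CDATA[
#include <stdio.h>

/* Hypothetical helper standing in for strchr. */
static const char *find_char(const char *s, int c)
{
    for (; *s != '\0'; s++)
        if (*s == c)
            return s;
    return NULL;
}

/* Hypothetical stand-in for the header's two-argument macro. */
#define lookup(s, c) find_char(s, c)

static int entries = 0;

/* The program's own one-argument routine of the same name. Writing the
   name in parentheses stops the function-like macro from expanding,
   because the name is then not immediately followed by '('. */
static void (lookup)(const char *s)
{
    if (*s != '\0')
        entries++;
}

/* Offset of c in s, found through the macro, or -1 if c is absent. */
static int offset_of(const char *s, int c)
{
    const char *p = lookup(s, c);
    return (p != NULL) ? (int)(p - s) : -1;
}

int main(void)
{
    (lookup)("apple");
    (lookup)("pear");
    printf("%d entries, 'l' at offset %d\n", entries, offset_of("apple", 'l'));
    return 0;
}
]]></programlisting>

<para>This only helps with function-like macros, and every use of the
routine must be written in the parenthesised form, so it is a workaround
rather than a cure for the underlying namespace problem.</para>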

<para>A second example demonstrates a slightly different point. The
POSIX standard states that <code>sys/stat.h</code> contains the
definition of the structure <code>struct stat</code>, which
includes several members, amongst them:</para>

<programlisting>
time_t st_atime;
</programlisting>

<para>representing the access time for the corresponding file. So
the program:</para>

<programlisting>
#include &lt;sys/types.h&gt;
#include &lt;sys/stat.h&gt;

time_t
st_atime(struct stat *p)
{
    return(p->st_atime);
}
</programlisting>

<para>should be perfectly valid - the procedure name
<code>st_atime</code> and the field selector <code>st_atime</code>
occupy different namespaces (see however the appendix on
namespaces and APIs below). However at least one popular operating
system has the implementation:</para>

<programlisting>
struct stat {
    ....
    union {
        time_t st__sec;
        timestruc_t st__tim;
    } st_atim;
    ....
};
#define st_atime st_atim.st__sec
</programlisting>

<para>This seems like a perfectly legitimate implementation. In the
program above the field selector <code>st_atime</code> is replaced
by <code>st_atim.st__sec</code> by the pre-processor, as intended,
but unfortunately so is the procedure name <code>st_atime</code>,
leading to a syntax error.</para>

<para>The problem here is not with the program or the
implementation, but in the way they were combined. C does not
allow individual field selectors to be defined. Instead the
indiscriminate sledgehammer of macro substitution was used,
leading to the problem described.</para>
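
<para>The mechanism can be reproduced in miniature without any system
headers. In the sketch below <code>struct my_stat</code>,
<code>seconds_t</code> and <code>access_time</code> are hypothetical
names chosen for illustration. Only the macro follows the
implementation quoted above.</para>

<programlisting><![CDATA[
#include <stdio.h>

/* Hypothetical miniature of the quoted implementation; my_stat and
   seconds_t are placeholders, not system types. */
typedef long seconds_t;

struct my_stat {
    union {
        seconds_t st__sec;
    } st_atim;
};

#undef st_atime
#define st_atime st_atim.st__sec

/* A name the header does not claim. Had this function been called
   st_atime, the macro would have rewritten its declarator as well. */
static seconds_t access_time(const struct my_stat *p)
{
    return p->st_atime;    /* expands to p->st_atim.st__sec */
}

int main(void)
{
    struct my_stat s;

    s.st_atime = 42;       /* expands to s.st_atim.st__sec = 42 */
    printf("%ld\n", (long)access_time(&s));
    return 0;
}
]]></programlisting>

<para>Because the macro is object-like, it is replaced wherever the
identifier appears, so the program can only avoid the clash by choosing
a different name.</para>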

<para>Problems can also occur in the other combination phase of the
traditional compilation scheme, the system linking. Consider the
ANSI compliant routine:</para>

<programlisting>
#include &lt;stdio.h&gt;

int open ( char *nm )
{
    int c, n = 0 ;
    FILE *f = fopen ( nm, "r" ) ;
    if ( f == NULL ) return ( -1 ) ;
    while ( c = getc ( f ), c != EOF ) n++ ;
    ( void ) fclose ( f ) ;
    return ( n ) ;
}
</programlisting>

<para>which opens the file <code>nm</code>, returning its size in
bytes if it exists and -1 otherwise. As a quick porting exercise,
I compiled it under six different operating systems. On three it
worked correctly; on one it returned -1 even when the file
existed; and on two it crashed with a segmentation error.</para>

<para>The reason for this lies in the system linking. On those
machines which failed, the library routine <code>fopen</code>
calls (either directly or indirectly) the library routine
<code>open</code> (which is in POSIX, but not ANSI). The system
linker, however, linked my routine <code>open</code> instead of
the system version, so the call to <code>fopen</code> did not
work correctly.</para>
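
<para>One way to keep a routine like this out of the linker's external
namespace is to give it internal linkage and a name that is unlikely to
be used by a library. The following sketch does so. The name
<code>count_bytes</code> and the default file name
<code>example.txt</code> are hypothetical.</para>

<programlisting><![CDATA[
#include <stdio.h>

/* Same logic as the routine above, but declared static so that the
   linker can never substitute it for a library routine. Returns the
   size of the file in bytes, or -1 if it cannot be opened. */
static long count_bytes(const char *nm)
{
    int c;
    long n = 0;
    FILE *f = fopen(nm, "rb");

    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF)
        n++;
    (void) fclose(f);
    return n;
}

int main(int argc, char **argv)
{
    const char *nm = (argc > 1) ? argv[1] : "example.txt";

    printf("%s: %ld\n", nm, count_bytes(nm));
    return 0;
}
]]></programlisting>

<para>This avoids the particular failure described, but it relies on
the programmer knowing which external names to stay away from, which is
exactly the information an ANSI-only program is not supposed to
need.</para>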

<para>So code combination problems are primarily namespace problems.
The task of combining the program with the API implementation on
a given platform is complicated by the fact that, because the
system headers and system libraries contain things other than the
API implementation, or even because of the particular
implementation chosen, the various namespaces in which the
program is expected to operate become "polluted".</para>
</sect3>

<sect3>
<title id="S12">2.2.4. API Problems</title>
<para>We have said that the API defines the interface between the
program and the standard library provided with the operating system
on the target machine. There are three main problems concerned with
APIs. The first, how to choose the API in the first place, is
discussed separately. Here we deal with the compilation aspects:
how to check that the program conforms to its API, and what to do
about incorrect API implementations on the target machine(s).</para>

<sect4>
<title id="S13">2.2.4.1. API Checking</title>
<para>The problem of whether or not a program conforms to its API - not
using any objects from the operating system other than those
specified in the API, and not making any unwarranted assumptions
about these objects - is one which does not always receive
sufficient attention, mostly because the necessary checking tools
do not exist (or at least are not widely available). Compiling
the program on a number of API compliant machines merely checks
the program against the system headers for these machines. For a
genuine portability check we need to check against the abstract
API description, thereby in effect checking against all possible
implementations.</para>

<para>Recall from above that the system headers on a given machine
are an amalgam of all the APIs it implements. This can cause
programs which should compile not to, because of namespace
clashes; but it may also cause programs to compile which should
not, because they have used objects which are not in their API,
but which are in the system headers. For example, the supposedly
ANSI compliant program:
<programlisting>
#include <signal.h>
int sig = SIGKILL ;
</programlisting>
will compile on most systems, despite the fact that
<code>SIGKILL</code> is not an ANSI signal, because
<code>SIGKILL</code> is in POSIX, which is also implemented in the
system <code>signal.h</code>. Again, feature test macros are of
some use in trying to isolate the implementation of a single API
from the rest of the system headers. However they are highly
unlikely to detect the error in the following supposedly POSIX
compliant program which prints the entries of the directory
<code>nm</code>, together with their inode numbers:
<programlisting>
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

void listdir ( char *nm )
{
    struct dirent *entry ;
    DIR *dir = opendir ( nm ) ;
    if ( dir == NULL ) return ;
    while ( entry = readdir ( dir ), entry != NULL ) {
        printf ( "%s : %d\n", entry->d_name, ( int ) entry->d_ino ) ;
    }
    ( void ) closedir ( dir ) ;
    return ;
}
</programlisting>
This is not POSIX compliant because, whereas the
<code>d_name</code> field of <code>struct dirent</code> is in
POSIX, the <code>d_ino</code> field is not. It is however in XPG3,
so it is likely to be in many system implementations.</para>
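<para>A version of the routine which stays within the fields that the
text identifies as POSIX might look like the following sketch. The name
<code>list_names</code>, the <code>FILE *</code> parameter and the
returned count are illustrative assumptions, not part of the original
program; only <code>d_name</code> is read from each entry.</para>

```c
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

/* Print the name of each entry of the directory nm to out, using only
   the d_name field. Returns the number of entries printed, or -1 if
   the directory could not be opened. (Hypothetical helper.) */
int list_names ( const char *nm, FILE *out )
{
    struct dirent *entry ;
    int count = 0 ;
    DIR *dir = opendir ( nm ) ;
    if ( dir == NULL ) return ( -1 ) ;
    while ( ( entry = readdir ( dir ) ) != NULL ) {
        fprintf ( out, "%s\n", entry->d_name ) ;
        count++ ;
    }
    ( void ) closedir ( dir ) ;
    return ( count ) ;
}
```

<para>A call such as <code>list_names ( ".", stdout )</code> lists the
current directory. The inode numbers are simply dropped, since the API
in question gives no portable way of obtaining them from the directory
entry.</para>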

<para>The previous examples have been concerned with simply telling
whether or not a particular object is in an API. A more
difficult, and in a way more important, problem is that of
assuming too much about the objects which are in the API. For
example, in the program:
<programlisting>
#include <stdio.h>
#include <stdlib.h>

div_t d = { 3, 4 } ;

int main ()
{
    printf ( "%d,%d\n", d.quot, d.rem ) ;
    return ( 0 ) ;
}
</programlisting>
the ANSI standard specifies that the type <code>div_t</code>
is a structure containing two fields, <code>quot</code> and
<code>rem</code>, of type <code>int</code>, but it does not specify
which order these fields appear in, or indeed if there are other
fields. Therefore the initialisation of <code>d</code> is not
portable. Again, the type <code>time_t</code> is used to
represent times in seconds since a certain fixed date. On most
systems this is implemented as <code>long</code>, so it is
tempting to use <code>( t & 1 )</code> to determine for a
<code>time_t</code> <code>t</code> whether this number of seconds
is odd or even. But ANSI actually says that <code>time_t</code>
is an arithmetic, not an integer, type, so it would be possible
for it to be implemented as <code>double</code>. In that case
<code>( t & 1 )</code> is not even type correct, so it is not
a portable way of finding out whether <code>t</code> is odd or
even.</para>
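<para>Both pitfalls can be avoided by relying only on what the API
actually promises. The following is a minimal sketch; the function names
<code>make_div</code> and <code>seconds_odd</code> are hypothetical.
Note that <code>seconds_odd</code> inspects the seconds field of the
broken-down time, which agrees with the parity of the seconds count only
on the assumption that the epoch falls on a whole minute and that leap
seconds are not counted.</para>

```c
#include <stdlib.h>
#include <time.h>

/* Set the members of a div_t by name, so that neither their order nor
   any extra implementation-specific members matter. */
div_t make_div ( int quot, int rem )
{
    div_t d = { 0 } ;   /* zero every member first */
    d.quot = quot ;
    d.rem = rem ;
    return ( d ) ;
}

/* Report whether the seconds field of the broken-down UTC time is odd.
   Returns 1 for odd, 0 for even, -1 if the time cannot be converted.
   No bitwise operation is applied to the time_t value itself. */
int seconds_odd ( time_t t )
{
    struct tm *parts = gmtime ( &t ) ;
    if ( parts == NULL ) return ( -1 ) ;
    return ( parts->tm_sec % 2 ) ;
}
```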
</sect4>

<sect4>
<title id="S14">2.2.4.2. API Implementation Errors</title>
<para>Undoubtedly the problem which causes the writer of
portable programs the greatest headache (and heartache) is that
of incorrect API implementations. However carefully you have
chosen your API and checked that your program conforms to it, you
are still reliant on someone (usually the system vendor) having
implemented this API correctly on the target machine. Machines
which do not implement the API at all do not enter the equation
(they are not suitable target machines); what causes problems is
incorrect implementations. As the implementation may be divided
into two parts - system headers and system libraries - we shall
similarly divide our discussion. Inevitably the choice of
examples is personal; anyone who has ever attempted to port a
program to a new machine is likely to have their own favourite
examples.</para>
</sect4>

<sect4>
<title id="S15">2.2.4.3. System Header Problems</title>
<para>Some header problems are immediately apparent
because they are syntactic and cause the program to fail to
compile. For example, values may not be defined or be defined in
the wrong place (not in the header prescribed by the API).</para>

<para>A common example (one which I have to include a workaround for
in virtually every program I write) is that
<code>EXIT_SUCCESS</code> and <code>EXIT_FAILURE</code> are not
always defined (ANSI specifies that they should be in
<code>stdlib.h</code>). It is tempting to change
<code>exit (EXIT_FAILURE)</code> to <code>exit (1)</code> because "everyone
knows" that <code>EXIT_FAILURE</code> is 1. But doing so would
decrease the portability of the program because it ties it to a
particular class of implementations. A better workaround would
be:
<programlisting>
#include <stdlib.h>
#ifndef EXIT_FAILURE
#define EXIT_FAILURE 1
#endif
</programlisting>
which assumes that anyone choosing a non-standard value for
<code>EXIT_FAILURE</code> is more likely to put it in
<code>stdlib.h</code>. Of course, if one subsequently came across a
machine on which not only is <code>EXIT_FAILURE</code> not defined,
but also the value it should have is not 1, then it would be
necessary to resort to <code>#ifdef machine_name</code> statements.
The same is true of all the API implementation problems we shall be
discussing: non-conformant machines require workarounds involving
conditional compilation. As more machines are considered, so these
conditional compilations multiply.</para>
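<para>One way of keeping such workarounds manageable is to collect them
in a single place rather than scattering them through the program. The
following sketch shows what such a fragment might contain; the helper
<code>exit_status</code> is hypothetical, and the fallback values 0 and
1 are the conventional ones, assumed here rather than guaranteed.</para>

```c
#include <stdlib.h>

/* Fallback definitions, used only if stdlib.h failed to provide them.
   The values 0 and 1 are the conventional ones and are an assumption. */
#ifndef EXIT_SUCCESS
#define EXIT_SUCCESS 0
#endif
#ifndef EXIT_FAILURE
#define EXIT_FAILURE 1
#endif

/* Map a success flag onto the status value to pass to exit. */
int exit_status ( int ok )
{
    return ( ok ? EXIT_SUCCESS : EXIT_FAILURE ) ;
}
```

<para>The rest of the program can then use the two macros freely, and
any machine-specific correction need only be made in this one
fragment.</para>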

<para>As an example of things being defined in the wrong place, ANSI
specifies that <code>SEEK_SET</code>, <code>SEEK_CUR</code> and
<code>SEEK_END</code> should be defined in <code>stdio.h</code>,
whereas POSIX specifies that they should also be defined in
<code>unistd.h</code>. It is not uncommon to find machines on
which they are defined in the latter but not in the former. A
possible workaround in this case would be:
<programlisting>
#include <stdio.h>
#ifndef SEEK_SET
#include <unistd.h>
#endif
</programlisting>
Of course, by including "unnecessary" headers like
<code>unistd.h</code> the risk of namespace clashes such as those
discussed above is increased.</para>

<para>A final syntactic problem, which perhaps should belong with
the system header problems above, concerns dependencies between
the headers themselves. For example, the POSIX header
<code>unistd.h</code> declares functions involving some of the
types <code>pid_t</code>, <code>uid_t</code> etc., defined in
<code>sys/types.h</code>. Is it necessary to include
<code>sys/types.h</code> before including <code>unistd.h</code>,
or does <code>unistd.h</code> automatically include
<code>sys/types.h</code>? The approach of playing safe and
including everything will normally work, but this can lead to
multiple inclusions of a header. This will normally cause no
problems because the system headers are protected against
multiple inclusions by means of macros, but it is not unknown for
certain headers to be left unprotected. Also, not all header
dependencies are as clear cut as the one given, so that what
headers need to be included, and in what order, is in fact target
dependent.</para>
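<para>The protection against multiple inclusion mentioned above is
usually a guard macro wrapped round the body of the header. The sketch
below simulates including a hypothetical header <code>point.h</code>
twice by writing its text out twice; the names <code>POINT_H</code>,
<code>struct point</code> and <code>point_sum</code> are invented for
the illustration.</para>

```c
/* First inclusion of the hypothetical header "point.h". */
#ifndef POINT_H
#define POINT_H
struct point { int x ; int y ; } ;
#endif

/* Second inclusion of the same text: POINT_H is already defined, so
   the structure is not defined a second time. */
#ifndef POINT_H
#define POINT_H
struct point { int x ; int y ; } ;
#endif

/* Uses the single definition which survived. */
int point_sum ( struct point p )
{
    return ( p.x + p.y ) ;
}
```

<para>Without the guard, the second copy would define <code>struct
point</code> again and the compilation would fail, which is what happens
with an unprotected system header.</para>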

<para>There can also be semantic errors in the system headers:
namely wrongly defined values. The following two examples are
taken from real operating systems. Firstly the definition:
<programlisting>
#define DBL_MAX 1.797693134862316E+308
</programlisting>
in <code>float.h</code> on an IEEE-compliant machine is
subtly wrong - the given value does not fit into a
<code>double</code> - the correct value is:
<programlisting>
#define DBL_MAX 1.7976931348623157E+308
</programlisting>
Again, the type definition:
<programlisting>
typedef int size_t ; /* ??? */
</programlisting>
(sic) is not compliant with ANSI, which says that
<code>size_t</code> is an unsigned integer type. (I'm not sure if
this is better or worse than another system which defines
<code>ptrdiff_t</code> to be <code>unsigned int</code> when it is
meant to be signed. This would mean that the difference between any
two pointers is always positive.) These particular examples are
irritating because it would have cost nothing to get things right,
correcting the value of <code>DBL_MAX</code> and changing the
definition of <code>size_t</code> to <code>unsigned int</code>.
These corrections are so minor that the modified system headers
would still be a valid interface for the existing system libraries
(we shall have more to say about this later). However it is not
possible to change the system headers, so it is necessary to build
workarounds into the program. Whereas in the first case it is
possible to devise such a workaround:
<programlisting>
#include <float.h>
#ifdef machine_name
#undef DBL_MAX
#define DBL_MAX 1.7976931348623157E+308
#endif
</programlisting>
for example, in the second, because <code>size_t</code> is
defined by a <code>typedef</code> it is virtually impossible to
correct in a simple fashion. Thus any program which relies on the
fact that <code>size_t</code> is unsigned will require considerable
rewriting before it can be ported to this machine.</para>
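<para>Semantic header errors of this kind can at least be detected
early by a few run-time checks on a new target. The following is a
sketch only; the three function names are hypothetical, and how a
compiler treats an out-of-range <code>DBL_MAX</code> (an error, or an
infinity) is an assumption which varies between systems.</para>

```c
#include <float.h>
#include <stddef.h>

/* 1 if size_t behaves as an unsigned type: converting -1 wraps round
   to a large positive value. */
int size_t_is_unsigned ( void )
{
    return ( ( size_t ) -1 > ( size_t ) 0 ) ;
}

/* 1 if ptrdiff_t behaves as a signed type. */
int ptrdiff_t_is_signed ( void )
{
    return ( ( ptrdiff_t ) -1 < ( ptrdiff_t ) 0 ) ;
}

/* 1 if DBL_MAX is a finite value: halving an infinity leaves it
   unchanged, whereas halving a finite positive value makes it smaller. */
int dbl_max_is_finite ( void )
{
    volatile double d = DBL_MAX ;
    return ( d / 2.0 < d ) ;
}
```

<para>On a conforming implementation all three functions return 1. A
zero from any of them points at the system headers rather than at the
program.</para>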
</sect4>

<sect4>
<title id="S16">2.2.4.4. System Library Problems</title>
<para>The system header problems just discussed are
primarily syntactic problems. By contrast, system library
problems are primarily semantic - the provided library routines
do not behave in the way specified by the API. This makes them
harder to detect. For example, consider the routine:
<programlisting>
void *realloc ( void *p, size_t s ) ;
</programlisting>
which reallocates the block of memory <code>p</code> to have
size <code>s</code> bytes, returning the new block of memory. The
ANSI standard says that if <code>p</code> is the null pointer, then
the effect of <code>realloc ( p, s )</code> is the same as
<code>malloc ( s )</code>, that is, to allocate a new block of
memory of size <code>s</code>. This behaviour is exploited in the
following program, in which the routine <code>add_char</code> adds
a character to the expanding array, <code>buffer</code>:
<programlisting>
#include <stdio.h>
#include <stdlib.h>

char *buffer = NULL ;
int buff_sz = 0, buff_posn = 0 ;

void add_char ( char c )
{
    if ( buff_posn >= buff_sz ) {
        buff_sz += 100 ;
        buffer = ( char * ) realloc ( ( void * ) buffer, buff_sz * sizeof ( char ) ) ;
        if ( buffer == NULL ) {
            fprintf ( stderr, "Memory allocation error\n" ) ;
            exit ( EXIT_FAILURE ) ;
        }
    }
    buffer [ buff_posn++ ] = c ;
    return ;
}
</programlisting>
On the first call of <code>add_char</code>,
<code>buffer</code> is set to a real block of memory (as opposed to
<code>NULL</code>) by a call of the form
<code>realloc ( NULL, s )</code>. This is extremely convenient and
efficient - if it were not for this behaviour we would have to have an
explicit initialisation of <code>buffer</code>, either as a special case in
<code>add_char</code> or in a separate initialisation routine.</para>

<para>Of course this all depends on the behaviour of
<code>realloc ( NULL, s )</code> having been implemented precisely as
described in the ANSI standard. The first indication that this is not so on
a particular target machine might be when the program is compiled
and run on that machine for the first time and does not perform
as expected. To track the problem down will demand time debugging
the program.</para>

<para>Once the problem has been identified as being with
<code>realloc</code>, a number of workarounds are
possible. Perhaps the most interesting is to replace the
inclusion of <code>stdlib.h</code> by the following:
<programlisting>
#include <stdlib.h>
#ifdef machine_name
#define realloc( p, s )\
    ( ( p ) ? ( realloc ) ( p, s ) : malloc ( s ) )
#endif
</programlisting>
where <code>realloc ( p, s )</code> is redefined as a macro
which is the result of the procedure <code>realloc</code> if
<code>p</code> is not null, and <code>malloc ( s )</code> otherwise.
(In fact this macro will not always have the desired effect,
although it does in this case. Why (exercise)?)</para>
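<para>Another possible workaround, offered here as a sketch rather than
as part of the original discussion, is to route every call through a
small wrapper function. The name <code>portable_realloc</code> is
hypothetical. Being a function, it evaluates each of its arguments
exactly once.</para>

```c
#include <stdlib.h>

/* Behave as ANSI says realloc should: a null pointer means "allocate a
   new block", anything else is passed on to the system realloc. */
void *portable_realloc ( void *p, size_t s )
{
    if ( p == NULL ) return ( malloc ( s ) ) ;
    return ( realloc ( p, s ) ) ;
}
```

<para>The price is that every call of <code>realloc</code> in the
program has to be renamed, whereas the macro leaves the program text
unchanged.</para>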

<para>The only alternative to this trial and error approach to
finding API implementation problems is the application of
personal experience, either of the particular target machine or
of things that are implemented wrongly by many machines and as
such should be avoided. This sort of detailed knowledge is not
easily acquired. Nor can it ever be complete: new operating
system releases are becoming increasingly regular and are on
occasions quite as likely to introduce new implementation errors
as to solve existing ones. It is in short a "black art".</para>
</sect4>
</sect3>
</sect2>

<sect2>
<title id="S17">2.3. APIs and Portability</title>
<para>We now return to our discussion
of the general issues involved in portability to more closely
examine the role of the API.</para>

<sect3>
<title id="S18">2.3.1. Target Dependent Code</title>
<para>So far we have been considering programs which
contain no conditional compilation, in which the API forms the
basis of the separation of the target independent code (the whole
program) and the target dependent code (the API implementation).
But a glance at most large C programs will reveal that they do
contain conditional compilation. The code is scattered with
<code>#if</code>'s and <code>#ifdef</code>'s which, in effect,
cause the pre-processor to construct slightly different programs
on different target machines. So here we do not have a clean
division between the target independent and the target dependent
code - there are small sections of target dependent code spread
throughout the program.</para>

<para>Let us briefly consider some of the reasons why it is
necessary to introduce this conditional compilation. Some have
already been mentioned - workarounds for compiler bugs, compiler
limitations, and API implementation errors; others will be
considered later. However the most interesting and important
cases concern things which need to be done genuinely differently
on different machines. This can be because they really cannot be
expressed in a target independent manner, or because the target
independent way of doing them is unacceptably inefficient.</para>

<para>Efficiency (either in terms of time or space) is a key issue
in many programs. The argument is often advanced that writing a
program portably means using the, often inefficient, lowest
common denominator approach. But under our definition of
portability it is the functionality that matters, not the actual
source code. There is nothing to stop different code being used
on different machines for reasons of efficiency.</para>

<para>To examine the relationship between target dependent code and
APIs, consider the simple program:
<programlisting>
#include <stdio.h>

int main ()
{
#ifdef mips
    fputs ( "This machine is a mips\n", stdout ) ;
#endif
    return ( 0 ) ;
}
</programlisting>
which prints a message if the target machine is a mips. What
is the API of this program? Basically it is the same as in the
"Hello world" example discussed in sections 2.1.1 and 2.1.2, but if
we wish the API to fully describe the interface between the program
and the target machine, we must also say that whether or not the
macro <code>mips</code> is defined is part of the API. Like the
rest of the API, this has a semantic aspect as well as a syntactic
one - in this case that <code>mips</code> is only defined on mips
machines. Where it differs is in its implementation. Whereas the
main part of the API is implemented in the system headers and the
system libraries, the implementation of either defining, or not
defining, <code>mips</code> ultimately rests with the person
performing the compilation. (In this particular example, the macro
<code>mips</code> is normally built into the compiler on mips
machines, but this is only a convention.)</para>
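<para>The user-defined nature of <code>mips</code> can be made more
visible by giving the program something to say in both cases. The
routine <code>target_description</code> below is a hypothetical
illustration. The person compiling decides which branch is built,
typically by passing an option such as <code>-Dmips</code> to the
compiler (the <code>-D</code> option is common to most Unix compilers,
but its use here is an assumption about the compilation system).</para>

```c
/* Describe the target according to the user-defined part of the API:
   whether the person compiling (or the compiler itself) has defined
   the macro mips. */
const char *target_description ( void )
{
#ifdef mips
    return ( "mips" ) ;
#else
    return ( "not known to be mips" ) ;
#endif
}
```

<para>Nothing in the program can check that <code>mips</code> has been
defined truthfully; that semantic half of the contract rests entirely
with whoever performs the compilation.</para>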

<para>So the API in this case has two components: a system-defined
part which is implemented in the system headers and system
libraries, and a user-defined part which ultimately relies on the
person performing the compilation to provide an implementation.
The main point to be made in this section is that introducing
target dependent code is equivalent to introducing a user-defined
component to the API. The actual compilation process in the case
of programs containing target dependent code is basically the
same as that shown in Fig. 1. But whereas previously the vertical
division of the diagram also reflected a division of
responsibility - the left hand side being the responsibility of the
programmer (the person writing the program), and the right hand
side that of the API specifier (for example, a standards defining
body) and the API implementor (the system vendor) - now the right
hand side is partially the responsibility of the programmer and
the person performing the compilation. The programmer specifies
the user-defined component of the API, and the person compiling
the program either implements this API (as in the mips example
above) or chooses between a number of alternative implementations
provided by the programmer (as in the example below).</para>

<para>Let us consider a more complex example. Consider the following
program which assumes, for simplicity, that an
<code>unsigned int</code> contains 32 bits:
<programlisting>
#include <stdio.h>
#include "config.h"

#ifndef SLOW_SHIFT
#define MSB( a ) ( ( unsigned char ) ( ( a ) >> 24 ) )
#else
#ifdef BIG_ENDIAN
#define MSB( a ) *( ( unsigned char * ) &( a ) )
#else
#define MSB( a ) *( ( unsigned char * ) &( a ) + 3 )
#endif
#endif

unsigned int x = 100000000 ;

int main ()
{
    printf ( "%u\n", ( unsigned ) MSB ( x ) ) ;
    return ( 0 ) ;
}
</programlisting>
The intention is to print the most significant byte of
<code>x</code>. Three alternative definitions of the macro
<code>MSB</code> used to extract this value are provided. The
first, if <code>SLOW_SHIFT</code> is not defined, is simply to
shift the value right by 24 bits. This will work on all 32-bit
machines, but may be inefficient (depending on the nature of the
machine's shift instruction). So two alternatives are provided.
An <code>unsigned int</code> is assumed to consist of four
<code>unsigned char</code>'s. On a big-endian machine, the most
significant byte is the first of these
<code>unsigned char</code>'s; on a little-endian machine it is the
fourth. The second definition of <code>MSB</code> is intended to
reflect the former case, and the third the latter.</para>

<para>The person compiling the program has to choose between the
three possible implementations of <code>MSB</code> provided by
the programmer. This is done by either defining, or not defining,
the macros <code>SLOW_SHIFT</code> and <code>BIG_ENDIAN</code>.
This could be done with command line options, but we have chosen to
reflect another commonly used device, the configuration file. For
each target machine, the programmer provides a version of the
file <code>config.h</code> which defines the appropriate
combination of the macros <code>SLOW_SHIFT</code> and
<code>BIG_ENDIAN</code>. The person performing the compilation
simply chooses the appropriate <code>config.h</code> for the
target machine.</para>
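<para>Since nothing checks that the chosen <code>config.h</code>
really matches the machine, a small self-test run on the target can be
useful. The sketch below is an assumption about how such a check might
be written; the function names are hypothetical, and it considers only
purely big-endian and purely little-endian layouts.</para>

```c
#include <limits.h>

/* Most significant byte of a, obtained by shifting. */
unsigned char msb_by_shift ( unsigned int a )
{
    return ( ( unsigned char ) ( a >> ( ( sizeof ( a ) - 1 ) * CHAR_BIT ) ) ) ;
}

/* 1 if the byte at the lowest address of an unsigned int is its most
   significant byte (the BIG_ENDIAN case), 0 otherwise. */
int first_byte_is_msb ( void )
{
    unsigned int probe = 1 ;
    return ( *( ( unsigned char * ) &probe ) == 0 ) ;
}

/* Most significant byte of a, obtained by address arithmetic, picking
   the byte according to the layout detected at run time. */
unsigned char msb_by_address ( unsigned int a )
{
    unsigned char *bytes = ( unsigned char * ) &a ;
    return ( first_byte_is_msb () ? bytes [ 0 ] : bytes [ sizeof ( a ) - 1 ] ) ;
}
```

<para>If <code>first_byte_is_msb</code> returns 1 the target needs the
<code>config.h</code> which defines <code>BIG_ENDIAN</code>; if the two
<code>msb_by_</code> routines ever disagree, the assumptions behind the
address-based definitions of <code>MSB</code> do not hold on that
machine.</para>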

<para>There are two possible ways of looking at what the
user-defined API of this program is. Possibly it is most natural
to say that it is <code>MSB</code>, but it could also be argued
that it is the macros <code>SLOW_SHIFT</code> and
<code>BIG_ENDIAN</code>. The former more accurately describes the
target dependent code, but is only implemented indirectly, via
the latter.</para>
</sect3>

<sect3>
<title id="S19">2.3.2. Making APIs Explicit</title>
<para>As we have said, every program has an API even if it is implicit
rather than explicit. Every system header included, every type or
value used from it, and every library routine used, adds to the
system-defined component of the API, and every conditional
compilation adds to the user-defined component. What making the
API explicit does is to encapsulate the set of requirements that
the program has of the target machine (including requirements
like "I need to know whether or not the target machine is
big-endian", as well as "I need <code>fputs</code> to be
implemented as in the ANSI standard"). By making these
requirements explicit it is made absolutely clear what is needed
on a target machine if a program is to be ported to it. If the
requirements are not explicit this can only be found by trial and
error. This is what we meant earlier by saying that a program
without an explicit API is only portable by accident.</para>

<para>Another advantage of specifying the requirements of a program
is that it may increase their chances of being implemented. We
have spoken as if porting is a one-way process: program writers
porting their programs to new machines. But there is also traffic
the other way. Machine vendors may wish certain programs to be
ported to their machines. If these programs come with a list of
requirements then the vendor knows precisely what to implement in
order to make such a port possible.</para>
</sect3>

<sect3>
<title id="S20">2.3.3. Choosing an API</title>
<para>So how does one go about choosing an API? In a sense the
user-defined component is easier to specify than the system-defined
component because it is less tied to particular implementation models. What
is required is to abstract out what exactly needs to be done in a
target dependent manner and to decide how best to separate it
out. The most difficult problem is how to make the implementation
of this API as simple as possible for the person performing the
compilation, if necessary providing a number of alternative
implementations to choose between and a simple method of making
this choice (for example, the <code>config.h</code> file above).
With the system-defined component the question is more likely to
be, how do the various target machines I have in mind implement
what I want to do? The abstraction of this is usually to choose a
standard and widely implemented API, such as POSIX, which
provides all the necessary functionality.</para>

<para>The choice of "standard" API is of course influenced by the
type of target machines one has in mind. Within the Unix world,
the increasing adoption of Open Standards, such as POSIX, means
that choosing a standard API which is implemented on a wide
variety of Unix boxes is becoming easier. Similarly, choosing an API
which will work on most MSDOS machines should cause few problems.
The difficulty is that these are disjoint worlds; it is very
difficult to find a standard API which is implemented on both
Unix and MSDOS machines. At present not much can be done about
this; it reflects the disjoint nature of the computer market.</para>

<para>To develop a similar point: the drawback of choosing POSIX
(for example) as an API is that it restricts the range of
possible target machines to machines which implement POSIX. Other
machines, for example, BSD compliant machines, might offer the
same functionality (albeit using different methods), so they
should be potential target machines, but they have been excluded
by the choice of API. One approach to the problem is the
"alternative API" approach. Both the POSIX and the BSD variants
are built into the program, but only one is selected on any given
target machine by means of conditional compilation. Under our
"equivalent functionality" definition of portability, this is a
program which is portable to both POSIX and BSD compliant
machines. But viewed in the light of the discussion above, if we
regard a program as a program-API pair, it could be regarded as
two separate programs combined on a single source code tree. A
more interesting approach would be to try to abstract out
exactly what functionality both POSIX and BSD offer and
use that as the API. Then instead of two separate APIs we would
have a single API with two broad classes of implementations. The
advantage of this latter approach becomes clear if we wish to port
the program to a machine which implements neither POSIX nor BSD,
but provides the equivalent functionality in a third way.</para>

<para>As a simple example, both POSIX and BSD provide very similar
methods for scanning the entries of a directory. The main
difference is that the POSIX version is defined in
<code>dirent.h</code> and uses a structure called
<code>struct dirent</code>, whereas the BSD version is defined in
<code>sys/dir.h</code> and calls the corresponding structure
<code>struct direct</code>. The actual routines for manipulating
directories are the same in both cases. So the only abstraction
required to unify these two APIs is to introduce an abstract
type, <code>dir_entry</code> say, which can be defined by:
<programlisting>
typedef struct dirent dir_entry ;
</programlisting>
on POSIX machines, and:
<programlisting>
typedef struct direct dir_entry ;
</programlisting>
on BSD machines. Note how this portion of the API crosses the
system-user boundary. The object <code>dir_entry</code> is defined
in terms of the objects in the system headers, but the precise
definition depends on a user-defined value (whether the target
machine implements POSIX or BSD).</para>
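<para>Put together, the two definitions might be selected as in the
following sketch. The macro <code>USE_BSD_DIR</code> and the routine
<code>count_entries</code> are hypothetical names chosen for the
illustration; the program proper refers only to
<code>dir_entry</code>.</para>

```c
#include <stddef.h>
#include <sys/types.h>

#ifdef USE_BSD_DIR                 /* hypothetical user-defined macro */
#include <sys/dir.h>
typedef struct direct dir_entry ;
#else
#include <dirent.h>
typedef struct dirent dir_entry ;
#endif

/* Count the entries of the directory nm through the abstract type
   dir_entry. Returns -1 if the directory cannot be opened. */
int count_entries ( const char *nm )
{
    dir_entry *entry ;
    int count = 0 ;
    DIR *dir = opendir ( nm ) ;
    if ( dir == NULL ) return ( -1 ) ;
    while ( ( entry = readdir ( dir ) ) != NULL ) count++ ;
    ( void ) closedir ( dir ) ;
    return ( count ) ;
}
```

<para>Porting to a third kind of machine then means supplying one more
definition of <code>dir_entry</code> (and whatever header it needs),
leaving <code>count_entries</code> untouched.</para>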
</sect3>

<sect3>
<title id="S21">2.3.4. Alternative Program Versions</title>
<para>Another reason for introducing conditional
compilation which relates to APIs is the desire to combine
several programs, or versions of programs, on a single source
tree. There are several cases to distinguish. The
reuse of code between genuinely different programs does not
really enter the argument: any given program will only use one
route through the source tree, so there is no real conditional
compilation per se in the program. What is more interesting is
the use of conditional compilation to combine several versions of
the same program on the same source tree to provide additional or
alternative features.</para>

<para>It could be argued that the macros (or whatever) used to
select between the various versions of the program are just part
of the user-defined API as before. But consider a simple program
which reads in some numerical input, say, processes it, and
prints the results. This might, for example, have POSIX as its
API. We may wish to enhance this optionally by displaying the
results graphically rather than textually on machines which have
X Windows, the compilation being conditional on some boolean
value, <code>HAVE_X_WINDOWS</code>, say. What is the API of the
resultant program? The answer from the point of view of the
program is the union of POSIX, X Windows and the user-defined
value <code>HAVE_X_WINDOWS</code>. But from the implementation
point of view we can either implement POSIX and set
<code>HAVE_X_WINDOWS</code> to false, or implement both POSIX and
X Windows and set <code>HAVE_X_WINDOWS</code> to true. So what
introducing <code>HAVE_X_WINDOWS</code> does is to allow
flexibility in the API implementation.</para>
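<para>The optional extension described above might be sketched as
follows. This is only an illustration: the function names
<code>mean</code> and <code>show_result</code> are hypothetical, the
macro is assumed to default to false, and the graphical branch is a
placeholder rather than real X Windows code.</para>

```c
#include <stdio.h>

/* HAVE_X_WINDOWS is the user-defined boolean from the text. It is assumed
   to default to false, so that only the base API needs to be implemented. */
#ifndef HAVE_X_WINDOWS
#define HAVE_X_WINDOWS 0
#endif

/* The processing step is common to every version of the program. */
static double mean ( const double *values, int count )
{
    double sum = 0.0 ;
    int i ;
    for ( i = 0 ; i < count ; i++ ) sum += values [i] ;
    return ( count > 0 ? sum / count : 0.0 ) ;
}

#if HAVE_X_WINDOWS
/* Graphical version: a real program would include the X Windows headers
   here and draw the result. This placeholder only reports the mode. */
static const char *show_result ( double result )
{
    ( void ) result ;
    return ( "graphical" ) ;
}
#else
/* Textual version: needs nothing beyond the base API. */
static const char *show_result ( double result )
{
    printf ( "%g\n", result ) ;
    return ( "textual" ) ;
}
#endif
```

<para>Compiling with <code>HAVE_X_WINDOWS</code> defined as 1 selects the
graphical branch. Either way the rest of the program calls
<code>show_result</code> without knowing which version it has.</para>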

<para>This is very similar to the alternative APIs discussed above.
However, the approach outlined will really only work for optional
API extensions. To work in the alternative API case, we would
need to have the union of POSIX, BSD and a boolean value, say, as
the API. Although this is possible in theory, it is likely to
lead to namespace clashes between POSIX and BSD.</para>
</sect3>
</sect2>
</sect1>

<appendix>
<title>Appendix: Namespaces and APIs</title>
<para>Namespace problems are
amongst the most difficult faced by standard-defining bodies (for
example, the ANSI and POSIX committees) and they often go to
great lengths to specify which names should, and should not,
appear when certain headers are included. (The position is set
out in D. F. Prosser, <i>Header and name space rules for UNIX
systems</i> (private communication), USL, 1993.)</para>

<para>For example, the intention, certainly in ANSI, is that each
header should operate as an independent sub-API. Thus
<code>va_list</code> is prohibited from appearing in the
namespace when <code>stdio.h</code> is included (it is defined
only in <code>stdarg.h</code>) despite the fact that it appears
in the prototype:
<programlisting>
int vprintf ( const char *, va_list ) ;
</programlisting>
This seeming contradiction is worked around on most
implementations by defining a type <code>__va_list</code> in
<code>stdio.h</code> which has exactly the same definition as
<code>va_list</code>, and declaring <code>vprintf</code> as:
<programlisting>
int vprintf ( const char *, __va_list ) ;
</programlisting>
This is legal only because, by convention, names
beginning with <code>__</code> are reserved for implementation use,
so <code>__va_list</code> is deemed not to corrupt the namespace.</para>
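<para>The same trick can be imitated in user code, as a rough sketch.
The names <code>demo_va_list</code>, <code>demo_vsum</code> and
<code>demo_sum</code> are hypothetical; a real system header would use a
reserved name such as <code>__va_list</code> and would define it without
including <code>stdarg.h</code> at all.</para>

```c
#include <stdarg.h>

/* Hypothetical stand-in for the implementation's private __va_list.
   User code may not define names beginning with __, so an ordinary
   name is used here. */
typedef va_list demo_va_list ;

/* Declared in terms of the private alias, in the way the text describes
   for vprintf. Adds up `count' int arguments. */
static int demo_vsum ( int count, demo_va_list args )
{
    int total = 0 ;
    while ( count-- > 0 ) total += va_arg ( args, int ) ;
    return ( total ) ;
}

/* Variadic front end, playing the role that printf plays for vprintf. */
static int demo_sum ( int count, ... )
{
    va_list args ;
    int total ;
    va_start ( args, count ) ;
    total = demo_vsum ( count, args ) ;
    va_end ( args ) ;
    return ( total ) ;
}
```

<para>Because the alias has exactly the same definition as
<code>va_list</code>, callers can pass an ordinary
<code>va_list</code> without any conversion.</para>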

<para>This particular namespace convention is well-known, but there
are others defined in these standards which are not generally
known (and, since no compiler I know of tests them, not widely
adhered to). For example, the ANSI header <code>errno.h</code>
reserves all names given by the regular expression:
<programlisting>
E[0-9A-Z][0-9a-z_A-Z]+
</programlisting>
against macros (i.e. in all namespaces). By prohibiting the
user from using names of this form, the intention is to protect
against namespace clashes with extensions of the ANSI API which
introduce new error numbers. It also protects against a particular
implementation of these extensions - namely that new error numbers
will be defined as macros.</para>

<para>A better example of protecting against particular
implementations comes from POSIX. If <code>sys/stat.h</code> is
included, names of the form:
<programlisting>
st_[0-9a-z_A-Z]+
</programlisting>
are reserved against macros (as member names). The intention
here is not only to reserve field selector names for future
extensions to <code>struct stat</code> (which would only affect API
implementors, not ordinary users), but also to reserve against the
possibility that these field selectors might be implemented by
macros. So our <code>st_atime</code> example in section 2.2.3 is
strictly illegal because the procedure name <code>st_atime</code>
lies in a restricted namespace. Indeed the namespace is restricted
precisely to disallow this program.</para>

<para>As an exercise for the reader, how many of your programs use
names from the following restricted namespaces (all drawn from
ANSI, all applying to all namespaces)?
<programlisting>
is[a-z][0-9a-z_A-Z]+    (ctype.h)
to[a-z][0-9a-z_A-Z]+    (ctype.h)
str[a-z][0-9a-z_A-Z]+   (stdlib.h)
</programlisting>
With the TDF approach of describing APIs in abstract terms
using the <code>#pragma token</code> syntax, most of these namespace
restrictions are seen to be superfluous. When a target independent
header is included, precisely the objects defined in that header in
that version of the API appear in the namespace. There are no
worries about what else might happen to be in the header, because
there is nothing else. Also, implementation details are separated
off into the TDF library building, so possible namespace pollution
through particular implementations does not arise.</para>
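<para>For readers who want to attempt the exercise mechanically, the
three patterns quoted above can be tested with a few lines of C. This is
a minimal sketch: the helper names are hypothetical, and it assumes the
default "C" locale, in which the <code>ctype.h</code> tests agree with
the character classes in the patterns.</para>

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if s is non-empty and consists only of [0-9a-z_A-Z]. */
static int identifier_tail ( const char *s )
{
    if ( *s == '\0' ) return ( 0 ) ;
    for ( ; *s != '\0' ; s++ ) {
        if ( !isalnum ( ( unsigned char ) *s ) && *s != '_' ) return ( 0 ) ;
    }
    return ( 1 ) ;
}

/* Matches: prefix, then one lower-case letter, then [0-9a-z_A-Z]+ */
static int has_reserved_prefix ( const char *name, const char *prefix )
{
    size_t n = strlen ( prefix ) ;
    return ( strncmp ( name, prefix, n ) == 0
             && islower ( ( unsigned char ) name [n] )
             && identifier_tail ( name + n + 1 ) ) ;
}

/* Hypothetical helper: 1 if name matches one of the three patterns. */
static int in_restricted_namespace ( const char *name )
{
    return ( has_reserved_prefix ( name, "is" )
             || has_reserved_prefix ( name, "to" )
             || has_reserved_prefix ( name, "str" ) ) ;
}
```

<para>Under these patterns, innocent-looking names such as
<code>token</code>, <code>total</code> and <code>string_table</code>
are all restricted, whereas <code>is_ok</code> is not.</para>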

<para>Currently TDF does not have a neat way of solving the
<code>va_list</code> problem. The present target independent
headers use a similar workaround to that described above
(exploiting a reserved namespace). (See the footnote in section
3.4.1.1.)</para>

<para>None of this is intended as criticism of the ANSI or POSIX
standards. It merely shows some of the problems that can arise
from the insufficient separation of code.</para>
</appendix>

<sect1>
<title>3. TDF</title>
<para>Having discussed many of the problems involved
with writing portable programs, we now turn at last to TDF.
Firstly a brief technical overview is given, indicating those
features of TDF which facilitate the separation of program.
Secondly the TDF compilation scheme is described. It is shown how
the features of TDF are exploited to aid in the separation of
target independent and target dependent code which we have
indicated as characterising portable programs. Finally, the
various constituents of this scheme are considered individually,
and their particular roles are described in more detail.</para>

<sect2 id="S23">
<title>3.1. Features of TDF</title>
<para>It is not the purpose of this paper
to explain the exact specification of TDF - this is described
elsewhere (see [6] and [4]) - but rather to show how its general
design features make it suitable as an aid to writing portable
programs.</para>

<para>TDF is an abstraction of high-level languages - it contains
such things as <code>exps</code> (abstractions of expressions and
statements), <code>shapes</code> (abstractions of types) and
<code>tags</code> (abstractions of variable identifiers). In
general form it is an abstract syntax tree which is flattened and
encoded as a series of bits, called a <code>capsule</code>. This
fairly high level of definition (for a compiler intermediate
language) means that TDF is architecture neutral in the sense
that it makes no assumptions about the underlying processor
architecture.</para>

<para>The translation of a capsule to and from the corresponding
syntax tree is totally unambiguous. TDF also has a "universal"
semantic interpretation as defined in the TDF specification.</para>

<sect3>
<title id="S24">3.1.1. Capsule Structure</title>
<para>A TDF
capsule consists of a number of units of various types. These are
embedded in a general linkage scheme (see Fig. 2). Each unit
contains a number of variable objects of various sorts (for
example, tags and tokens) which are potentially visible to other
units. Within the unit body each variable object is identified by
a unique number. The linking is via a set of variable objects
which are global to the entire capsule. These may in turn be
associated with external names. For example, in Fig. 2, the
fourth variable of the first unit is identified with the first
variable of the third unit, and both are associated with the
fourth external name.</para>

<para>FIGURE 2. TDF Capsule Structure</para>

<img src="../images/tdf_link.gif" />
<para>
This capsule structure means that the combination of a number of
capsules to form a single capsule is a very natural operation.
The actual units are copied unchanged into the resultant capsule
- it is only the surrounding linking information that needs
changing. Many criteria could be used to determine how this
linking is to be organised, but the simplest is to link two
objects if and only if they have the same external name. This is
the scheme that the current TDF linker has implemented.
Furthermore, such operations as changing an external name or
removing it altogether ("hiding") are very simple under this
linking scheme.</para>
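<para>The rule of linking two objects if and only if they have the same
external name can be sketched in a few lines. The
<code>struct capsule</code> below is a hypothetical, drastically
simplified model that keeps only a capsule's external names; it is not
the TDF linker's actual data structure.</para>

```c
#include <string.h>

/* Hypothetical model of a capsule: only its table of external names.
   Real capsules also carry the units and their linkage information. */
struct capsule {
    const char **external_names ;
    int count ;
} ;

/* Returns the index in `other' of the object that is linked with
   external name number `i' of `self', or -1 if `other' has no object
   with the same external name. */
static int linked_object ( const struct capsule *self, int i,
                           const struct capsule *other )
{
    int j ;
    for ( j = 0 ; j < other->count ; j++ ) {
        if ( strcmp ( self->external_names [i],
                      other->external_names [j] ) == 0 ) {
            return ( j ) ;
        }
    }
    return ( -1 ) ;
}
```

<para>In this model, "hiding" an object amounts to removing its entry
from <code>external_names</code>, after which no other capsule can link
to it.</para>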
</sect3>

<sect3 id="S25">
<title>3.1.2. Tokens</title>
<para>So, the
combination of program at this high level is straightforward. But
TDF also provides another mechanism which allows for the
combination of program at the syntax tree level, namely
<code>tokens</code>. Virtually any node of the TDF tree may be a
token: a place holder which stands for a subtree. Before the TDF
can be decoded fully, the definition of this token must be
provided. The token definition is then macro substituted for the
token in the decoding process to form the complete tree (see Fig.
3).</para>

<para>FIGURE 3. TDF Tokens</para>

<img src="../images/token.gif" />
<para>Tokens may also take arguments (see Fig. 4). The actual argument
values (from the main tree) are substituted for the formal
parameters in the token definition.</para>

<para>FIGURE 4. TDF Tokens (with Arguments)</para>

<img src="../images/token_args.gif" />
<para>As mentioned above, tokens are one of the types of variable
objects which are potentially visible to external units. This
means that a token does not have to be defined in the same unit
in which it is used. Nor do these units need to have come originally
from the same capsule, provided they have been linked before they
need to be fully decoded. Tokens therefore provide a mechanism
for the low-level separation and combination of code.</para>
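<para>To make the substitution idea concrete, here is a toy model of a
tree containing a token that takes one argument. Everything in it is
hypothetical: the node kinds, the <code>struct node</code> layout and
the <code>eval</code> function are invented for illustration, real TDF
has far more sorts of node, and the sketch assumes that a token
definition does not itself use the token.</para>

```c
#include <stddef.h>

/* Hypothetical node kinds for a toy expression tree. */
enum kind { CONSTANT, PLUS, TOKEN, PARAMETER } ;

struct node {
    enum kind kind ;
    int value ;                  /* CONSTANT: the value */
    const struct node *left ;    /* PLUS: operand; TOKEN: actual argument */
    const struct node *right ;   /* PLUS: operand */
} ;

/* Evaluates `tree'. Wherever a TOKEN node appears, token_def is
   substituted for it, with the value of the actual argument bound to
   the formal PARAMETER of the definition. */
static int eval ( const struct node *tree, const struct node *token_def,
                  int actual )
{
    switch ( tree->kind ) {
        case CONSTANT :
            return ( tree->value ) ;
        case PLUS :
            return ( eval ( tree->left, token_def, actual )
                     + eval ( tree->right, token_def, actual ) ) ;
        case TOKEN :
            return ( eval ( token_def, token_def,
                            eval ( tree->left, token_def, actual ) ) ) ;
        case PARAMETER :
            return ( actual ) ;
    }
    return ( 0 ) ;
}
```

<para>The same main tree gives different results when it is combined
with different token definitions, which is exactly the property the
compilation scheme relies on.</para>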
</sect3>
</sect2>

<sect2 id="S26">
<title>3.2. TDF Compilation Phases</title>
<para>We have seen how one of the
great strengths of TDF is the fact that it facilitates the
separation and combination of program. We now demonstrate how
this is applied in the TDF compilation strategy. This section is
designed only to give an outline of this scheme. The various
constituent phases are discussed in more detail later.</para>

<para>Again we start with the simplest case, where the program
contains no target dependent code. The strategy is illustrated in
Fig. 5, which should be compared with the traditional compilation
strategy shown in Fig. 1. The general layout of the diagrams is
the same. The left halves of the diagrams refer to the program
itself, and the right halves to the corresponding API. The top
halves refer to machine independent material, and the bottom
halves to what happens on each target machine. Thus, as before,
the portable program appears in the top left of the diagram, and
the corresponding API in the top right.</para>

<para>The first thing to note is that, whereas previously all the
compilation took place on the target machines, here the
compilation has been split into a target independent (C ->
TDF) part, called <code>production</code>, and a target dependent
(TDF -> target) part, called <code>installation</code>. One
of the synonyms for TDF is ANDF, Architecture Neutral
Distribution Format, and we require that the production is
precisely that - architecture neutral - so that precisely the
same TDF is installed on all the target machines.</para>

<para>This architecture neutrality necessitates a separation of
code. For example, in the "Hello world" example discussed in
sections 2.1.1 and 2.1.2, the API specifies that there shall be a
type <code>FILE</code> and an object <code>stdout</code> of type
<code>FILE *</code>, but the implementations of these may be
different on all the target machines. Thus we need to be able to
abstract out the code for <code>FILE</code> and
<code>stdout</code> from the TDF output by the producer, and
provide the appropriate (target dependent) definitions for these
objects in the installation phase.</para>

<para>FIGURE 5. TDF Compilation Phases</para>

<img src="../images/tdf_scheme.gif" />

<sect3 id="S27">
<title>3.2.1. API Description (Top Right)</title>
<para>The method used for this separation is the token
mechanism. Firstly the syntactic element of the API is described
in the form of a set of target independent headers. Whereas the
target dependent system headers contain the actual
implementation of the API on a particular machine, the target
independent headers express to the producer what is actually in
the API, and may therefore be assumed to be common to all
compliant target machines. For example, in the target independent
headers for the ANSI standard, there will be a file
<code>stdio.h</code> containing the lines:
<programlisting>
#pragma token TYPE FILE # ansi.stdio.FILE
#pragma token EXP rvalue : FILE * : stdout # ansi.stdio.stdout
#pragma token FUNC int ( const char *, FILE * ) : fputs # ansi.stdio.fputs
</programlisting>
These <code>#pragma token</code> directives are extensions to
the C syntax which enable the expression of abstract syntax
information to the producer. The directives above tell the producer
that there exists a type called <code>FILE</code>, an expression
<code>stdout</code> which is an rvalue (that is, a non-assignable
value) of type <code>FILE *</code>, and a procedure
<code>fputs</code> with prototype:
<programlisting>
int fputs ( const char *, FILE * ) ;
</programlisting>
and that it should leave their values unresolved by means of
tokens (for more details on the <code>#pragma token</code>
directive see [3]). Note how the information in the target
independent header precisely reflects the syntactic information in
the ANSI API.</para>

<para>The names <code>ansi.stdio.FILE</code> etc. give the external
names for these tokens, that is, those which will be visible at the
outermost layer of the capsule; they are intended to be unique
(this is discussed below). It is worth making the distinction
between the internal names and these external token names. The
former are the names used to represent the objects within C, and
the latter the names used within TDF to represent the tokens
corresponding to these objects.</para>
</sect3>

<sect3 id="S28">
<title>3.2.2. Production (Top Left)</title>
<para>Now the producer can compile the program using
these target independent headers. As will be seen from the "Hello
world" example, these headers contain sufficient information to
check that the program is syntactically correct. The produced,
target independent, TDF will contain tokens corresponding to the
various uses of <code>stdout</code>, <code>fputs</code> and so
on, but these tokens will be left undefined. In fact there will
be other undefined tokens in the TDF. The basic C types,
<code>int</code> and <code>char</code>, are used in the program,
and their implementations may vary between target machines. Thus
these types must also be represented by tokens. However, these
tokens are implicit in the producer rather than explicit in the
target independent headers.</para>

<para>Note also that because the information in the target
independent headers describes abstractly the contents of the API
and not some particular implementation of it, the producer is in
effect checking the program against the API itself.</para>
</sect3>

<sect3 id="S29">
<title>3.2.3. API Implementation (Bottom Right)</title>
<para>Before the TDF output by the producer can be
decoded fully, definitions must be provided for the tokens
it has left undefined. These definitions will be
potentially different on all target machines and reflect the
implementation of the API on that machine.</para>

<para>The syntactic details of the implementation are to be found in
the system headers. The process of defining the tokens describing
the API (called TDF library building) consists of comparing the
implementation of the API as given in the system headers with the
abstract description of the tokens comprising the API given in
the target independent headers. The token definitions thus
produced are stored as TDF libraries, which are just archives of
TDF capsules.</para>

<para>For instance, in the example implementation of
<code>stdio.h</code> given in section 2.1.2, the token
<code>ansi.stdio.FILE</code> will be defined as the TDF compound
shape corresponding to the structure defining the type
<code>FILE</code> (recall the distinction between internal and
external names). <code>__iob</code> will be an undefined tag
whose shape is an array of 60 copies of the shape given by the
token <code>ansi.stdio.FILE</code>, and the token
<code>ansi.stdio.stdout</code> will be defined to be the TDF
expression corresponding to a pointer to the second element of
this array. Finally the token <code>ansi.stdio.fputs</code> is
defined to be the effect of applying the procedure given by the
undefined tag <code>fputs</code>. (In fact, this picture has been
slightly simplified for the sake of clarity. See the discussion of C
-> TDF mappings in section 3.3.2.)</para>

<para>These token definitions are created using exactly the same C
-> TDF translation program as is used in the producer phase.
This program knows nothing about the distinction between target
independent and target dependent TDF; it merely translates the C
it is given (whether from a program or a system header) into TDF.
It is the compilation process itself which enables the separation
of target independent and target dependent TDF.</para>

<para>In addition to the tokens made explicit in the API, the
implicit tokens built into the producer must also have their
definitions inserted into the TDF libraries. The method of
definition of these tokens is slightly different. The definitions
are automatically deduced by, for example, looking in the target
machine's <code>limits.h</code> header to find the local values
of <code>CHAR_MIN</code> and <code>CHAR_MAX</code>, and deducing
the definition of the token corresponding to the C type
<code>char</code> from this. It will be the <code>variety</code>
(the TDF abstraction of integer types) consisting of all integers
between these values.</para>
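<para>The deduction for <code>char</code> can be imitated in ordinary C.
In this sketch <code>struct variety</code> and
<code>char_variety</code> are hypothetical names; a real TDF variety is
a construct of the TDF specification, not a C structure.</para>

```c
#include <limits.h>

/* Hypothetical stand-in for a TDF variety: a range of integers. */
struct variety {
    long min ;
    long max ;
} ;

/* Deduces the range of char on this machine from limits.h, in the way
   the library building process is described as doing. */
static struct variety char_variety ( void )
{
    struct variety v ;
    v.min = CHAR_MIN ;
    v.max = CHAR_MAX ;
    return ( v ) ;
}
```

<para>On a machine where <code>char</code> is signed the lower bound is
negative, and where it is unsigned the lower bound is zero, so the same
source yields a different definition on each kind of target.</para>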

<para>Note that what we are doing in the main library build is
checking the actual implementation of the API against the
abstract syntactic description. Any variations of the syntactic
aspects of the implementation from the API will therefore show
up. Thus library building is an effective way of checking the
syntactic conformance of a system to an API. Checking the
semantic conformance is far more difficult - we shall return to
this issue later.</para>
</sect3>

<sect3 id="S30">
<title>3.2.4. Installation (Bottom Left)</title>
<para>The installation phase is now straightforward. The
target independent TDF representing the program contains various
undefined tokens (corresponding to objects in the API), and the
definitions for these tokens on the particular target machine
(reflecting the API implementation) are to be found in the local
TDF libraries. It is a simple matter to link these to form a
complete, target dependent, TDF capsule. The rest of the
installation consists of a straightforward translation phase (TDF
-> target) to produce a binary object file, and linking with
the system libraries to form a final executable. Linking with the
system libraries will resolve any tags left undefined in the TDF.</para>
</sect3>

<sect3 id="S31">
<title>3.2.5. Illustrated Example</title>
<para>In
order to help clarify exactly what is happening where, Fig. 6
shows a simple example superimposed on the TDF compilation
diagram.</para>

<para>FIGURE 6. Example Compilation</para>

<img src="../images/eg_scheme.gif" />
<para>The program to be translated is simply:
<programlisting>
FILE f ;
</programlisting>
and the API is as above, so that <code>FILE</code> is an
abstract type. This API is described by target independent headers
containing the <code>#pragma token</code> statements given above.
The producer combines the program with the target independent
headers to produce a target independent capsule which declares a
tag <code>f</code> whose shape is given by the token representing
<code>FILE</code>, but leaves this token undefined. In the API
implementation, the local definition of the type <code>FILE</code>
from the system headers is translated into the definition of this
token by the library building process. Finally, in the installation,
the target independent capsule is combined with the local token
definition library to form a target dependent capsule in which all
the tokens used are also defined. This is then installed further as
described above.</para>
</sect3>
</sect2>

<sect2 id="S32">
<title>3.3. Aspects of the TDF System</title>
<para>Let us now consider in
more detail some of the components of the TDF system and how they
fit into the compilation scheme.</para>

<sect3 id="S33">
<title>3.3.1. The C to TDF Producer</title>
<para>Above it was emphasised how the design of the
compilation strategy aids the representation of program in a
target independent manner, but this is not enough in itself. The
C -> TDF producer must represent everything symbolically; it
cannot make assumptions about the target machine. For example,
the line of C containing the initialisation:
<programlisting>
int a = 1 + 1 ;
</programlisting>
is translated into TDF representing precisely that, 1 + 1,
not 2, because the producer does not know the representation of
<code>int</code> on the target machine. The installer does know
this, and so is able to replace 1 + 1 by 2 (provided this is
actually true).</para>

<para>As another example, in the structure:
<programlisting>
struct tag {
int a ;
double b ;
} ;
</programlisting>
the producer does not know the actual value in bits of the
offset of the second field from the start of the structure - it
depends on the sizes of <code>int</code> and <code>double</code>
and the alignment rules on the target machine. Instead it
represents it symbolically (it is the size of <code>int</code>
rounded up to a multiple of the alignment of <code>double</code>).
This level of abstraction makes the tokenisation required by the
target independent API headers very natural. If we knew only that
there existed a structure <code>struct tag</code> with a field
<code>b</code> of type <code>double</code>, then it would be perfectly
simple to use a token to represent the (unknown) offset of this
field from the start of the structure rather than using the
calculated (known) value. Similarly, when it comes to defining this
token in the library building phase (recall that this is done by
the same C -> TDF translation program as the production) it is a
simple matter to define the token to be the calculated value.</para>
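<para>The symbolic form of the offset can be written out in C and
compared with the value the local compiler actually uses. This is a
sketch only: <code>struct align_probe</code> and
<code>symbolic_offset_of_b</code> are hypothetical names, and C measures
sizes in bytes (units of <code>char</code>) where the text speaks of
bits.</para>

```c
#include <stddef.h>

struct tag {
    int a ;
    double b ;
} ;

/* The alignment of double inside a structure is measured with the
   usual offsetof idiom rather than assumed. */
struct align_probe {
    char c ;
    double d ;
} ;

/* The offset of b written symbolically: the size of int rounded up to
   a multiple of the alignment of double. Nothing here fixes a number
   until the code is compiled for a particular target. */
static size_t symbolic_offset_of_b ( void )
{
    size_t size = sizeof ( int ) ;
    size_t align = offsetof ( struct align_probe, d ) ;
    return ( ( size + align - 1 ) / align * align ) ;
}
```

<para>On a given machine the result should agree with
<code>offsetof ( struct tag, b )</code>, although the number itself
differs between, say, targets with 4-byte and 8-byte alignment of
<code>double</code>.</para>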

<para>Furthermore, because all the producer's operations are
performed at this very abstract level, it is a simple matter to
put in extra portability checks. For example, it would be a
relatively simple task to put most of the functionality of
<code>lint</code> (excluding intermodular checking) or
<code>gcc</code>'s <b>-Wall</b> option into the producer, and
moreover have these checks applied to an abstract machine rather
than a particular target machine. Indeed a number of these checks
have already been implemented.</para>

<para>These extra checks are switched on and off by using
<code>#pragma</code> statements. (For more details on the
<code>#pragma</code> syntax and which portability checks are
currently supported by the producer see [3].) For example, ANSI C
states that any undeclared function is assumed to return
<code>int</code>, whereas for strict portability checking it is
more useful to have undeclared functions marked as an error
(indeed for strict API checking this is essential). This is done
by inserting the line:
<programlisting>
#pragma no implicit definitions
</programlisting>
either at the start of each file to be checked or, more
simply, in a start-up file - a file which can be
<code>#include</code>'d at the start of each source file by means
of a command line option.</para>

<para>Because these checks can be turned off as well as on, it is
possible to relax as well as strengthen portability checking.
Thus if a program is only intended to work on 32-bit machines, it
is possible to switch off certain portability checks. The whole
ethos underlying the producer is that these portability
assumptions should be made explicit, so that the appropriate
level of checking can be done.</para>
|
|
|
1656 |
|
|
|
1657 |
<para>As has been previously mentioned, the use of a single
|
|
|
1658 |
front-end to any compiler not only virtually eliminates the
|
|
|
1659 |
problems of differing code interpretation and compiler quirks,
|
|
|
1660 |
but also reduces the exposure to compiler bugs. Of course, this
|
|
|
1661 |
also applies to the TDF compiler, which has a single front-end
|
|
|
1662 |
(the producer) and multiple back-ends (the installers). As
|
|
|
1663 |
regards the syntax and semantics of the C language, the producer
|
|
|
1664 |
is by default a strictly ANSI C compliant compiler. (Addition to
|
|
|
1665 |
the October 1993 revision : Alas, this is no longer true; however
|
|
|
1666 |
strict ANSI can be specified by means of a simple command line
|
|
|
1667 |
option (see [1]). The decision whether to make the default strict
|
|
|
1668 |
and allow people to relax it, or to make the default lenient and
|
|
|
1669 |
allow people to strengthen it, is essentially a political one. It
|
|
|
1670 |
does not really matter in technical terms provided the user is
|
|
|
1671 |
made aware of exactly what each compilation mode means in terms
|
|
|
1672 |
of syntax, semantics and portability checking.) However it is
|
|
|
1673 |
possible to change its behaviour (again by means of
|
|
|
1674 |
<code>#pragma</code> statements) to implement many of the
|
|
|
1675 |
features found in "traditional" or "K&R" C. Hence it is
|
|
|
1676 |
possible to precisely determine how the producer will interpret
|
|
|
1677 |
the C code it is given by explicitly describing the C dialect it
|
|
|
1678 |
is written in, using these <code>#pragma</code>
|
|
|
1679 |
statements.</para>
|
|
|
1680 |
</sect3>
|
|
|
1681 |
|
|
|
1682 |
<sect3 id="S34">
|
|
|
1683 |
<title>3.3.2. C to TDF Mappings</title>
|
|
|
1684 |
<para>The
|
|
|
1685 |
nature of the C -> TDF transformation implemented by the
|
|
|
1686 |
producer is worth considering, although not all the features
|
|
|
1687 |
described in this section are fully implemented in the current
|
|
|
1688 |
(October 1993) producer. Although it is only indirectly related
|
|
|
1689 |
to questions of portability, this mapping does illustrate some of
|
|
|
1690 |
the problems the producer has in trying to represent programs in
|
|
|
1691 |
an architecture neutral manner.</para>
|
|
|
1692 |
|
|
|
1693 |
<para>Once the initial difficulty posed by the syntactic and
|
|
|
1694 |
semantic differences between the various C dialects is overcome,
|
|
|
1695 |
the C -> TDF mapping is quite straightforward. In a hierarchy
|
|
|
1696 |
from high level to low level languages C and TDF are not that
|
|
|
1697 |
dissimilar - both come towards the bottom of what may
|
|
|
1698 |
legitimately be regarded as high level languages. Thus the
|
|
|
1699 |
constructs in C map easily onto the constructs of TDF (there are
|
|
|
1700 |
a few exceptions, for example coercing integers to pointers,
|
|
|
1701 |
which are discussed in [3]). Eccentricities of the C language
|
|
|
1702 |
specification such as doing all integer arithmetic in the
|
|
|
1703 |
promoted integer type are translated explicitly into TDF. So to
|
|
|
1704 |
add two <code>char</code>'s, they are promoted to
|
|
|
1705 |
<code>int</code>'s, added together as <code>int</code>'s, and the
|
|
|
1706 |
result is converted back to a <code>char</code>. These rules are
|
|
|
1707 |
not built directly into TDF because of the desire to support
|
|
|
1708 |
languages other than C (and even other C dialects).</para>
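<para>The promotion rule described above can be spelled out in C itself.
The following is a minimal sketch, not the TDF the producer actually
emits; the procedure names <code>add_chars</code>,
<code>promoted_sum</code> and <code>sum_is_int_sized</code> are
hypothetical and exist only for illustration.</para>

```c
/* Add two chars the way the C rules require: each operand is first
   promoted to int, the addition is done in int, and only then is the
   result converted back to char. */
char add_chars(char a, char b)
{
    int wide_a = a;                  /* integral promotion of a */
    int wide_b = b;                  /* integral promotion of b */
    int wide_sum = wide_a + wide_b;  /* arithmetic done in int */
    return (char)wide_sum;           /* conversion back to char */
}

/* The same addition without the final conversion: the result keeps the
   full int value, even when it would not fit in a char. */
int promoted_sum(char a, char b)
{
    return a + b;
}

/* Returns 1 because the type of (a + b) is int, not char. */
int sum_is_int_sized(void)
{
    char a = 1, b = 2;
    return sizeof(a + b) == sizeof(int);
}
```

<para>Writing the steps out explicitly, as the producer does, means that
nothing about the promotion needs to be built into the intermediate
form itself.</para>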
|
|
|
1709 |
|
|
|
1710 |
<para>A number of issues arise when tokens are introduced. Consider
|
|
|
1711 |
for example the type <code>size_t</code> from the ANSI standard.
|
|
|
1712 |
This is a target dependent integer type, so bearing in mind what
|
|
|
1713 |
was said above it is natural for the producer to use a tokenised
|
|
|
1714 |
variety (the TDF representation of integer types) to stand for
|
|
|
1715 |
<code>size_t</code>. This is done by a <code>#pragma token</code>
|
|
|
1716 |
statement of the form:</para>
|
|
|
1717 |
<programlisting>
|
|
|
1718 |
#pragma token VARIETY size_t # ansi.stddef.size_t
|
|
|
1719 |
</programlisting>
<para>If we want to do arithmetic on <code>size_t</code>'s, however, we
need to know the integer type corresponding to the integral
promotion of <code>size_t</code>. This is again target
dependent, so it makes sense to have another tokenised variety
representing the integral promotion of <code>size_t</code>. Thus
the simple token directive above maps to (at least) two TDF tokens,
the type itself and its integral promotion.</para>
|
|
|
1726 |
|
|
|
1727 |
<para>As another example, suppose that we have a target dependent C
|
|
|
1728 |
type, <code>type</code> say, and we define a procedure which
|
|
|
1729 |
takes an argument of type <code>type</code>. In both the
|
|
|
1730 |
procedure body and at any call of the procedure the TDF we need
|
|
|
1731 |
to produce to describe how C passes this argument will depend on
|
|
|
1732 |
<code>type</code>. This is because C does not treat all procedure
|
|
|
1733 |
argument types uniformly. Most types are passed by value, but
|
|
|
1734 |
array types are passed by address. But whether or not
|
|
|
1735 |
<code>type</code> is an array type is target dependent, so we
|
|
|
1736 |
need to use tokens to abstract out the argument passing
|
|
|
1737 |
mechanism. For example, we could implement the mechanism using
|
|
|
1738 |
four tokens: one for the type <code>type</code> (which will be a
|
|
|
1739 |
tokenised shape), one for the type an argument of type
|
|
|
1740 |
<code>type</code> is passed as, <code>arg_type</code> say, (which
|
|
|
1741 |
will be another tokenised shape), and two for converting values
|
|
|
1742 |
of type <code>type</code> to and from the corresponding values of
|
|
|
1743 |
type <code>arg_type</code> (these will be tokens which take one
|
|
|
1744 |
exp argument and give an exp). For most types,
|
|
|
1745 |
<code>arg_type</code> will be the same as <code>type</code> and
|
|
|
1746 |
the conversion tokens will be identities, but for array types,
|
|
|
1747 |
<code>arg_type</code> will be a pointer to <code>type</code> and
|
|
|
1748 |
the conversion tokens will be "address of" and "contents of".</para>
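<para>The non-uniform treatment of argument types can be observed
directly in C. The sketch below is illustrative only; the structure
<code>struct wrapped</code> and the procedure names are hypothetical.
It contrasts an array argument, which is passed by address, with a
structure containing the same array, which is passed by value.</para>

```c
struct wrapped { int values[4]; };

/* An array parameter is adjusted to a pointer: the callee works on the
   caller's storage, so the write below is visible to the caller. */
void set_first_in_array(int values[4], int v)
{
    values[0] = v;
}

/* A structure is passed by value: the callee receives a copy, so the
   write below is lost when the procedure returns. */
void set_first_in_struct(struct wrapped w, int v)
{
    w.values[0] = v;
}

/* Returns 1 if the array write was seen by the caller and the
   structure write was not. */
int demonstrate_argument_passing(void)
{
    int array[4] = {0, 0, 0, 0};
    struct wrapped w = {{0, 0, 0, 0}};
    set_first_in_array(array, 7);
    set_first_in_struct(w, 7);
    return array[0] == 7 && w.values[0] == 0;
}
```

<para>In the terms used above, the array case corresponds to conversion
tokens "address of" and "contents of", and the structure case to
identity conversions.</para>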
|
|
|
1749 |
|
|
|
1750 |
<para>So there is not the simple one to one correspondence between
|
|
|
1751 |
<code>#pragma token</code> directives and TDF tokens one might
|
|
|
1752 |
expect. Each such directive maps onto a family of TDF tokens, and
|
|
|
1753 |
this mapping in a sense encapsulates the C language
|
|
|
1754 |
specification. Of course in the TDF library building process the
|
|
|
1755 |
definitions of all these tokens are deduced automatically from
|
|
|
1756 |
the local values.</para>
|
|
|
1757 |
</sect3>
|
|
|
1758 |
|
|
|
1759 |
<sect3 id="S35">
|
|
|
1760 |
<title>3.3.3. TDF Linking</title>
|
|
|
1761 |
<para>We now move
|
|
|
1762 |
from considering the components of the producer to those of the
|
|
|
1763 |
installer. The first phase of the installation - linking in the
|
|
|
1764 |
TDF libraries containing the token definitions describing the
|
|
|
1765 |
local implementation of the API - is performed by a general
|
|
|
1766 |
utility program, the TDF linker (or builder). This is a very
|
|
|
1767 |
simple program which is used to combine a number of TDF capsules
|
|
|
1768 |
and libraries into a single capsule. As has been emphasised
|
|
|
1769 |
previously, the capsule structure means that this is a very
|
|
|
1770 |
natural operation, but, as will be seen from the previous
|
|
|
1771 |
discussion (particularly section 2.2.3), such combinatorial
|
|
|
1772 |
phases are very prone to namespace problems.</para>
|
|
|
1773 |
|
|
|
1774 |
<para>In TDF tags, tokens and other externally named objects occupy
|
|
|
1775 |
separate namespaces, and there are no constructs which can cut
|
|
|
1776 |
across these namespaces in the way that C macros do. There
|
|
|
1777 |
still remains the problem that the only way to know that two
|
|
|
1778 |
tokens, say, in different capsules are actually the same is if
|
|
|
1779 |
they have the same name. This, as we have already seen in the
|
|
|
1780 |
case of system linking, can cause objects to be identified
|
|
|
1781 |
wrongly.</para>
|
|
|
1782 |
|
|
|
1783 |
<para>In the main TDF linking phase - linking in the token
|
|
|
1784 |
definitions at the start of the installation - we are primarily
|
|
|
1785 |
linking on token names, these tokens being those arising from the
|
|
|
1786 |
use of the target independent headers. Potential namespace
|
|
|
1787 |
problems are virtually eliminated by the use of unique external
|
|
|
1788 |
names for the tokens in these headers (such as
|
|
|
1789 |
<code>ansi.stdio.FILE</code> in the example above). This means
|
|
|
1790 |
that there is a genuine one to one correspondence between tokens
|
|
|
1791 |
and token names. Of course this relies on the external token
|
|
|
1792 |
names given in the headers being genuinely unique. In fact, as is
|
|
|
1793 |
explained below, these names are normally automatically
|
|
|
1794 |
generated, and uniqueness of names within a given API is checked.
|
|
|
1795 |
Also incorporating the API name into the token name helps to
|
|
|
1796 |
ensure uniqueness across APIs. However the token namespace does
|
|
|
1797 |
require careful management. (Note that the user does not normally
|
|
|
1798 |
have access to the token namespace; all variable and procedure
|
|
|
1799 |
names map into the tag namespace.)</para>
|
|
|
1800 |
|
|
|
1801 |
<para>We can illustrate the "clean" nature of TDF linking by
|
|
|
1802 |
considering the <code>st_atime</code> example given in section
|
|
|
1803 |
2.2.3. Recall that in the traditional compilation scheme the
|
|
|
1804 |
problem arose, not because of the program or the API
|
|
|
1805 |
implementation, but because of the way they were combined by the
|
|
|
1806 |
pre-processor. In the TDF scheme the target independent version
|
|
|
1807 |
of <code>sys/stat.h</code> will be included. Thus the procedure
|
|
|
1808 |
name <code>st_atime</code> and the field selector
|
|
|
1809 |
<code>st_atime</code> will be seen to belong to genuinely
|
|
|
1810 |
different namespaces - there are no macros to disrupt this. The
|
|
|
1811 |
former will be translated into a TDF tag with external name
|
|
|
1812 |
<code>st_atime</code>, whereas the latter is translated into a
|
|
|
1813 |
token with external name
|
|
|
1814 |
<code>posix.stat.struct_stat.st_atime</code>, say. In the TDF
|
|
|
1815 |
library reflecting the API implementation, the token
|
|
|
1816 |
<code>posix.stat.struct_stat.st_atime</code> will be defined
|
|
|
1817 |
precisely as the system header intended, as the offset
|
|
|
1818 |
corresponding to the C field selector
|
|
|
1819 |
<code>st_atim.st__sec</code>. The fact that this token is defined
|
|
|
1820 |
using a macro rather than a conventional direct field selector is
|
|
|
1821 |
not important to the library building process. Now the
|
|
|
1822 |
combination of the program with the API implementation in this
|
|
|
1823 |
case is straightforward - not only are the procedure name and the
|
|
|
1824 |
field selector name in the TDF now different, but they also lie
|
|
|
1825 |
in distinct namespaces. This shows how the separation of the API
|
|
|
1826 |
implementation from the main program is cleaner in the TDF
|
|
|
1827 |
compilation scheme than in the traditional scheme.</para>
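<para>The separation of namespaces can be seen in plain C once the
macro is out of the picture. In the sketch below,
<code>struct stat_like</code> is a hypothetical stand-in for
<code>struct stat</code>, and <code>sys/stat.h</code> is deliberately
not included, so no macro named <code>st_atime</code> is in
scope.</para>

```c
/* Hypothetical stand-in for struct stat; deliberately not the system type. */
struct stat_like {
    long st_atime;               /* field selector named st_atime */
};

/* A procedure that is also named st_atime. Member names and ordinary
   identifiers live in separate C namespaces, so this is legal as long
   as no macro called st_atime is in scope. */
long st_atime(const struct stat_like *s)
{
    return s->st_atime;
}
```

<para>A system header that defines <code>st_atime</code> as a macro
would rewrite both uses of the name, which is exactly the interference
the target independent headers avoid.</para>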
|
|
|
1828 |
|
|
|
1829 |
<para>TDF linking also opens up new ways of combining code which may
|
|
|
1830 |
solve some other namespace problems. For example, in the
|
|
|
1831 |
<code>open</code> example in section 2.2.3, the name
|
|
|
1832 |
<code>open</code> is meant to be internal to the program. It is
|
|
|
1833 |
the fact that it is not treated as such which leads to the
|
|
|
1834 |
problem. If the program consisted of a single source file then we
|
|
|
1835 |
could make <code>open</code> a <code>static</code> procedure, so
|
|
|
1836 |
that its name does not appear in the external namespace. But if
|
|
|
1837 |
the program consists of several source files the external name is
|
|
|
1838 |
necessary for intra-program linking. The TDF linker allows this
|
|
|
1839 |
intra-program linking to be separated from the main system
|
|
|
1840 |
linking. In the TDF compilation scheme described above each
|
|
|
1841 |
source file is translated into a separate TDF capsule, which is
|
|
|
1842 |
installed separately to a binary object file. It is only the
|
|
|
1843 |
system linking which finally combines the various components into
|
|
|
1844 |
a single program. An alternative scheme would be to use the TDF
|
|
|
1845 |
linker to combine all the TDF capsules into a single capsule in
|
|
|
1846 |
the production phase and install that. Because all the
|
|
|
1847 |
intra-program linking has already taken place, the external names
|
|
|
1848 |
required for it can be "hidden" - that is to say, removed from
|
|
|
1849 |
the tag namespace. Only tag names which are used but not defined
|
|
|
1850 |
(and so are not internal to the program) and <code>main</code>
|
|
|
1851 |
should not be hidden. In effect this linking phase has made all
|
|
|
1852 |
the internal names in the program (except <code>main</code>)
|
|
|
1853 |
<code>static</code>.</para>
|
|
|
1854 |
|
|
|
1855 |
<para>In fact this type of complete program linking is not always
|
|
|
1856 |
feasible. For very large programs the resulting TDF capsule can
|
|
|
1857 |
be too large for the installer to cope with (it is the system
|
|
|
1858 |
assembler which tends to cause the most problems). Instead it may
|
|
|
1859 |
be better to use a more judiciously chosen partial linking and
|
|
|
1860 |
hiding scheme.</para>
|
|
|
1861 |
</sect3>
|
|
|
1862 |
|
|
|
1863 |
<sect3 id="S36">
|
|
|
1864 |
<title>3.3.4. The TDF Installers</title>
|
|
|
1865 |
<para>The
|
|
|
1866 |
TDF installer on a given machine typically consists of four
|
|
|
1867 |
phases: TDF linking, which has already been discussed,
|
|
|
1868 |
translating TDF to assembly source code, translating assembly
|
|
|
1869 |
source code to a binary object file, and linking binary object
|
|
|
1870 |
files with the system libraries to form the final executable. The
|
|
|
1871 |
latter two phases are currently implemented by the system
|
|
|
1872 |
assembler and linker, and so are identical to the traditional
|
|
|
1873 |
compilation scheme.</para>
|
|
|
1874 |
|
|
|
1875 |
<para>It is the TDF to assembly code translator which is the main
|
|
|
1876 |
part of the installer. Although not strictly related to the
|
|
|
1877 |
question of portability, the nature of the translator is worth
|
|
|
1878 |
considering. Like the producer (and the assembler), it is a
|
|
|
1879 |
transformational, as opposed to a combinatorial, compilation
|
|
|
1880 |
phase. But whereas the transformation from C to TDF is
|
|
|
1881 |
"difficult" because of the syntax and semantics of C and the need
|
|
|
1882 |
to represent everything in an architecture neutral manner, the
|
|
|
1883 |
transformation from TDF to assembly code is much easier because
|
|
|
1884 |
of the unambiguous syntax and uniform semantics of TDF, and
|
|
|
1885 |
because now we know the details of the target machine, it is no
|
|
|
1886 |
longer necessary to work at such an abstract level.</para>
|
|
|
1887 |
|
|
|
1888 |
<para>The whole construction of the current generation of TDF
|
|
|
1889 |
translators is based on the concept of compilation as
|
|
|
1890 |
transformation. They represent the TDF they read in as a syntax
|
|
|
1891 |
tree, virtually identical to the syntax tree comprising the TDF.
|
|
|
1892 |
The translation process then consists of continually applying
|
|
|
1893 |
transformations to this tree - in effect TDF -> TDF
|
|
|
1894 |
transformations - gradually optimising it and changing it to a
|
|
|
1895 |
form where the translation into assembly source code is a simple
|
|
|
1896 |
transcription process (see [7]).</para>
|
|
|
1897 |
|
|
|
1898 |
<para>Even such operations as constant evaluation - replacing 1 + 1
|
|
|
1899 |
by 2 in the example above - may be regarded as TDF -> TDF
|
|
|
1900 |
transformations. But so may more complex optimisations such as
|
|
|
1901 |
taking constants out of a loop, common sub-expression
|
|
|
1902 |
elimination, strength reduction and so on. Some of these
|
|
|
1903 |
transformations are universally applicable, others can only be
|
|
|
1904 |
applied on certain classes of machines. This transformational
|
|
|
1905 |
approach results in high quality code generation (see [5]) while
|
|
|
1906 |
minimising the risk of transformational errors. Moreover the
|
|
|
1907 |
sharing of so much code - up to 70% - between all the TDF
|
|
|
1908 |
translators, like the introduction of a common front-end, further
|
|
|
1909 |
reduces the exposure to compiler bugs.</para>
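<para>The idea of constant evaluation as a tree-to-tree transformation
can be sketched in a few lines of C. This is a toy model under stated
assumptions: the node layout and the names <code>make_node</code> and
<code>fold_constants</code> are hypothetical and bear no relation to
the data structures used in the real translators.</para>

```c
#include <stdlib.h>

enum kind { CONSTANT, VARIABLE, PLUS };

struct node {
    enum kind kind;
    int value;                   /* used when kind == CONSTANT */
    struct node *left, *right;   /* used when kind == PLUS */
};

/* Allocate a tree node; aborts if memory is exhausted. */
struct node *make_node(enum kind kind, int value,
                       struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* Rewrite the tree in place: any PLUS node whose operands are both
   constants after folding becomes a single CONSTANT node. */
struct node *fold_constants(struct node *n)
{
    if (n->kind != PLUS)
        return n;
    n->left = fold_constants(n->left);
    n->right = fold_constants(n->right);
    if (n->left->kind == CONSTANT && n->right->kind == CONSTANT) {
        int sum = n->left->value + n->right->value;
        free(n->left);
        free(n->right);
        n->kind = CONSTANT;
        n->value = sum;
        n->left = n->right = NULL;
    }
    return n;
}
```

<para>The input and the output are trees of the same kind, so further
transformations can be applied to the result without any change of
representation.</para>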
|
|
|
1910 |
|
|
|
1911 |
<para>Much of the machine ABI information is built into the
|
|
|
1912 |
translator in a very simple way. For example, to evaluate the
|
|
|
1913 |
offset of the field <code>b</code> in the structure <code>struct
|
|
|
1914 |
tag</code> above, the producer has already done all the hard
|
|
|
1915 |
work, providing a formula for the offset in terms of the sizes
|
|
|
1916 |
and alignments of the basic C types. The translator merely
|
|
|
1917 |
provides these values and the offset is automatically evaluated
|
|
|
1918 |
by the constant evaluation transformations. Other aspects of the
|
|
|
1919 |
ABI, for example the procedure argument and result passing
|
|
|
1920 |
conventions, require more detailed attention.</para>
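<para>A formula of this kind might look like the following sketch. It
assumes the common layout rule that each field is placed at the next
address satisfying its own alignment; the names <code>align_up</code>
and <code>second_field_offset</code> are hypothetical, and the actual
formula generated by the producer is not reproduced here.</para>

```c
#include <stddef.h>

/* Round size up to the next multiple of alignment (alignment > 0). */
size_t align_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}

/* Offset of the second field of a structure whose first field has the
   given size: the first field starts at offset 0, and the second is
   placed at the next offset that satisfies its own alignment. */
size_t second_field_offset(size_t first_size, size_t second_alignment)
{
    return align_up(first_size, second_alignment);
}
```

<para>Once the translator supplies concrete sizes and alignments, an
expression such as <code>second_field_offset ( 1, 4 )</code> contains
only constants and can be reduced to a single number by constant
evaluation.</para>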
|
|
|
1921 |
|
|
|
1922 |
<para>One interesting range of optimisations implemented by many of
|
|
|
1923 |
the current translators consists of the inlining of certain
|
|
|
1924 |
standard procedure calls. For example, <code>strlen ( "hello"
|
|
|
1925 |
)</code> is replaced by 5. As it stands this optimisation appears
|
|
|
1926 |
to run the risk of corrupting the programmer's namespace - what
|
|
|
1927 |
if <code>strlen</code> was a user-defined procedure rather than
|
|
|
1928 |
the standard library routine (cf. the <code>open</code> example
|
|
|
1929 |
in section 2.2.3)? This risk only materialises however if we
|
|
|
1930 |
actually use the procedure name to spot this optimisation. In
|
|
|
1931 |
code compiled from the target independent headers all calls to
|
|
|
1932 |
the library routine <code>strlen</code> will be implemented by
|
|
|
1933 |
means of a uniquely named token, <code>ansi.string.strlen</code>
|
|
|
1934 |
say. It is by recognising this token name as the token is
|
|
|
1935 |
expanded that the translators are able to ensure that this is
|
|
|
1936 |
really the library routine <code>strlen</code>.</para>
|
|
|
1937 |
|
|
|
1938 |
<para>Another example of an inlined procedure of this type is
|
|
|
1939 |
<code>alloca</code>. Many other compilers inline
|
|
|
1940 |
<code>alloca</code>, or rather they inline
|
|
|
1941 |
<code>__builtin_alloca</code> and rely on the programmer to
|
|
|
1942 |
identify <code>alloca</code> with <code>__builtin_alloca</code>.
|
|
|
1943 |
This gets round the potential namespace problems by getting the
|
|
|
1944 |
programmer to confirm that <code>alloca</code> in the program
|
|
|
1945 |
really is the library routine <code>alloca</code>. By the use of
|
|
|
1946 |
tokens this information is automatically provided to the TDF
|
|
|
1947 |
translators.</para>
|
|
|
1948 |
</sect3>
|
|
|
1949 |
</sect2>
|
|
|
1950 |
|
|
|
1951 |
<sect2 id="S37">
|
|
|
1952 |
<title>3.4. TDF and APIs</title>
|
|
|
1953 |
<para>What the discussion above has
|
|
|
1954 |
emphasised is that the ability to describe APIs abstractly as
|
|
|
1955 |
target independent headers underpins the entire TDF approach to
|
|
|
1956 |
portability. We now consider this in more detail.</para>
|
|
|
1957 |
|
|
|
1958 |
<sect3 id="S38">
|
|
|
1959 |
<title>3.4.1. API Description</title>
|
|
|
1960 |
<para>The
|
|
|
1961 |
process of transforming an API specification into its description
|
|
|
1962 |
in terms of <code>#pragma token</code> directives is a
|
|
|
1963 |
time-consuming but often fascinating task. In this section we
|
|
|
1964 |
discuss some of the issues arising from the process of describing
|
|
|
1965 |
an API in this way.</para>
|
|
|
1966 |
|
|
|
1967 |
<sect4 id="S39">
|
|
|
1968 |
<title>3.4.1.1. The Description Process</title>
|
|
|
1969 |
<para>As may be observed from the example given in
|
|
|
1970 |
section 3.2.1, the <code>#pragma token</code> syntax is not
|
|
|
1971 |
necessarily intuitively obvious. It is designed to be a low-level
|
|
|
1972 |
description of tokens which is capable of expressing many complex
|
|
|
1973 |
token specifications. Most APIs are however specified in C-like
|
|
|
1974 |
terms, so an alternative syntax, closer to C, has been developed
|
|
|
1975 |
in order to facilitate their description. This is then
|
|
|
1976 |
transformed into the corresponding <code>#pragma token</code>
|
|
|
1977 |
directives by a specification tool called <code>tspec</code> (see
|
|
|
1978 |
[2]), which also applies a number of checks to the input and
|
|
|
1979 |
generates the unique token names. For example, the description
|
|
|
1980 |
leading to the example above was:
|
|
|
1981 |
<programlisting>
|
|
|
1982 |
+TYPE FILE ;
|
|
|
1983 |
+EXP FILE *stdout ;
|
|
|
1984 |
+FUNC int fputs ( const char *, FILE * ) ;
|
|
|
1985 |
</programlisting>
|
|
|
1986 |
Note how close this is to the English language specification
|
|
|
1987 |
of the API given previously. (There are a number of open issues
|
|
|
1988 |
relating to <code>tspec</code> and the <code>#pragma token</code>
|
|
|
1989 |
syntax, mainly concerned with determining the type of syntactic
|
|
|
1990 |
statements that it is desired to make about the APIs being
|
|
|
1991 |
described. The current scheme is adequate for those APIs so far
|
|
|
1992 |
considered, but it may need to be extended in future.)</para>
|
|
|
1993 |
|
|
|
1994 |
<para><code>tspec</code> is not capable of expressing the full power
|
|
|
1995 |
of the <code>#pragma token</code> syntax. Whereas this makes it
|
|
|
1996 |
easier to use in most cases, for describing the normal C-like
|
|
|
1997 |
objects such as types, expressions and procedures, it cannot
|
|
|
1998 |
express complex token descriptions. Instead it is necessary to
|
|
|
1999 |
express these directly in the <code>#pragma token</code> syntax.
|
|
|
2000 |
However, this is only rarely required: the constructs
|
|
|
2001 |
<code>offsetof</code>, <code>va_start</code> and
|
|
|
2002 |
<code>va_arg</code> from ANSI are the only examples so far
|
|
|
2003 |
encountered during the API description programme at DRA. For
|
|
|
2004 |
example, <code>va_arg</code> takes an assignable expression of
|
|
|
2005 |
type <code>va_list</code> and a type <code>t</code> and returns
|
|
|
2006 |
an expression of type <code>t</code>. Clearly, this cannot be
|
|
|
2007 |
expressed abstractly in C-like terms; so the <code>#pragma
|
|
|
2008 |
token</code> description:
|
|
|
2009 |
<programlisting>
|
|
|
2010 |
#pragma token PROC ( EXP lvalue : va_list : e, TYPE t )\
|
|
|
2011 |
EXP rvalue : t : va_arg # ansi.stdarg.va_arg
|
|
|
2012 |
</programlisting>
|
|
|
2013 |
must be used instead.</para>
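<para>A short use of <code>va_arg</code> shows why it cannot be
described as an ordinary C procedure: its second operand is a type
name rather than a value. The procedure <code>sum_ints</code> below is
a hypothetical example written for this illustration.</para>

```c
#include <stdarg.h>

/* Sum count int arguments. va_arg is given an assignable va_list and a
   type name, which is why it cannot be declared as an ordinary C
   procedure. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* second operand is a type */
    va_end(args);
    return total;
}
```

<para>The expression <code>va_arg ( args, int )</code> has type
<code>int</code> here only because <code>int</code> was the type
supplied, which matches the <code>TYPE t</code> parameter in the token
description.</para>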
|
|
|
2014 |
|
|
|
2015 |
<para>Most of the process of describing an API consists of going
|
|
|
2016 |
through its English language specification transcribing the
|
|
|
2017 |
object specifications it gives into the <code>tspec</code> syntax
|
|
|
2018 |
(if the specification is given in a machine readable form this
|
|
|
2019 |
process can be partially automated). The interesting part
|
|
|
2020 |
consists of trying to interpret what is written and reading
|
|
|
2021 |
between the lines as to what is meant. It is important to try to
|
|
|
2022 |
represent exactly what is in the specification rather than being
|
|
|
2023 |
influenced by one's knowledge of a particular implementation,
|
|
|
2024 |
otherwise the API checking phase of the compilation will not be
|
|
|
2025 |
checking against what is actually in the API but against a
|
|
|
2026 |
particular way of implementing it.</para>
|
|
|
2027 |
|
|
|
2028 |
<para>There is a continuing API description programme at DRA. The
|
|
|
2029 |
current status (October 1993) is that ANSI (X3.159), POSIX
|
|
|
2030 |
(1003.1), XPG3 (X/Open Portability Guide 3) and SVID (System V
|
|
|
2031 |
Interface Definition, 3rd Edition) have been described and
|
|
|
2032 |
extensively tested. POSIX2 (1003.2), XPG4, AES (Revision A), X11
|
|
|
2033 |
(Release 5) and Motif (Version 1.1) have been described, but not
|
|
|
2034 |
yet extensively tested.</para>
|
|
|
2035 |
|
|
|
2036 |
<para>There may be some syntactic information in the paper API
|
|
|
2037 |
specifications which <code>tspec</code> (and the <code>#pragma
|
|
|
2038 |
token</code> syntax) is not yet capable of expressing. In
|
|
|
2039 |
particular, some APIs go into very careful management of
|
|
|
2040 |
namespaces within the API, explicitly spelling out exactly what
|
|
|
2041 |
should, and should not, appear in the namespaces as each header
|
|
|
2042 |
is included (see the appendix on namespaces and APIs below). What
|
|
|
2043 |
is actually being done here is to regard each header as an
|
|
|
2044 |
independent sub-API. There is not however a sufficiently
|
|
|
2045 |
developed "API calculus" to allow such relationships to be easily
|
|
|
2046 |
expressed.</para>
|
|
|
2047 |
</sect4>
|
|
|
2048 |
|
|
|
2049 |
<sect4 id="S40">
|
|
|
2050 |
<title>3.4.1.2. Resolving Conflicts</title>
|
|
|
2051 |
<para>Another consideration during the description
|
|
|
2052 |
process is to try to integrate the various API descriptions. For
|
|
|
2053 |
example, POSIX extends ANSI, so it makes sense to have the target
|
|
|
2054 |
independent POSIX headers include the corresponding ANSI headers
|
|
|
2055 |
and just add the new objects introduced by POSIX. This does
|
|
|
2056 |
present problems with APIs which are basically compatible but
|
|
|
2057 |
have a small number of incompatibilities, whether deliberate or
|
|
|
2058 |
accidental. As an example of an "accidental" incompatibility,
|
|
|
2059 |
XPG3 is an extension of POSIX, but whereas POSIX declares
|
|
|
2060 |
<code>malloc</code> by means of the prototype:
|
|
|
2061 |
<programlisting>
|
|
|
2062 |
void *malloc(size_t);
|
|
|
2063 |
</programlisting>
|
|
|
2064 |
XPG3 declares it by means of the traditional procedure
|
|
|
2065 |
declaration:
|
|
|
2066 |
<programlisting>
|
|
|
2067 |
void *malloc(s)
|
|
|
2068 |
size_t s;
|
|
|
2069 |
</programlisting>
|
|
|
2070 |
These are surely intended to express the same thing, but in
|
|
|
2071 |
the first case the argument is passed as a <code>size_t</code> and
|
|
|
2072 |
in the second it is first promoted to the integral promotion of
|
|
|
2073 |
<code>size_t</code>. On most machines these are compatible, either
|
|
|
2074 |
because of the particular implementation of <code>size_t</code>, or
|
|
|
2075 |
because the procedure calling conventions make them compatible.
|
|
|
2076 |
However in general they are incompatible, so the target independent
|
|
|
2077 |
headers either have to reflect this or have to read between the
|
|
|
2078 |
lines and assume that the incompatibility was accidental and ignore
|
|
|
2079 |
it.</para>
|
|
|
2080 |
|
|
|
2081 |
<para>As an example of a deliberate incompatibility, both XPG3 and
|
|
|
2082 |
SVID3 declare a structure <code>struct msqid_ds</code> in
|
|
|
2083 |
<code>sys/msg.h</code> which has fields <code>msg_qnum</code> and
|
|
|
2084 |
<code>msg_qbytes</code>. The difference is that whereas XPG3
|
|
|
2085 |
declares these fields to have type <code>unsigned short</code>,
|
|
|
2086 |
SVID3 declares them to have type <code>unsigned long</code>.
|
|
|
2087 |
However for most purposes the precise types of these fields are
|
|
|
2088 |
not important, so the APIs can be unified by making the types of
|
|
|
2089 |
these fields target dependent. That is to say, tokenised integer
|
|
|
2090 |
types <code>__msg_q_t</code> and <code>__msg_l_t</code> are
|
|
|
2091 |
introduced. On XPG3-compliant machines these will both be defined
|
|
|
2092 |
to be <code>unsigned short</code>, and on SVID3-compliant
|
|
|
2093 |
machines they will both be <code>unsigned long</code>. So,
|
|
|
2094 |
although strict XPG3 and strict SVID3 are incompatible, the two
|
|
|
2095 |
extension APIs created by adding these types are compatible. In
|
|
|
2096 |
the rare case when the precise type of these fields is important,
|
|
|
2097 |
the strict APIs can be recovered by defining the field types to
|
|
|
2098 |
be <code>unsigned short</code> or <code>unsigned long</code> at
|
|
|
2099 |
produce-time rather than at install-time. (XPG4 uses a similar
|
|
|
2100 |
technique to resolve this incompatibility. But whereas the XPG4
|
|
|
2101 |
types need to be defined explicitly, the tokenised types are
|
|
|
2102 |
defined implicitly according to whatever the field types are on a
|
|
|
2103 |
particular machine.)</para>
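<para>The effect of abstracting the field types can be imitated in C
with a typedef. The sketch below is an analogy only: the names
<code>msg_count_t</code>, <code>USE_SHORT_QUEUE_COUNTS</code>,
<code>struct queue_info</code> and the two procedures are hypothetical,
and a macro stands in for the choice that the TDF scheme makes when the
token is defined for a particular machine.</para>

```c
/* Hypothetical stand-in for the tokenised field type. Here it is chosen
   by a macro purely for illustration. */
#ifdef USE_SHORT_QUEUE_COUNTS
typedef unsigned short msg_count_t;
#else
typedef unsigned long msg_count_t;
#endif

/* Hypothetical cut-down version of struct msqid_ds. */
struct queue_info {
    msg_count_t msg_qnum;
    msg_count_t msg_qbytes;
};

/* Portable use: the value is widened to unsigned long, which can hold
   every value of either candidate type, so nothing here depends on
   which definition of msg_count_t is in force. */
unsigned long queue_byte_limit(const struct queue_info *q)
{
    return (unsigned long)q->msg_qbytes;
}

/* Returns 1 if the queue currently holds at least one message. */
int queue_has_messages(const struct queue_info *q)
{
    return q->msg_qnum != 0;
}
```

<para>Code written in this way compiles unchanged against either
definition of the field type, which is the sense in which the two
extension APIs are compatible.</para>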
|
|
|
2104 |
|
|
|
2105 |
<para>This example shows how introducing extra abstractions can
|
|
|
2106 |
resolve potential conflicts between APIs. But it may also be used
|
|
|
2107 |
to resolve conflicts between the API specification and the API
|
|
|
2108 |
implementations. For example, POSIX specifies that the structure
|
|
|
2109 |
<code>struct flock</code> defined in <code>fcntl.h</code> shall
|
|
|
2110 |
have a field <code>l_pid</code> of type <code>pid_t</code>.
|
|
|
2111 |
However on at least two of the POSIX implementations examined at
|
|
|
2112 |
DRA, <code>pid_t</code> was implemented as an <code>int</code>,
|
|
|
2113 |
but the <code>l_pid</code> field of <code>struct flock</code> was
|
|
|
2114 |
implemented as a <code>short</code> (this showed up in the TDF
|
|
|
2115 |
library building process). The immediate reaction might be that
|
|
|
2116 |
these systems have not implemented POSIX correctly, so they should
|
|
|
2117 |
be cast into the outer darkness. However for the vast majority of
|
|
|
2118 |
applications, even those which use the <code>l_pid</code> field,
|
|
|
2119 |
its precise type is not important. So the decision was taken to
|
|
|
2120 |
introduce a tokenised integer type, <code>__flock_pid_t</code>,
|
|
|
2121 |
to stand for the type of the <code>l_pid</code> field. So
|
|
|
2122 |
although the implementations do not conform to strict POSIX, they
|
|
|
2123 |
do to this slightly more relaxed extension. Of course, one could
|
|
|
2124 |
enforce strict POSIX by defining <code>__flock_pid_t</code> to be
|
|
|
2125 |
<code>pid_t</code> at produce-time, but the given implementations
|
|
|
2126 |
would not conform to this stricter API.</para>
|
|
|
2127 |
|
|
|
2128 |
<para>Both the previous two examples are really concerned with the
|
|
|
2129 |
question of determining the correct level of abstraction in API
|
|
|
2130 |
specification. Abstraction is inclusive and allows for API
evolution, whereas specialisation is exclusive and may lead to
dead-end APIs. The SVID3 method of allowing for longer messages
than XPG3 - changing the <code>msg_qnum</code> and
<code>msg_qbytes</code> fields of <code>struct msqid_ds</code>
from <code>unsigned short</code> to <code>unsigned long</code> -
is an over-specialisation which leads to an unnecessary conflict
with XPG3. The XPG4 method of achieving exactly the same end -
abstracting the types of these fields - is, by contrast, a smooth
evolutionary path.</para>
</sect4>

<sect4 id="S41">
<title>3.4.1.3. The Benefits of API Description</title>
<para>The description process is potentially of
great benefit to bodies involved in API specification. While the
specification itself stays on paper the only real existence of
the API is through its implementations. Giving the specification
a concrete form means not only does it start to be seen as an
object in its own right, rather than some fuzzy document
underlying the real implementations, but also any omissions,
insufficient specifications (where what is written down does not
reflect what the writer actually meant) or built-in assumptions
are more apparent. It may also be able to help show up the kind
of over-specialisation discussed above. The concrete
representation also becomes an object which both applications and
implementations can be automatically checked against. As has been
mentioned previously, the production phase of the compilation
involves checking the program against the abstract API
description, and the library building phase checks the syntactic
aspect of the implementation against it.</para>

<para>The implementation checking aspect is considered below. Let us
here consider the program checking aspect by re-examining the
examples given in section 2.2.4.1. The <code>SIGKILL</code>
example is straightforward; <code>SIGKILL</code> will appear in
the POSIX version of <code>signal.h</code> but not the ANSI
version, so if the program is compiled with the target
independent ANSI headers it will be reported as being undefined.
In a sense this has nothing to do with the <code>#pragma
token</code> syntax, but with the organisation of the target
independent headers. The other examples do however rely on the
fact that the <code>#pragma token</code> syntax can express
syntactic information in a way which is not possible directly
from C. Thus the target independent headers express exactly the
fact that <code>time_t</code> is an arithmetic type, about which
nothing else is known. Consequently <code>( t & 1 )</code> is not
type correct for a <code>time_t t</code> because the binary
<code>&</code> operator does not apply to all arithmetic
types. Similarly, for the type <code>div_t</code> the target
independent headers express the information that there exists a
structure type <code>div_t</code> and field selectors
<code>quot</code> and <code>rem</code> of <code>div_t</code> of
type <code>int</code>, but nothing about the order of these
fields or the existence of other fields. Thus any attempt to
initialise a <code>div_t</code> will fail because the
correspondence between the values in the initialisation and the
fields of the structure is unknown. The <code>struct
dirent</code> example is entirely analogous, except that here the
declarations of the structure type <code>struct dirent</code> and
the field selector <code>d_name</code> appear in both the POSIX
and XPG3 versions of <code>dirent.h</code>, whereas the field
selector <code>d_ino</code> appears only in the XPG3 version.</para>
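<para>A minimal sketch of what these checks mean for ordinary C code,
assuming only the standard <code>stdlib.h</code> and <code>time.h</code>
headers. The helper names <code>make_div</code> and <code>later_of</code>
are hypothetical and are not part of any API discussed here. The first
sets the fields of a <code>div_t</code> by name rather than by position;
the second compares two <code>time_t</code> values through
<code>difftime</code> rather than by operating on their
representation:</para>

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: build a div_t without assuming field order.
   A positional initialiser such as { q, r } would depend on quot
   coming before rem, which the API does not promise. */
div_t make_div ( int q, int r )
{
    div_t d ;
    d.quot = q ;
    d.rem = r ;
    return ( d ) ;
}

/* Hypothetical helper: return the later of two times. difftime is
   used so that nothing is assumed about how time_t is encoded. */
time_t later_of ( time_t a, time_t b )
{
    return ( ( difftime ( a, b ) < 0.0 ) ? b : a ) ;
}
```

<para>Code written in this style should pass the checks against the
target independent headers, since it relies only on what the API
description actually states.</para>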
</sect4>
</sect3>

<sect3 id="S42">
<title>3.4.2. TDF Library Building</title>
<para>As
we have said, two of the primary problems with writing portable
programs are dealing with API implementation errors on the target
machines - objects not being defined, or being defined in the
wrong place, or being implemented incorrectly - and namespace
problems - particularly those introduced by the system headers.
The most interesting contrast between the traditional compilation
scheme (Fig. 1) and the TDF scheme (Fig. 5) is that in the former
the program comes directly into contact with the "real world" of
messy system headers and incorrectly implemented APIs, whereas in
the latter there is an "ideal world" layer interposed. This
consists of the target independent headers, which describe all
the syntactic features of the API where they are meant to be, and
with no extraneous material to clutter up the namespaces (like
<code>index</code> and the macro <code>st_atime</code> in the
examples given in section 2.2.3), and the TDF libraries, which
can be combined "cleanly" with the program without any namespace
problems. All the unpleasantness has been shifted to the
interface between this "ideal world" and the "real world"; that
is to say, the TDF library building.</para>

<para>The importance of this change may be summarised by observing
that previously all the unpleasantnesses happened in the left
hand side of the diagram (the program half), whereas in the TDF
scheme they are in the right hand side (the API half). So API
implementation problems are seen to be a genuinely separate issue
from the main business of writing programs; the ball is firmly in
the API implementor's court rather than the programmer's. Also
the problems need to be solved once per API rather than once per
program.</para>

<para>It might be said that this has not advanced us very far
towards actually dealing with the implementation errors. The API
implementation still contains errors whoever's responsibility it
is. But the TDF library building process gives the API
implementor a second chance. Many of the syntactic implementation
problems will be shown up as the library builder compares the
implementation against the abstract API description, and it may
be possible to build corrections into the TDF libraries so that
the libraries reflect, not the actual implementation, but some
improved version of it.</para>

<para>To show how this might be done, we reconsider the examples of
API implementation errors given in section 2.2.4.2. As before we
may divide our discussion between system header problems and
system library problems. Recall however the important
distinction, that whereas previously the programmer was trying to
deal with these problems in a way which would work on all
machines (top left of the compilation diagrams), now the person
building the TDF libraries is trying to deal with implementation
problems for a particular API on a particular machine (bottom
right).</para>

<sect4 id="S43">
<title>3.4.2.1. System Header Problems</title>
<para>Values which are defined in the wrong place,
such as <code>SEEK_SET</code> in the example given, present no
difficulties. The library builder will look where it expects to
find them and report that they are undefined. To define these
values it is merely a matter of telling the library builder where
they are actually defined (in <code>unistd.h</code> rather than
<code>stdio.h</code>).</para>

<para>Similarly, values which are undefined are also reported. If
these values can be deduced from other information, then it is a
simple matter to tell the library builder to use these deduced
values. For example, if <code>EXIT_SUCCESS</code> and
<code>EXIT_FAILURE</code> are undefined, it is probably possible
to deduce their values from experimentation or experience (or
guesswork).</para>

<para>Wrongly defined values are more difficult. Firstly they are
not necessarily detected by the library builder because they are
semantic rather than syntactic errors. Secondly, whereas it is
easy to tell the library builder to use a corrected value rather
than the value given in the implementation, this mechanism needs
to be used with circumspection. The system libraries are provided
pre-compiled, and they have been compiled using the system
headers. If we define these values differently in the TDF
libraries we are effectively changing the system headers, and
there is a risk of destroying the interface with the system
libraries. For example, changing a structure is not a good idea,
because different parts of the program - the main body and the
parts linked in from the system libraries - will have different
ideas of the size and layout of this structure. (See the
<code>struct flock</code> example in section 3.4.1.2 for a
potential method of resolving such implementation problems.)</para>

<para>In the two cases given above - <code>DBL_MAX</code> and
<code>size_t</code> - the necessary changes are probably "safe".
<code>DBL_MAX</code> is not a special value in any library
routines, and changing <code>size_t</code> from <code>int</code>
to <code>unsigned int</code> does not affect its size, alignment
or procedure passing rules (at least not on the target machines
we have in mind) and so should not disrupt the interface with the
system library.</para>
</sect4>

<sect4 id="S44">
<title>3.4.2.2. System Library Problems</title>
<para>Errors in the system libraries will not be
detected by the TDF library builder because they are semantic
errors, whereas the library building process is only checking
syntax. The only realistic ways of detecting semantic problems are
by means of test suites, such as the Plum-Hall or CVSA library
tests for ANSI and VSX for XPG3, or by detailed knowledge of
particular API implementations born of personal experience.
However it may be possible to build workarounds for problems
identified in these tests into the TDF libraries.</para>

<para>For example, the problem with <code>realloc</code> discussed
in section 2.2.4.4 could be worked around by defining the token
representing <code>realloc</code> to be the equivalent of:
<programlisting>
#define realloc( p, s ) ( void *q = ( p ) ? ( realloc ) ( q, s ) : malloc ( s ) )
</programlisting>
(where the C syntax has been extended to allow variables to
be introduced inside expressions) or:
<programlisting>
static void *__realloc ( void *p, size_t s )
{
    if ( p == NULL ) return ( malloc ( s ) ) ;
    return ( ( realloc ) ( p, s ) ) ;
}

#define realloc( p, s ) __realloc ( p, s )
</programlisting>
Alternatively, the token definition could be encoded directly
into TDF (not via C), using the TDF notation compiler (see [9]).</para>
</sect4>

<sect4 id="S45">
<title>3.4.2.3. TDF Library Builders</title>
<para>The discussion above shows how the TDF libraries
are an extra layer which lies on top of the existing system API
implementation, and how this extra layer can be exploited to
provide corrections and workarounds to various implementation
problems. Expertise in particular API implementation problems
on particular machines can be captured once and for all in the
TDF libraries, rather than being spread piecemeal over all the
programs which use that API implementation. But being able to
encapsulate this expertise in this way makes it a marketable
quantity. One could envisage a market in TDF libraries, ranging
from libraries closely reflecting the actual API implementation
to top of the range libraries with many corrections and
workarounds built in.</para>

<para>All of this has tended to paint the system vendors as the
villains of the piece for not providing correct API
implementations, but this is not entirely fair. The reason why
API implementation errors may persist over many operating system
releases is that system vendors have as many porting problems as
anyone else - preparing a new operating system release is in
effect a huge porting exercise - and are understandably reluctant
to change anything which basically works. The use of TDF
libraries could be a low-risk strategy for system vendors to
allow users the benefits of API conformance without changing the
underlying operating system.</para>

<para>Of course, if the system vendor's porting problems could be
reduced, they would have more confidence to make their underlying
systems more API conformant, and thereby help reduce the normal
programmer's porting problems. So whereas using the TDF libraries
might be a short-term workaround for API implementation problems,
the rest of the TDF porting system might help towards a long-term
solution.</para>

<para>Another interesting possibility arises. As we said above, many
APIs, for example POSIX and BSD, offer equivalent functionality
by different methods. It may be possible to use the TDF library
building process to express one in terms of the other. For
example, in the <code>struct dirent</code> example in section
2.3.3, the only differences between POSIX and BSD were that the
BSD version was defined in a different header and that the
structure was called <code>struct direct</code>. But this
presents no problems to the TDF library builder: it is perfectly
simple to tell it to look in <code>sys/dir.h</code> instead of
<code>dirent.h</code>, and to identify <code>struct
direct</code> with <code>struct dirent</code>. So it may be
possible to build a partial POSIX lookalike on BSD systems by
using the TDF library mechanism.</para>
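<para>For comparison, the traditional source-level workaround that such
a library-level identification would make unnecessary can be sketched as
follows. The sketch assumes a POSIX-style <code>dirent.h</code> by
default; the macro <code>USE_BSD_DIR</code>, the typedef
<code>dir_entry</code> and the function <code>count_entries</code> are
hypothetical names introduced purely for illustration. Only the
<code>d_name</code> field, which both interfaces provide, is used:</para>

```c
#include <stddef.h>

#ifdef USE_BSD_DIR              /* hypothetical configuration macro */
#include <sys/types.h>
#include <sys/dir.h>
typedef struct direct dir_entry ;
#else
#include <dirent.h>
typedef struct dirent dir_entry ;
#endif

/* Count the entries in a directory, skipping "." and "..".
   Returns -1 if the directory cannot be opened. */
long count_entries ( const char *path )
{
    DIR *dir = opendir ( path ) ;
    dir_entry *entry ;
    long n = 0 ;

    if ( dir == NULL ) return ( -1 ) ;
    while ( ( entry = readdir ( dir ) ) != NULL ) {
        const char *name = entry->d_name ;
        if ( name [0] == '.' &&
             ( name [1] == '\0' ||
               ( name [1] == '.' && name [2] == '\0' ) ) ) continue ;
        n++ ;
    }
    closedir ( dir ) ;
    return ( n ) ;
}
```

<para>With the identification made once in the TDF library, the
conditional block at the top would not be needed in the program itself,
which could be written against the POSIX interface alone.</para>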
</sect4>
</sect3>
</sect2>

<sect2 id="S46">
<title>3.5. TDF and Conditional Compilation</title>
<para>So far our
discussion of the TDF approach to portability has been confined
to the simplest case, where the program itself contains no target
dependent code. We now turn to programs which contain conditional
compilation. As we have seen, many of the reasons why it is
necessary to introduce conditional compilation into the
traditional compilation process either do not arise or are seen
to be distinct phases in the TDF compilation process. The use of
a single front-end (the producer) virtually eliminates problems
of compiler limitations and differing interpretations and reduces
compiler bug problems, so it is not necessary to introduce
conditionally compiled workarounds for these. Also API
implementation problems, another prime reason for introducing
conditional compilation in the traditional scheme, are seen to be
isolated in the TDF library building process, thereby allowing
the programmer to work in an idealised world one step removed
from the real API implementations. However the most important
reason for introducing conditional compilation is where things,
for reasons of efficiency or whatever, are genuinely different on
different machines. It is this we now consider.</para>

<sect3 id="S47">
<title>3.5.1. User-Defined APIs</title>
<para>The
things which are done genuinely differently on different machines
have previously been characterised as comprising the user-defined
component of the API. So the real issue in this case is how to
use the TDF API description and representation methods within
one's own programs. A very simple worked example is given below
(in section 3.5.2); for more detailed examples see [8].</para>

<para>For the <code>MSB</code> example given in section 2.3 we
firstly have to decide what the user-defined API is. To
reflect exactly what the target dependent code is, we could
define the API, in <code>tspec</code> terms, to be:
<programlisting>
+MACRO unsigned char MSB ( unsigned int a ) ;
</programlisting>
where the macro <code>MSB</code> gives the most significant
byte of its argument, <code>a</code>. Let us say that the
corresponding <code>#pragma token</code> statement is put into the
header <code>msb.h</code>. Then the program can be recast into the
form:
<programlisting>
#include <stdio.h>
#include "msb.h"

unsigned int x = 100000000 ;

int main ()
{
    printf ( "%u\n", MSB ( x ) ) ;
    return ( 0 ) ;
}
</programlisting>
The producer will compile this into a target independent TDF
capsule which uses a token to represent the use of
<code>MSB</code>, but leaves this token undefined. The only
question that remains is how this token is defined on the target
machine; that is, how the user-defined API is implemented. On each
target machine a TDF library containing the local definition of the
token representing <code>MSB</code> needs to be built. There are
two basic possibilities. Firstly the person performing the
installation could build the library directly, by compiling a
program of the form:
<programlisting>
#pragma implement interface "msb.h"
#include "config.h"

#ifndef SLOW_SHIFT
#define MSB( a ) ( ( unsigned char ) ( a >> 24 ) )
#else
#ifdef BIG_ENDIAN
#define MSB( a ) *( ( unsigned char * ) &( a ) )
#else
#define MSB( a ) *( ( unsigned char * ) &( a ) + 3 )
#endif
#endif
</programlisting>
with the appropriate <code>config.h</code> to choose the
correct local implementation of the interface described in
<code>msb.h</code>. Alternatively the programmer could provide
three alternative TDF libraries corresponding to the three
implementations, and let the person installing the program choose
between these. The two approaches are essentially equivalent; they
just provide for making the choice of the implementation of the
user-defined component of the API in different ways. An interesting
alternative approach would be to provide a short program which does
the selection between the provided API implementations
automatically. This approach might be particularly effective in
deciding which implementation offers the best performance on a
particular target machine.</para>
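<para>The idea of a short selection program can be illustrated with a
small self-contained sketch in plain C. It is not TDF-specific: all the
function names are hypothetical, and <code>uint32_t</code> from
<code>stdint.h</code> (a later addition to C than the ANSI C discussed
in this paper) is assumed so that the argument is exactly 32 bits wide.
Here <code>msb_shift</code> corresponds to the shift-based definition
and <code>msb_pointer</code> to the two pointer-based ones, with the
byte order detected at run time rather than taken from
<code>config.h</code>:</para>

```c
#include <stdint.h>

/* Shift-based version: independent of byte order. */
unsigned char msb_shift ( uint32_t a )
{
    return ( ( unsigned char ) ( a >> 24 ) ) ;
}

/* Returns 1 on a big-endian machine, 0 on a little-endian one. */
int is_big_endian ( void )
{
    uint32_t probe = 1 ;
    return ( *( ( unsigned char * ) &probe ) == 0 ) ;
}

/* Pointer-based version: picks the byte according to byte order. */
unsigned char msb_pointer ( uint32_t a )
{
    unsigned char *bytes = ( unsigned char * ) &a ;
    return ( is_big_endian () ? bytes [0] : bytes [3] ) ;
}

/* Hypothetical selection check: 1 if both versions agree on a. */
int msb_versions_agree ( uint32_t a )
{
    return ( msb_shift ( a ) == msb_pointer ( a ) ) ;
}
```

<para>A real selection program would presumably go further and time
each candidate on the target machine before choosing one; that part is
left out of the sketch.</para>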
</sect3>

<sect3>
<title id="S48">3.5.2. User Defined Tokens - Example</title>
<para>As an illustration of how to define a simple token,
consider the following example. We have a simple program which
prints "hello" in some language, the language being target
dependent. Our first task is to choose an API. We choose ANSI C
extended by a tokenised object <code>hello</code> of type
<code>char *</code> which gives the message to be printed. This
object will be an rvalue (i.e. it cannot be assigned to). For
convenience this token is declared in a header file,
<code>tokens.h</code> say. This particular case is simple enough
to encode by hand; it takes the form:
<programlisting>
#pragma token EXP rvalue : char * : hello #
#pragma interface hello
</programlisting>
consisting of a <code>#pragma token</code> directive
describing the object to be tokenised, and a <code>#pragma
interface</code> directive to show that this is the only object in
the API. An alternative would be to generate <code>tokens.h</code>
from a <code>tspec</code> specification of the form:
<programlisting>
+EXP char *hello ;
</programlisting>
The next task is to write the program conforming to this API.
This may take the form of a single source file,
<code>hello.c</code>, containing the lines:
<programlisting>
#include <stdio.h>
#include "tokens.h"

int main ()
{
    printf ( "%s\n", hello ) ;
    return ( 0 ) ;
}
</programlisting>
The production process may be specified by means of a
<code>Makefile</code>. This uses the TDF C compiler, <code>tcc</code>,
which is an interface to the TDF system which is designed to be
like <code>cc</code>, but with extra options to handle the extra
functionality offered by the TDF system (see [1]).
<programlisting>
produce : hello.j
	echo "PRODUCTION COMPLETE"

hello.j : hello.c tokens.h
	echo "PRODUCTION : C->TDF"
	tcc -Fj hello.c
</programlisting>
The production is run by typing <code>make produce</code>.
The ANSI API is the default, and so does not need to be specified
to <code>tcc</code>. The program <code>hello.c</code> is compiled
to a target independent capsule, <code>hello.j</code>. This will
use a token to represent <code>hello</code>, but it will be left
undefined.</para>

<para>On each target machine we need to create a token library
giving the local definitions of the objects in the API. We shall
assume that the library corresponding to the ANSI C API has
already been constructed, so that we only need to define the
token representing <code>hello</code>. This is done by means of a
short C program, <code>tokens.c</code>, which implements the
tokens declared in <code>tokens.h</code>. This might take the
form:
<programlisting>
#pragma implement interface "tokens.h"
#define hello "bonjour"
</programlisting>
to define <code>hello</code> to be "bonjour". On a different
machine, the definition of <code>hello</code> could be given as
"hello", "guten Tag", "zdrastvetye" (excuse my transliteration) or
whatever (including complex expressions as well as simple strings).
Note the use of <code>#pragma implement interface</code> to
indicate that we are now implementing the API described in
<code>tokens.h</code>, as opposed to the use of
<code>#include</code> earlier when we were just using the API.</para>

<para>The installation process may be specified by adding the
following lines to the <code>Makefile</code>:
<programlisting>
install : hello
	echo "INSTALLATION COMPLETE"

hello : hello.j tokens.tl
	echo "INSTALLATION : TDF->TARGET"
	tcc -o hello -J. -jtokens hello.j

tokens.tl : tokens.j
	echo "LIBRARY BUILDING : LINKING LIBRARY"
	tcc -Ymakelib -o tokens.tl tokens.j

tokens.j : tokens.c tokens.h
	echo "LIBRARY BUILDING : DEFINING TOKENS"
	tcc -Fj -not_ansi tokens.c
</programlisting>
The complete installation process is run by typing <code>make
install</code>. Firstly the file <code>tokens.c</code> is compiled
to give the TDF capsule <code>tokens.j</code> containing the
definition of <code>hello</code>. The <b>-not_ansi</b> flag is
needed because <code>tokens.c</code> does not contain any real C
(declarations or definitions), which is not allowed in ANSI C. The
next step is to turn the capsule <code>tokens.j</code> into a TDF
library, <code>tokens.tl</code>, using the <b>-Ymakelib</b> option
to <code>tcc</code> (with older versions of <code>tcc</code> it may
be necessary to change this option to <b>-Ymakelib -M -Fj</b>).
This completes the API implementation.</para>

<para>The final step is installation. The target independent TDF,
<code>hello.j</code>, is linked with the TDF libraries
<code>tokens.tl</code> and <code>ansi.tl</code> (which is built
into <code>tcc</code> as default) to form a target dependent TDF
capsule with all the necessary token definitions, which is then
translated to a binary object file and linked with the system
libraries. All of this is under the control of
<code>tcc</code>.</para>

<para>Note the four stages of the compilation: API specification,
production, API implementation and installation, corresponding to
the four regions of the compilation diagram (Fig. 5).</para>
</sect3>

<sect3>
<title id="S49">3.5.3. Conditional Compilation within TDF</title>
<para>Although tokens are the main method used to deal with
target dependencies, TDF does have built-in conditional
compilation constructs. For most TDF sorts <code>X</code> (for
example, exp, shape or variety) there is a construct
<code>X_cond</code> which takes an exp and two <code>X</code>'s
and gives an <code>X</code>. The exp argument will evaluate to an
integer constant at install time. If this is true (nonzero), the
result of the construct is the first <code>X</code> argument and
the second is ignored; otherwise the result is the second
<code>X</code> argument and the first is ignored. By ignored we
mean completely ignored - the argument is stepped over and not
decoded. In particular any tokens in the definition of this
argument are not expanded, so it does not matter if they are
undefined.</para>

<para>These conditional compilation constructs are used by the C
-> TDF producer to translate certain statements
containing:
<programlisting>
#if condition
</programlisting>
where <code>condition</code> is a target dependent value.
Thus, because it is not known which branch will be taken at produce
time, the decision is postponed to install time. If
<code>condition</code> is a target independent value then the
branch to be taken is known at produce time, so the producer only
translates this branch. Thus, for example, code surrounded by
<code>#if 0</code> ... <code>#endif</code> will be ignored by the
producer.</para>

<para>Not all such <code>#if</code> statements can be translated
into TDF <code>X_cond</code> constructs. The two branches of the
<code>#if</code> statement are translated into the two
<code>X</code> arguments of the <code>X_cond</code> construct;
that is, into sub-trees of the TDF syntax tree. This can only be
done if each of the two branches is syntactically complete.</para>

<para>The producer interprets <code>#ifdef</code> (and
<code>#ifndef</code>) constructs to mean: is this macro
defined (or undefined) at produce time? Given the nature of
pre-processing in C this is in fact the only sensible
interpretation. But if such constructs are being used to control
conditional compilation, what is actually intended is: is this
macro defined at install time? This distinction is necessitated
by the splitting of the TDF compilation into production and
installation - it does not exist in the traditional compilation
scheme. For example, in the mips example in section 2.3, whether
or not <code>mips</code> is defined is intended to be an
installer property, rather than what it is interpreted as, a
producer property. The choice of the conditional compilation path
may be put off to install time by, for example, changing
<code>#ifdef mips</code> to <code>#if is_mips</code> where
<code>is_mips</code> is a tokenised integer which is either 1 (on
those machines on which <code>mips</code> would be defined) or 0
(otherwise). In fact in view of what was said above about
syntactic completeness, it might be better to recast the program
as:
<programlisting>
#include <stdio.h>
#include "user_api.h" /* For the spec of is_mips */

int main ()
{
    if ( is_mips ) {
        fputs ( "This machine is a mips\n", stdout ) ;
    }
    return ( 0 ) ;
}
</programlisting>
because the branches of an <code>if</code> statement, unlike
those of an <code>#if</code> statement, have to be syntactically
complete in any case. The installer will optimise out the
unnecessary test and any unreached code, so the use of <code>if (
condition )</code> is guaranteed to produce as efficient code as
<code>#if condition</code>.</para>
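<para>The effect of replacing <code>#if</code> by <code>if</code> can be
approximated in plain C, without TDF, by letting an ordinary constant
stand in for the tokenised integer. In the sketch below
<code>is_mips</code> is a hypothetical compile-time constant (fixed at 0
purely for illustration) and <code>machine_description</code> is a
hypothetical function; in the TDF scheme the value would instead be
supplied by the token definition at install time. Both branches must
parse and type-check, whichever one is eventually taken:</para>

```c
/* Hypothetical stand-in for the tokenised integer is_mips. */
enum { is_mips = 0 } ;

/* Both branches are compiled and checked; a compiler is free to
   discard the one that can never be reached. */
const char *machine_description ( void )
{
    if ( is_mips ) {
        return ( "This machine is a mips" ) ;
    }
    return ( "This machine is not a mips" ) ;
}
```

<para>Had the test been written with <code>#if</code>, an error in the
branch not selected would go unnoticed until the program was built with
the other setting.</para>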
2670 |
|
|
|
2671 |
<para>In order to help detect such "installer macro" problems the
|
|
|
2672 |
producer has a mode for detecting them. All <code>#ifdef</code>
|
|
|
2673 |
and <code>#ifndef</code> constructs in which the compilation path
|
|
|
2674 |
to be taken is potentially target dependent are reported (see [3]
|
|
|
2675 |
and [8]).</para>

<para>The existence of conditional compilation within TDF also gives
flexibility in how to express target dependent code. Instead of a
"full" abstraction of the user-defined API as target dependent types,
values and functions, it can be abstracted as a set of binary tokens
(like <code>is_mips</code> in the example above) which are used to
control conditional compilation. This latter approach can be used to
adapt existing programs quickly to a TDF-portable form, since it is
closer to the "traditional" approach of scattering the program with
<code>#ifdef</code>s and <code>#ifndef</code>s to implement target
dependent code. However, defining a user-defined API gives a better
separation of target independent and target dependent code, and the
effort to define such an API may often be justified. When writing a
new program from scratch, the API approach is recommended over the
conditional compilation approach.</para>

<para>A fully abstracted user-defined API may be more time consuming
to produce in the short run, but this may well be offset by the
increased ease of porting. Also, there is no reason why a user-defined
API, once specified, should not serve more than one program. Similar
programs are likely to require the same abstractions of target
dependent constructs. Because the API is a concrete object, it can be
reused in this way very simply. One could envisage libraries of
private APIs being built up in this way.</para>
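<para>The two approaches can be contrasted in a short C sketch. It is
an illustration in plain C only: a macro stands in for the binary
token <code>is_mips</code>, and <code>target_name</code> is a
hypothetical example of a fully abstracted API object, not something
taken from the TenDRA distribution. In a real TDF build both would be
declared as tokens in the user-defined API and defined at install
time.
<programlisting>
#include &lt;stdio.h&gt;

/* Assumption: stands in for a binary token from the user-defined API. */
#define is_mips 0

/* Style 1: a binary token selects the branch at each use site. */
const char *name_by_flag ( void )
{
    if ( is_mips ) {
        return ( "mips" ) ;
    }
    return ( "generic" ) ;
}

/* Style 2: the program sees only this declaration (hypothetical API). */
const char *target_name ( void ) ;

/* One target's implementation; another target would supply its own. */
const char *target_name ( void )
{
    return ( "generic" ) ;
}

/* Target independent code using both styles. */
void report ( void )
{
    printf ( "flag: %s, api: %s\n", name_by_flag (), target_name () ) ;
}
</programlisting>With the first style every use site carries its own test.
With the second, the target dependence lives in one definition per
target, so the calling code is the same on every machine.</para>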
</sect3>

<sect3 id="S50">
<title>3.5.4. Alternative Program Versions</title>
<para>Consider again the program described in section 2.3.4 which has
optional features for displaying its output graphically depending on
the boolean value <code>HAVE_X_WINDOWS</code>. By making
<code>HAVE_X_WINDOWS</code> part of the user-defined API as a
tokenised integer and using:
<programlisting>
#if HAVE_X_WINDOWS
</programlisting>to conditionally compile the X Windows code, the choice of
whether or not to use this version of the program is postponed to
install time. If both POSIX and X Windows are implemented on the
target machine the installation is straightforward:
<code>HAVE_X_WINDOWS</code> is defined to be true, and the
installation proceeds as normal. The case where only POSIX is
implemented appears to present problems. The TDF representing the
program will contain undefined tokens representing objects from
both the POSIX and X Windows APIs, and it would seem necessary to
define these tokens (i.e. implement both APIs) in order to install
the TDF. But because of the use of conditional compilation, all the
applications of X Windows tokens will be inside <code>X_cond</code>
constructs on the branch corresponding to
<code>HAVE_X_WINDOWS</code> being true. If it is actually false
then these branches are stepped over and completely ignored. Thus
it does not matter that these tokens are undefined. Hence the
conditional compilation constructs within TDF give the same
flexibility in the API implementation in this case as do those in
C.</para>
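<para>The effect can be imitated in plain C. The sketch below is an
illustration only: <code>HAVE_X_WINDOWS</code> is an ordinary macro
here rather than a tokenised integer, and <code>x_draw_result</code>
is a hypothetical function which is declared but deliberately never
defined, standing in for an object from an unimplemented X Windows
API.
<programlisting>
#include &lt;stdio.h&gt;

/* Assumption: default to a POSIX-only target unless told otherwise. */
#ifndef HAVE_X_WINDOWS
#define HAVE_X_WINDOWS 0
#endif

/* Hypothetical X Windows API object; never defined in this sketch. */
extern void x_draw_result ( int ) ;

/* Returns 1 if the graphical version was used, 0 for plain text. */
int show_result ( int value )
{
#if HAVE_X_WINDOWS
    x_draw_result ( value ) ;
    return ( 1 ) ;
#else
    printf ( "result = %d\n", value ) ;
    return ( 0 ) ;
#endif
}
</programlisting>Because the only reference to <code>x_draw_result</code>
lies on the branch that is skipped, the program builds and links on a
target with no X Windows implementation, in the same way that the
undefined X Windows tokens are never reached in the TDF.</para>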
</sect3>
</sect2>
</sect1>

<sect1>
<title>4. Conclusions</title>
<para>The philosophy underlying the whole TDF approach to portability
is that of separation or isolation. This separation of the various
components of the compilation system means that to a large extent
they can be considered independently. The separation is only possible
because the definition of TDF has mechanisms which facilitate it,
primarily the token mechanism, but also the capsule linkage
scheme.</para>

<para>The most important separation is that of the abstract
description of the syntactic aspects of the API, in the form of
the target independent headers, from the API implementation. It
is this which enables the separation of target independent from
target dependent code which is necessary for any Architecture
Neutral Distribution Format. It also means that programs can be
checked against the abstract API description, instead of against
a particular implementation, allowing for effective API
conformance testing of applications. Furthermore, it isolates the
actual program from the API implementation, thereby allowing the
programmer to work in the idealised world envisaged by the API
description, rather than the real world of API implementations
and all their faults.</para>

<para>This isolation also means that these API implementation
problems are seen to be genuinely separate from the main program
development. They are isolated into a single process, TDF library
building, which needs to be done only once per API
implementation. Because of the separation of the API description
from the implementation, this library building process also
serves as a conformance check for the syntactic aspects of the
API implementation. However, the approach is evolutionary in that
it can handle the current situation while pointing the way
forward. Absolute API conformance is not necessary; the TDF
libraries can be used as a medium for workarounds for minor
implementation errors.</para>

<para>The same mechanism which is used to separate the API
description and implementation can also be used within an
application to separate the target dependent code from the main
body of target independent code. This use of user-defined APIs
also enables a separation of the portability requirements of the
program from the particular ways these requirements are
implemented on the various target machines. Again, the approach
is evolutionary, and not prescriptive. Programs can be made more
portable in incremental steps, with the degree of portability
required being a conscious decision.</para>

<para>In a sense the most important contribution TDF makes to
portability is in enabling the various tasks of API description, API
implementation and program writing to be considered independently,
while showing up the relationships between them. It is often said
that well specified APIs are the solution to the world's portability
and interoperability problems; but by themselves they can never be.
Without methods of checking the conformance of programs which use the
API and of API implementations, the APIs themselves will remain
toothless. TDF, by providing syntactic API checking for both programs
and implementations, is a significant first step towards solving this
problem.</para>
</sect1>

<para>
[1] tcc User's Guide, DRA, 1993.
[2] tspec - An API Specification Tool, DRA, 1993.
[3] The C to TDF Producer, DRA, 1993.
[4] A Guide to the TDF Specification, DRA, 1993.
[5] TDF Facts and Figures, DRA, 1993.
[6] TDF Specification, DRA, 1993.
[7] The 80386/80486 TDF Installer, DRA, 1992.
[8] A Guide to Porting using TDF, DRA, 1993.
[9] The TDF Notation Compiler, DRA, 1993.
</para>
</chapter>
</book>