Subversion Repositories tendra.SVN

Rev

Rev 6 | Details | Compare with Previous | Last modification | View Log | RSS feed

Rev Author Line No. Line
6 7u83 1
<?xml version="1.0" standalone="no"?>
2
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
3
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
4
 
5
<!--
6
  $Id$
7
-->
8
 
9
<book>
10
  <bookinfo>
11
    <title>A Guide to the TDF Specification, Issue 4.0</title>
12
 
13
    <corpauthor>The TenDRA Project</corpauthor>
14
 
15
    <author>
16
      <firstname>Jeroen</firstname>
17
      <surname>Ruigrok van der Werven</surname>
18
    </author>
19
    <authorinitials>JRvdW</authorinitials>
20
    <pubdate>2005</pubdate>
21
 
22
    <copyright>
23
      <year>2004</year>
24
      <year>2005</year>
25
 
26
      <holder>The TenDRA Project</holder>
27
    </copyright>
28
 
29
    <copyright>
30
      <year>1998</year>
31
 
32
      <holder>DERA</holder>
33
    </copyright>
34
  </bookinfo>
35
 
36
  <chapter id="introduction">
37
    <title>Introduction</title>
38
 
39
    <para>This memo is intended to be a fairly detailed commentary on the
40
      specification of TDF, a kind of Talmud to the Torah. If it conflicts
41
      with the specification document, it is wrong. The aim is elucidate the
42
      various constructions of TDF, giving examples of usages both from the
43
      point of view of a producer of TDF and how it is used to construct
44
      programs on particular platforms using various installers or
45
      translators. In addition, some attempt is made to give the reasons why
46
      the particular constructions have been chosen. Most of the commentary is
47
      a distillation of questions and answers raised by people trying to learn
48
      TDF from the specification document.</para>
49
 
50
    <para>Throughout this document, references like (S5.1) are headings in the
51
      <ulink url="../tdf/spec1.html">TDF specification, Issue 4.0</ulink>.
52
      I use the term "compiling" or "producing" to mean the production of TDF
53
      from some source language and "translating" to mean making a program for
54
      some specific platform from TDF.</para>
55
 
56
    <para>I use the first person where I am expressing my own opinions or
57
      preferences; these should not be taken as official opinions of DRA or
58
      the TenDRA team.</para>
59
  </chapter>
60
 
61
  <chapter>
62
    <sect1 id="sorts-and-tokens">
63
      <title>SORTs and TOKENs</title>
64
 
65
      <para>In the syntax of language like C or Pascal, we find various
66
        syntactic units like &lt;Expression&gt;, &lt;Identifier&gt; etc. A
67
        SORT bears the same relation to TDF as these syntactic units bear to
68
        the language; roughly speaking, the syntactic unit &lt;Expression&gt;
69
        corresponds to the SORT EXP and &lt;Identifier&gt; to TAG . However,
70
        instead of using BNF to compose syntactic units from others, TDF uses
71
        explicit constructors to compose its SORTs; each constructor uses
72
        other pieces of TDF of specified SORTs to make a piece of its result
73
        SORT. For example, the constructor plus uses an ERROR_TREATMENT and
74
        two EXPs to make another EXP.</para>
75
 
76
      <para>At the moment, there are 58 different SORTS, from ACCESS to
77
        VARIETY given in tables 1 and 2. Some of these have familiar analogues
78
        in standard language construction as with EXP and TAG above. Others
79
        will be less familiar since TDF must concern itself with issues not
80
        normally addressed in language definitions. For example, the process
81
        of linking together TDF programs is at the root of the architecture
82
        neutrality of TDF and so must form an integral part of its definition.
83
        On the other hand, TDF is not meant to be a language readily
84
        accessible to the human reader or writer; computers handle it much
85
        more easily. Thus a great many choices have been made in the
86
        definition which would be intolerable in a standard language
87
        definition for the human programmer but which, paradoxically enough,
88
        make it much simpler for a computer to produce and analyse TDF</para>
89
 
90
      <para>The SORTs and constructors in effect form a multi-sorted algebra.
91
        There were two principal reasons for choosing this algebraic form of
92
        definition. First, it is easy to extend - a new operation on existing
93
        constructs simply requires a new constructor.  Secondly, the algebraic
94
        form is highly amenable to the automatic construction of programs.
95
        Large parts of both TDF producers and TDF translators have been
96
        created by automatic transformation of the text of the specification
97
        document itself, by extracting the algebraic signature and
98
        constructing C program which can read or produce TDF. To this extent,
99
        one can regard the specification document as a formal description of
100
        the free algebra of TDF SORTs and constructors. Of course, most of the
101
        interesting parts of the definition of TDF lies in the equivalences of
102
        parts of TDF, so this formality only covers the easy bit.</para>
103
 
104
      <para>Another distinction between the TDF definition and language
105
        syntactic description is that TDF is to some extent conscious of its
106
        own SORTs so that it can specify a new construction of a given SORT.
107
        The analogy in normal languages would be that one could define a new
108
        construction with new syntax and say this is an example of an
109
        &lt;Expression&gt;, for example; I don't know of any standard language
110
        which permits this, although those of you with a historical bent might
111
        remember Algol-N which made a valiant attempt at it. Of course, the
112
        algebraic method of description makes it much easier to specify,
113
        rather than having to give syntax to provide the syntax for the new
114
        construction in a language.</para>
115
 
116
      <sect2>
117
        <title>Token applications and first-class SORTs</title>
118
 
119
        <para>A new construction is introduced by the SORT TOKEN; the
120
          constructors involving TOKENs allow one to give an expansion for the
121
          TOKEN in terms of other pieces of TDF, possibly including
122
          parameters. We can encapsulate a (possibly parameterised) fragment
123
          of TDF of a suitable SORT by giving it a TOKEN as identification.
124
          Not all of the SORTs are available for this kind of encapsulation
125
          - only those which have a SORTNAME constructor (from access to
126
          variety).  These are the "first-class" SORTs given in table 1.  Each
127
          of these have an appropriate _apply_token constructor (e.g.
128
          exp_apply_token) give the expansion. Most of these also have _cond
129
          constructors (e.g.see exp_cond in
130
          <link linkend="cond-constructors">section 9.1</link>)
131
          which allows translate time conditional expansion of the
132
          SORT.</para>
133
 
134
        <mediaobject>
135
          <imageobject>
136
            <imagedata fileref="images/guidetable3.png" format="PNG"/>
137
          </imageobject>
138
          <textobject>
139
            <phrase>TBD</phrase>
140
          </textobject>
141
          <caption>
142
            <para>TBD</para>
143
          </caption>
144
        </mediaobject>
145
 
146
        <para>Every TOKEN has a result SORT, i.e. the SORT of its resulting
147
          expansion and before it can be expanded, one must have its parameter
148
          SORTs. Thus, you can regard a TOKEN as having a type defined by its
149
          result and parameter SORTs and the _apply_token as the operator
150
          which expands the encapsulation and substitutes the
151
          parameters.</para>
152
 
153
        <para>However, if we look at the signature of exp_apply_token:</para>
154
 
155
        <programlisting>
156
token_value: TOKEN
157
token_args:  BITSTREAM param_sorts(token_value)
158
             -&gt; EXP x
159
        </programlisting>
160
 
161
        <para>we are confronted by the mysterious BITSTREAM where one might
162
          expect to find the actual parameters of the TOKEN.</para>
163
 
164
        <para>To explain BITSTREAMs requires a diversion into the bit-encoding
165
          of TDF. Constructors for a particular SORT are represented in a
166
          number of bits depending on the number of constructors for that
167
          SORT; the context will determine the SORT required, so no more bits
168
          are required. Thus since there is only one constructor for UNITs, no
169
          bits are required to represent make_unit; there are about 120
170
          different constructors for EXPs so 7 bits are required to cover all
171
          the EXPs. The parameters of each constructor have known SORTs and so
172
          their representations are just concatenated after the representation
173
          of the constructor.
174
 
175
          <footnote><para>There are facilities to allow extensions to the
176
            number of constructors, so it is not quite as simple as
177
            this.</para></footnote>
178
 
179
          While this is a very compact representation, it suffers from the
180
          defect that one must decode it even just to skip over it. This is
181
          very irksome is some applications, notably the TDF linker which is
182
          not interested detailed expansions.  Similarly, in translators there
183
          are places where one wishes to skip over a token application without
184
          knowledge of the SORTs of its parameters. Thus a BITSTREAM is just
185
          an encoding of some TDF, preceded by the number of bits it occupies.
186
          Applications can then skip over BITSTREAMs trivially.  Similar
187
          considerations apply to BYTESTREAMs used elsewhere; here the
188
          encoding is preceded by the number of bytes in the encoding and is
189
          aligned to a byte boundary to allow fast copying.</para>
190
      </sect2>
191
 
192
      <sect2>
193
        <title>Token definitions and declarations</title>
194
 
195
        <para>Thus the <parameter>token_args</parameter> parameter of
196
          exp_apply_token is just the BITSTREAM formed from the actual
197
          parameters in the sequence described by the definition of the
198
          <parameter>token_value</parameter> parameter. This will be given in
199
          a TOKEN_DEFN somewhere with constructor token_definition:</para>
200
 
201
        <programlisting>
202
result_sort: SORTNAME
203
tok_params:  LIST(TOKFORMALS)
204
body:        result_sort
205
             -&gt; TOKEN_DEFN
206
        </programlisting>
207
 
208
        <para>The <type>result_sort</type> is the SORT of the construction of
209
          <type> body</type>; e.g. if <type>result_sort</type> is formed from
210
          exp then <type>body</type> would be constructed using the EXP
211
          constructors and one would use exp_apply_token to give the
212
          expansion. The list <type>tok_params</type> gives the formal
213
          parameters of the definition in terms of TOKFORMALS constructed
214
          using make_tok_formals:</para>
215
 
216
        <programlisting>
217
sn: SORTNAME
218
tk: TDFINT
219
    -&gt; TOKFORMALS
220
        </programlisting>
221
 
222
        <para>The TDFINT <type>tk</type> will be the integer representation of
223
          the formal parameter expressed as a TOKEN whose result sort is
224
          <type>sn</type> (see more about name representation in
225
          <link linkend="make_capsule-and-name-spaces">section 3.1</link>). To
226
          use the parameter in the body of the TOKEN_DEFN, one simply uses the
227
          _apply_token appropriate to <type>sn</type>.Note that sn may be a
228
          TOKEN but the <type>result_sort</type> may not.</para>
229
 
230
        <para>Hence the BITSTREAM
231
          <type>param_sorts</type>(<parameter>token_value</parameter>) in the
232
          actual parameter of exp_apply_token above is simply formed by the
233
          catenation of constructions of the SORTs given by the SORTNAMEs in
234
          the <type>tok_params</type> of the TOKEN being expanded.</para>
235
 
236
        <para>Usually one gives a name to a TOKEN_DEFN using to form a TOKDEF
237
          using make_tokdef :</para>
238
 
239
        <programlisting>
240
tok:       TDFINT
241
signature: OPTION(STRING)
242
def:       BITSTREAM TOKEN_DEFN
243
           -&gt; TOKDEF
244
        </programlisting>
245
 
246
        <para>Here, <type>tok</type> gives the name that will be used to
247
          identify the TOKEN whose expansion is given by <type>def</type>. Any
248
          use of this TOKEN (e.g. in exp_apply_token) will be given by
249
          make_token(<type>tok</type>). Once again, a BITSTREAM is used to
250
          encapsulate the TOKEN_DEFN.</para>
251
 
252
        <para>The significance of the signature parameter is discussed in
253
          <link linkend="declaration-and-definition-signatures">section
254
            3.2.2</link>.</para>
255
 
256
        <para>Often, one wishes a token without giving its definition - the
257
          definition could, for example, be platform-dependent. A TOKDEC
258
          introduces such a token using make_tokdec:</para>
259
 
260
        <programlisting>
261
tok:       TDFINT
262
signature: OPTION(STRING)
263
s:         SORTNAME
264
           -&gt; TOKDEC
265
        </programlisting>
266
 
267
        <para>Here the SORTNAME, <type>s</type>, is given by token:</para>
268
 
269
        <programlisting>
270
result: SORTNAME
271
params: LIST(SORTNAME)
272
        -&gt; SORTNAME
273
        </programlisting>
274
 
275
        <para>which gives the result and parameter SORTs of
276
          <type>tok</type>.</para>
277
 
278
        <para>One can also use a TOKEN_DEFN in an anonymous fashion by giving
279
          it as an actual parameter of a TOKEN which itself demands a TOKEN
280
          parameter. To do this one simply uses use_tokdef:</para>
281
 
282
        <programlisting>
283
tdef: BITSTREAM TOKEN_DEFN
284
      -&gt; TOKEN
285
        </programlisting>
286
      </sect2>
287
 
288
      <sect2>
289
        <title>A simple use of a TOKEN</title>
290
 
291
        <para>The crucial use of TOKENs in TDF is to provide abstractions of
292
          APIs (see <link linkend="tokens-and-apis">section 10</link>) but
293
          they are also used as shorthand for commonly occurring
294
          constructions. For example, given the TDF constructor plus,
295
          mentioned above, we could define a plus with only two EXP parameters
296
          more suitable to C by using the wrap constructor as the
297
          ERROR_TREATMENT:</para>
298
 
299
        <programlisting>
300
make_tokdef(C_plus, empty, token_definition(exp(), (make_tokformals(exp(), l),
301
                                                    make_tokformals(exp(), r)),
302
            plus(wrap(), exp_apply_token(l, ()), exp_apply_token(r, ())))
303
        </programlisting>
304
      </sect2>
305
 
306
      <sect2>
307
        <title>Second class SORTs</title>
308
 
309
        <para>Second class SORTs (given in table 2) cannot be
310
          TOKENised. These are the "syntactic units" of TDF which the user
311
          cannot extend; he can only produce them using the constructors
312
          defined in core-TDF.</para>
313
 
314
        <para>Some of these constructors are implicit. For example, there are
315
          no explicit constructors for LIST or SLIST which are both used to
316
          form lists of SORTs; their construction is simply part of the
317
          encoding of TDF. However, it is forseen that LIST constructors would
318
          be highly desireable and there will probably extensions to TDF to
319
          promote LIST from a second-class SORT to a first-class one. This
320
          will not apply to SLIST or to the other SORTs which have implicit
321
          constructions. These include BITSTREAM, BYTESTREAM, TDFINT, TDFIDENT
322
          and TDFSTRING.</para>
323
      </sect2>
324
    </sect1>
325
 
326
    <sect1>
327
      <title>CAPSULEs and UNITs</title>
328
 
329
      <para>A CAPSULE is typically the result of a single compilation - one
330
        could regard it as being the TDF analogue of a Unix .o file. Just as
331
        with .o files, a set of CAPSULEs can be linked together to form
332
        another. Similarly, a CAPSULE may be translated to make program for
333
        some platform, provided certain conditions are met. One of these
334
        conditions is obviously that a translator exists for the platform, but
335
        there are others. They basically state that any names that are
336
        undefined in the CAPSULE can be supplied by the system in which it is
337
        to be run. For example, the translator could produce assembly code
338
        with external identifiers which will be supplied by some system
339
        library.</para>
340
 
341
      <sect2 id="make_capsule-and-name-spaces">
342
        <title>make_capsule and name-spaces</title>
343
 
344
        <para>The only constructor for a CAPSULE is make_capsule. Its basic
345
          function is to compose together UNITs which contain the declarations
346
          and definitions of the program. The signature of make_capsule looks
347
          rather daunting and is probable best represented graphically.</para>
348
 
349
        <mediaobject>
350
          <imageobject>
351
            <imagedata fileref="images/guide2.png" format="PNG"/>
352
          </imageobject>
353
          <textobject>
354
            <phrase>TBD</phrase>
355
          </textobject>
356
          <caption>
357
            <para>TBD</para>
358
          </caption>
359
        </mediaobject>
360
 
361
        <para>The diagram gives an example of a CAPSULE using the same
362
          components as in the following text.</para>
363
 
364
        <para>Each CAPSULE has its own name-space, distinct from all other
365
          CAPSULEs' name-spaces and also from the name-spaces of its component
366
          UNITs (see <link linkened="#11">section 3.1.2</link>).  There are
367
          several different kinds of names in TDF and each name-space is
368
          further subdivided into one for each kind of name.  The number of
369
          different kinds of names is potentially unlimited but only three are
370
          used in core-TDF, namely "tag", "token" and "al_tag". Those names in
371
          a "tag" name-space generally correspond to identifiers in normal
372
          programs and I shall use these as the paradigm for the properties of
373
          them all.</para>
374
 
375
        <para>The actual representations of a "tag" name in a given name-space
376
          is an integer, described as SORT TDFINT. These integers are drawn
377
          from a contiguous set starting from 0 up to some limit given by the
378
          constructor which introduces the name-space. For CAPSULE
379
          name-spaces, this is given by the <type>capsule_linking</type>
380
          parameter of make_capsule:</para>
381
 
382
        <programlisting>
383
capsule_linking: SLIST(CAPSULE_LINK)
384
        </programlisting>
385
 
386
        <para>In the most general case in core-TDF, there would be three
387
          entries in the list introducing limits using make_capsule_link for
388
          each of the "tag", "token" and "al_tag" name-spaces for the CAPSULE.
389
          Thus if:</para>
390
 
391
        <programlisting>
392
capsule_linking = (make_capsule_link("tag", 5),
393
                   make_capsule_link("token", 6),
394
                   make_capsule_link("al_tag", 7))
395
        </programlisting>
396
 
397
        <para>there are 5 CAPSULE "tag" names used within the CAPSULE, namely
398
          0, 1, 2, 3 and 4; similarly there are 6 "token" names and 7 "al_tag"
399
          names.</para>
400
 
401
        <sect3 id="external-linkages">
402
          <title>External linkages</title>
403
 
404
          <para>The context of usage will always determine when and how an
405
            integer is to be interpreted as a name in a particular name-space.
406
            For example, a TAG in a UNIT is constructed by make_tag applied to
407
            a TDFINT which will be interpreted as a name from that UNIT's
408
            "tag" name-space. An integer representing a name in the CAPSULE
409
            name-space would be found in a LINKEXTERN of the
410
            <type>external_linkage</type> parameter of make_capsule.</para>
411
 
412
          <programlisting>
413
external_linkage: SLIST(EXTERN_LINK)
414
          </programlisting>
415
 
416
          <para>Each EXTERN_LINK is itself formed from an SLIST of LINKEXTERNs
417
            given by make_extern_link . The order of the EXTERN_LINKs
418
            determines which name-space one is dealing with; they are in the
419
            same order as given by the <type>extern_linkage</type> parameter.
420
            Thus, with the <type>extern_linkage</type> given above, the first
421
            EXTERN_LINK would deal with the "tag" name-space; Each of its
422
            component LINKEXTERNs constructed by make_linkextern would be
423
            identifying a tag number with some name external to the CAPSULE;
424
            for example one might be:</para>
425
 
426
          <programlisting>
427
make_linkextern(4, string_extern("printf"))
428
          </programlisting>
429
 
430
          <para>This would mean: identify the CAPSULE's "tag" 4 with an name
431
            called "printf", external to the module. The name "printf" would
432
            be used to linkage external to the CAPSULE; any name required
433
            outside the CAPSULE would have to be linked like
434
            this.</para></sect3>
435
 
436
        <sect3 id="units">
437
          <title>UNITs</title>
438
 
439
          <para>This name "printf", of course, does not necessarily mean the C
440
            procedure in the system library. This depends both on the system
441
            context in which the CAPSULE is translated and also the meaning of
442
            the CAPSULE "tag" name 4 given by the component UNITs of the
443
            CAPSULE in the <type>groups</type> parameter of
444
            make_capsule:</para>
445
 
446
          <programlisting>
447
groups: SLIST(GROUP)
448
          </programlisting>
449
 
450
          <para>Each GROUP in the <type>groups</type> SLIST will be formed by
451
            sets of UNITs of the same kind. Once again, there are a
452
            potentially unlimited number of kinds of UNITs but core-TDF only
453
            uses those named "tld","al_tagdefs", "tagdecs", "tagdefs",
454
            "tokdecs" and "tokdefs".
455
 
456
            <footnote><para>The "tld" UNITs gives usage information for names
457
              to aid the linker, tld, to discover which namess have
458
              definitions and some usage information. The C producer also
459
              optionally constructs "diagnostics" UNITs (to give run-time
460
              diagnostic information).</para></footnote>
461
 
462
            These names will appear (in the same order as in
463
            <type>groups</type>) in the <type>prop_names</type> parameter of
464
            make_capsule, one for each kind of UNIT appearing in the
465
            CAPSULE:</para>
466
 
467
          <programlisting>
468
prop_names: SLIST(TDFIDENT)
469
          </programlisting>
470
 
471
          <para>Thus if:</para>
472
 
473
          <programlisting>
474
prop_names = ("tagdecs", "tagdefs")
475
          </programlisting>
476
 
477
          <para>then, the first element of <type>groups</type> would contain
478
            only "tagdecs" UNITs and and the second would contain only
479
            "tagdefs" UNITs. A "tagdecs" UNIT contains things rather like a
480
            set of global identifier declarations in C, while a "tagdefs" UNIT
481
            is like a set of global definitions of identifiers.</para>
482
        </sect3>
483
 
484
        <sect3 id="make_unit">
485
          <title>make_unit</title>
486
 
487
          <para>Now we come to the construction of UNITs using make_unit, as
488
            in the diagram below
489
 
490
            <mediaobject>
491
              <imageobject>
492
                <imagedata fileref="images/guide1.png" format="PNG"/>
493
              </imageobject>
494
              <textobject>
495
                <phrase>TBD</phrase>
496
              </textobject>
497
              <caption>
498
                <para>TBD</para>
499
              </caption>
500
            </mediaobject>
501
          </para>
502
 
503
          <para>First we give the limits of the various name-spaces local to
504
            the UNIT in the <type>local_vars</type> parameter:</para>
505
 
506
          <programlisting>
507
local_vars: SLIST(TDFINT)
508
          </programlisting>
509
 
510
          <para>Just in the same way as with <type>external_linkage</type>,
511
            the numbers in local_vars correspond (in the same order) to the
512
            spaces indicated in <type>capsule_linking</type> in
513
            <link linkend="make_capsule-and-name-spaces">section 3.1</link>.
514
            With our example,the first element of <type>local_vars</type>
515
            gives the number of "tag" names local to the UNIT, the second
516
            gives the number of "token" names local to the UNIT etc. These
517
            will include all the names used in the body of the UNIT. Each
518
            declaration of a TAG, for example, will use a new number from the
519
            "tag" name-space; there is no hiding or reuse of names within a
520
            UNIT.</para>
521
        </sect3>
522
 
523
        <sect3 id="link">
524
          <title>LINK</title>
525
 
526
          <para>Connections between the CAPSULE name-spaces and the UNIT
527
            name-spaces are made by LINKs in the <parameter>lks</parameter>
528
            parameter of make_unit:</para>
529
 
530
          <programlisting>
531
lks: SLIST(LINKS)
532
          </programlisting>
533
 
534
          <para>Once again, <type>lks</type> is effectively indexed by the
535
            kind of name-space a. Each LINKS is an SLIST of LINKs each of
536
            which which establish an identity between names in the CAPSULE
537
            name-space and names in the UNIT name-space. Thus if the first
538
            element of <type>lks</type> contains:</para>
539
 
540
          <programlisting>
541
make_link(42, 4)
542
          </programlisting>
543
 
544
          <para>then, the UNIT "tag" 42 is identical to the CAPSULE "tag"
545
            4.</para>
546
 
547
          <para>Note that names from the CAPSULE name-space only arise in two
548
            places, LINKs and LINK_EXTERNs. Every other use of names are
549
            derived from some UNIT name-space.</para>
550
        </sect3>
551
      </sect2>
552
 
553
      <sect2 id="definitions-and-declarations">
554
        <title>Definitions and declarations</title>
555
 
556
        <para>The encoding in the <type>properties</type>:BYTESTREAM parameter
557
          of a UNIT is a PROPS, for which there are five constructors
558
          corresponding to the kinds of UNITs in core-TDF, make_al_tagdefs,
559
          make_tagdecs, make_tagdefs, make_tokdefs and make_tokdecs. Each of
560
          these will declare or define names in the appropriate UNIT
561
          name-space which can be used by make_link in the UNIT's
562
          <type>lks</type> parameter as well as elsewhere in the
563
          <type>properties</type> parameter. The distinction between
564
          "declarations" and "definitions" is rather similar to C usage; a
565
          declaration provides the "type" of a name, while a definition gives
566
          its meaning. For tags, the "type" is the SORT SHAPE (see below). For
567
          tokens, the "type" is a SORTNAME constructed from the SORTNAMEs of
568
          the parameters and result of the TOKEN using token:
569
 
570
          <programlisting>
571
params: LIST(SORTNAME)
572
result: SORTNAME
573
        -&gt; SORTNAME
574
          </programlisting>
575
 
576
          Taking make_tagdefs as a paradigm for PROPS, we have:
577
 
578
          <programlisting>
579
no_labels: TDFINT
580
tds:       SLIST(TAGDEF)
581
           -&gt; TAGDEF_PROPS
582
          </programlisting>
583
 
584
          The <type>no_labels</type> parameter introduces the size of yet
585
          another name-space local to the PROPS, this time for the LABELs used
586
          in the TAGDEFs. Each TAGDEF in <type>tds</type> will define a "tag"
587
          name in the UNIT's name-space. The order of these TAGDEFs is
588
          immaterial since the initialisations of the tags are values which
589
          can be solved at translate time, load time or as unordered dynamic
590
          initialisations.</para>
591
 
592
        <para>There are three constructors for TAGDEFs, each with slightly
593
          different properties. The simplest is make_id_tagdef:</para>
594
 
595
        <programlisting>
596
t:         TDFINT
597
signature: OPTION(STRING)
598
e:         EXP x
599
           -&gt; TAGDEF
600
        </programlisting>
601
 
602
        <para>Here, <type>t</type> is the tag name and the evaluation of
603
          <type>e</type> will be the value of SHAPE <type>x</type> of an
604
          obtain_tag(<type>t</type>) in an EXP. Note that t is not a variable;
605
          the value of obtain_tag(<type>t</type>) will be invariant. The
606
          <type>signature</type> parameter gives a STRING (see
607
          <link linkend="sort-string">section 3.2.3</link>) which may be used
608
          as an name for the tag, external to TDF and also as a check
609
          introduced by the producer that a tagdef and its corresponding
610
          tagdec have the same notion of the language-specific type of the
611
          tag.</para>
612
 
613
        <para>The two other constructors for TAGDEF, make_var_tagdef and
614
          common_tagdef both define variable tags and have the same
615
          signature:</para>
616
 
617
        <programlisting>
618
t:          TDFINT
619
opt_access: OPTION(ACCESS)
620
signature:  OPTION(STRING)
621
e:          EXP x
622
            -&gt; TAGDEF
623
        </programlisting>
624
 
625
        <para>Once again <type>t</type> is tag name but now <type>e</type> is
626
          initialisation of the variable <type>t</type>. A use of
627
          obtain_tag(<type>t</type>) will give a pointer to the variable (of
628
          SHAPE POINTER x), rather than its contents.
629
 
630
          <footnote><para>There is a similar distinction between tags
631
            introduced to be locals of a procedure using identify and variable
632
            (see <link linkend="identify-variable">section 5.3.1</link>).</para>
633
          </footnote>
634
 
635
          There can only be one make_var_tagdef of a given tag in a program,
636
          but there may be more than one common_tagdef, possibly with
637
          different initialisations; however these initialisations must
638
          overlap consistently just as in common blocks in FORTRAN.</para>
639
 
640
        <para>The ACCESS parameter gives various properties required for the
641
          tag being defined and is discussed in
642
          <link linkend="sort-access">section 5.3.2</link>.</para>
643
 
644
        <para>The initialisation EXPs of TAGDEFs will be evaluated before the
645
          "main" program is started. An initialiation EXP must either be a
646
          constant (in the sense of
647
          <link linkend="constants">section 9</link>) or reduce to (either
648
          directly or by token or _cond expansions) to an initial_value:
649
 
650
          <programlisting>
651
init: EXP s
652
      -&gt; EXP s
653
          </programlisting>
654
 
655
          The translator will arrange that <type>init</type> will be evaluated
656
          once only before any procedure application, other than those
657
          themselves involved in initial_values, but after any constant
658
          initialisations. The order of evaluation of different initial_values
659
          is arbitrary.</para>
660
 
661
        <sect3 id="scopes-and-linking">
662
          <title>Scopes and linking</title>
663
 
664
          <para>Only names introduced by AL_TAGDEFS, TAGDEFS, TAGDECs, TOKDECs
665
            and TOKDEFs can be used in other UNITs (and then, only via the
666
            <type>lks</type> parameters of the UNITs involved). You can regard
667
            them as being similar to C global declarations. Token definitions
668
            include their declarations implicitly; however this is not true of
669
            tags. This means that any CAPSULE which uses or defines a tag
670
            across UNITs must include a TAGDEC for that tag in its "tagdecs"
671
            UNITs. A TAGDEC is constructed using either make_id_tagdec,
672
            make_var_tagdec or common_tagdec, all with the same form:
673
 
674
          <programlisting>
675
t_intro:   TDFINT
676
acc:       OPTION(ACCESS)
677
signature: OPTION(STRING)
678
x:         SHAPE
679
           -&gt; TAGDEC
680
          </programlisting>
681
 
682
          Here the tagname is given by <type>t_intro</type>; the SHAPE
683
          <type>x</type> will defined the space and alignment required for the
684
          tag (this is analogous to the type in a C declaration). The
685
          <type>acc</type> field will define certain properties of the tag not
686
          implicit in its SHAPE; I shall return to the kinds of properties
687
          envisaged in discussing local declarations in
688
          <link linkend="defining-and-using-locals">section 5.3</link>.</para>
689
 
690
        <para>Most program will appear in the "tagdefs" UNITs - they will
691
          include the definitions of the procedures of the program which in
692
          turn will include local definitions of tags for the locals of the
693
          procedures.</para>
694
 
695
        <para>The standard TDF linker allows one to link CAPSULEs together
696
          using the name identifications given in the LINKEXTERNs, perhaps
697
          hiding some of them in the final CAPSULE. It does this just by
698
          generating a new CAPSULE name-space, grouping together component
699
          UNITs of the same kind and replacing their
700
          <parameter>lks</parameter> parameters with values derived from the
701
          new CAPSULE name-space without changing the UNITs' name-spaces or
702
          their <parameter>props</parameter> parameters.  The operation of
703
          grouping together UNITs is effectively assumed to be associative,
704
          commutative and idempotent e.g. if the same tag is declared in two
705
          capsules it is assumed to be the same thing . It also means that
706
          there is no implied order of evaluation of UNITs or of their
707
          component TAGDEFs</para>
708
 
709
        <para>Different languages have different conventions for deciding how
710
          programs are actually run. For example, C requires the presence of a
711
          suitably defined "main" procedure; this is usually enforced by
712
          requiring the system ld utility to bind the name "main" along with
713
          the definitions of any library values required.  Otherwise, the C
714
          conventions are met by standard TDF linking.  Other languages have
715
          more stringent requirements. For example, C++ requires dynamic
716
          initialisation of globals, using initial_value. As the only runnable
717
          code in TDF is in procedures, C++ would probably require an
718
          additional linking phase to construct a "main" procedure which calls
719
          the initialisation procedures of each CAPSULE involved if the system
720
          linker did not provide suitable C++ linking.</para>
721
        </sect3>
722
 
723
        <sect3 id="declaration-and-definition-signatures">
724
          <title>Declaration and definition signatures</title>
725
 
726
          <para>The <type>signature</type> arguments of TAGDEFs and TAGDECs
727
            are designed to allow a measure of cross-UNIT checking when
728
            linking independently compiled CAPSULEs. Suppose that we have a
729
            tag, <type>t</type>, used in one CAPSULE and defined in another;
730
            the first CAPSULE would have to have a TAGDEC for <type>t</type>
731
            whose TAGDEF is in the second. The <type>signature</type> STRING
732
            of both could be arranged to represent the language-specific type
733
            of <type>t</type> as understood at compilation-time. Clearly, when
734
            the CAPSULEs are linked the types must be identical and hence
735
            their STRING representation must be the same - a translator will
736
            reject any attempt to link definitions and declarations of the
737
            same object with different signatures.</para>
738
 
739
          <para>Similar considerations apply to TOKDEFs and TOKDECs; the
740
            "type" of a TOKEN may not have any familiar analogue in most HLLs,
741
            but the principle remains the same.</para>
742
        </sect3>
743
 
744
        <sect3 id="sort-string">
745
          <title>STRING</title>
746
 
747
          <para>The SORT STRING is used in various constructs other than
748
            declarations and definitions. It is a first-class SORT with
749
            string_apply_token and string_cond. A primitive STRING is
750
            constructed from a TDFSTRING(k,n) which is an encoding of n
751
            integers,each of k bits, using make_string:
752
 
753
            <programlisting>
754
arg: TDFSTRING(k, n)
755
     -&gt; STRING(k, n)
756
            </programlisting>
757
 
758
            STRINGs may be concatenated using concat_string:
759
 
760
            <programlisting>
761
arg1: STRING(k, n)
762
arg2: STRING(k,m)
763
      -&gt; STRING(k, n + m)
764
            </programlisting>
765
 
766
            Being able to compose strings, including token applications etc,
767
            means that late-binding is possible in <type>signature</type>
768
            checking in definitions and declarations. This late-binding means
769
            that the representation of platform-dependent HLL types need only
770
            be fully expanded at install-time and hence the types could be
771
            expressed in their representational form on the specific
772
            platform.</para>
773
        </sect3>
774
      </sect2>
775
    </sect1>
776
 
777
    <sect1>
778
      <title>SHAPEs, ALIGNMENTs and OFFSETs.</title>
779
 
780
      <para>In most languages there is some notion of the type of a value.
781
        This is often an uncomfortable mix of a definition of a representation
782
        for the value and a means of choosing which operators are applicable
783
        to the value. The TDF analogue of the type of value is its SHAPE
784
        (S3.20). A SHAPE is only concerned with the representation of a value,
785
        being an abstraction of its size and alignment properties.  Clearly an
786
        architecture-independent representation of a program cannot say, for
787
        example, that a pointer is 32 bits long; the size of pointers has to
788
        be abstracted so that translations to particular architectures can
789
        choose the size that is apposite for the platform.</para>
790
 
791
      <sect2 id="S19">
792
        <title>Shapes</title>
793
 
794
        <para>There are ten different basic constructors for the SORT SHAPE
795
          from bitfield to top as shown in table 3.  SHAPEs arising from those
796
          constructors are used as qualifiers (just using an upper case
797
          version of the constructor name) to various SORTs in the definition;
798
          for example, EXP TOP is an expression with top SHAPE. This is just
799
          used for definitional purposes only; there is no SORT SHAPENAME as
800
          one has SORTNAME.</para>
801
 
802
        <para>In the TDF specification of EXPs, you will observe that all EXPs
803
          in constructor signatures are all qualified by the SHAPE name; for
804
          example, a parameter might be EXP INTEGER(v). This merely means that
805
          for the construct to be meaningful the parameter must be derived
806
          from a constructor defined to be an EXP INTEGER(v). You might be
807
          forgiven for assuming that TDF is hence strongly-typed by its
808
          SHAPEs. This is not true; the producer must get it right. There are
809
          some checks in translators, but these are not exhaustive and are
810
          more for the benefit of translator writers than for the user. A tool
811
          for testing the SHAPE correctness of a TDF program would be useful
812
          but has yet to be written.</para>
813
 
814
        <sect3 id="S20">
815
          <title>TOP, BOTTOM, LUB</title>
816
 
817
          <para>Two of the SHAPE constructions are rather specialised; these
818
            are TOP and BOTTOM. The result of any expression with a TOP shape
819
            will always be discarded; examples are those produced by assign
820
            and integer_test . A BOTTOM SHAPE is produced by an expression
821
            which will leave the current flow of control e.g. goto . The
822
            significance of these SHAPEs only really impinges on the
823
            computation of the shapes of constructs which have alternative
824
            expressions as results. For example, the result of conditional is
825
            the result of one of its component expressions. In this case, the
826
            SHAPE of the result is described as the LUB of the SHAPEs of the
827
            components. This simply means that if one of the component SHAPEs
828
            is TOP then the resulting SHAPE is TOP; if one is BOTTOM then the
829
            resulting SHAPE is the SHAPE of the other; otherwise both
830
            component SHAPEs must be equal and is the resulting SHAPE. Since
831
            this operation is associative, commutative and idempotent, we can
832
            speak quite unambiguously of the LUB of several SHAPEs.</para>
833
        </sect3>
834
 
835
        <sect3 id="S21">
836
          <title>INTEGER</title>
837
 
838
          <para>Integer values in TDF have shape INTEGER(v) where v is of SORT
839
            VARIETY. The constructor for this SHAPE is integer with a VARIETY
840
            parameter.  The basic constructor for VARIETY is var_limits which
841
            has a pair of signed natural numbers as parameters giving the
842
            limits of possible values that the integer can attain. The SHAPE
843
            required for a 32 bit signed integer would be:
844
 
845
            <programlisting>
846
integer(var_limits(-2<superscript>31</superscript>, 2<superscript>31</superscript> - 1))
847
            </programlisting>
848
 
849
            while an unsigned char is:
850
 
851
            <programlisting>
852
integer(var_limits(0, 255))
853
            </programlisting>
854
 
855
            A translator should represent each integer variety by an object
856
            big enough (or bigger) to contain all the possible values with
857
            limits of the VARIETY. That being said, I must confess that most
858
            current translators do not handle integers of more than the
859
            maximum given naturally by the target architecture, but this will
860
            be rectified in due course.
861
 
862
          <para>The other way of constructing a VARIETY is to specify the
863
            number of bits required for its 2s-complemennt representation
864
            using var_width:</para>
865
 
866
            <programlisting>
867
signed_width: BOOL
868
width:        NAT
869
              -&gt; VARIETY
870
            </programlisting>
871
          </para>
872
        </sect3>
873
 
874
        <sect3 id="S22">
875
          <title>FLOATING and complex</title>
876
 
877
          <para>Similarly, floating point and complex numbers have shape
878
            FLOATING qualified by a FLOATING_VARIETY.</para>
879
 
880
          <para>A FLOATING_VARIETY for a real number is constructed using
881
            fvar_parms:</para>
882
 
883
          <programlisting>
884
base:              NAT
885
mantissa_digits:   NAT
886
minimum_exponent:  NAT
887
maximum_exponent:: NAT
888
                   -&gt; FLOATING_VARIETY
889
          </programlisting>
890
 
891
          <para>A FLOATING_VARIETY specifies the base, number of mantissa
892
            digits, and maximum and minimum exponent. Once again, it is
893
            intended that the translator will choose a representation which
894
            will contain all possible values, but in practice only those which
895
            are included in IEEE float, double and extended are actually
896
            implemented.</para>
897
 
898
          <para>Complex numbers have a floating variety constructed by
899
            complex_parms which has the the same signature as fvar_parms. The
900
            representation of these numbers is likely to be a pair of real
901
            numbers each defined as if by fvar_parms with the same arguments.
902
            The real and imaginary parts of of a complex number can be
903
            extracted using real_part and imaginary_part; these could have
904
            been injected ito the complex number using make_complex or any of
905
            the complex operations. Many translators will simply transform
906
            complex numbers into COMPOUNDs consisting of two floating point
907
            numbers, transforming the complex operations into floating point
908
            operations on the fields.</para>
909
        </sect3>
910
 
911
        <sect3 id="S23">
912
          <title>BITFIELD</title>
913
 
914
          <para>A number of contiguous bits have shape BITFIELD, qualified by
915
            a BITFIELD_VARIETY (S3.4) which gives the number of bits involved
916
            and whether these bits are to be treated as signed or unsigned
917
            integers. Current translators put a maximum of 32 or 64 on the
918
            number of bits.</para>
919
        </sect3>
920
 
921
        <sect3 id="S24">
922
          <title>PROC</title>
923
 
924
          <para>The representational SHAPEs of procedure values is given by
925
            PROC with constructor proc . I shall return to this in the
926
            description of the operations which use it.</para>
927
        </sect3>
928
 
929
        <sect3 id="S25">
930
          <title>Non-primitive SHAPEs</title>
931
 
932
          <para>The construction of the other four SHAPEs involves either
933
            existing SHAPEs or the alignments of existing SHAPEs.  These are
934
            constructed by compound, nof, offset and pointer.  Before
935
            describing these, we require a digression into what is meant by
936
            alignments and offsets.</para>
937
        </sect3>
938
      </sect2>
939
 
940
      <sect2>
941
        <title>Alignments</title>
942
 
943
        <para>In most processor architectures there are limitations on how one
944
          can address particular kinds of objects in convenient ways. These
945
          limitations are usually defined as part of the ABI for the
946
          processor. For example, in the MIPs processor the fastest way to
947
          access a 32-bit integer is to ensure that the address of the integer
948
          is aligned on a 4-byte boundary in the address space; obviously one
949
          can extract a mis-aligned integer but not in one machine
950
          instruction. Similarly, 16-bit integers should be aligned on a
951
          2-byte boundary. In principle, each primitive object could have
952
          similar restrictions for efficient access and these restrictions
953
          could vary from platform to platform. Hence, the notion of alignment
954
          has to be abstracted to form part of the architecture independent
955
          TDF - we cannot assume that any particular alignment regime will
956
          hold universally.</para>
957
 
958
        <para>The abstraction of alignments clearly has to cover compound
959
          objects as well as primitive ones like integers. For example, if a
960
          field of structure in C is to be accessed efficiently, then the
961
          alignment of the field will influence the alignment of the structure
962
          as whole; the structure itself could be a component of a larger
963
          object whose alignment must then depend on the alignment of the
964
          structure and so on. In general, we find that a compound alignment
965
          is given by the maximum alignment of its components, regardless of
966
          the form of the compound object e.g. whether it is a structure,
967
          union, array or whatever.</para>
968
 
969
        <para>This gives an immediate handle on the abstraction of the
970
          alignment of a compound object - it is just the set of abstractions
971
          of the alignments of its components. Since "maximum" is associative,
972
          commutative and idempotent, the component sets can be combined using
973
          normal set-union rules. In other words, a compound alignment is
974
          abstracted as the set of alignments of the primitive objects which
975
          make up the compound object. Thus the alignment abstraction of a C
976
          structure with only float fields is the singleton set containing the
977
          alignment of a float while that of a C union of an int and this
978
          structure is a pair of the alignments of an int and a float.</para>
979
 
980
        <sect3 id="alignement-constructors">
981
          <title>ALIGNMENT constructors</title>
982
 
983
          <para>The TDF abstraction of an alignment has SORT ALIGNMENT. The
984
            constructor, unite_alignments, gives the set-union of its
985
            ALIGNMENT parameters; this would correspond to taking a maximum of
986
            two real alignments in the translator.</para>
987
 
988
          <para>The constructor, alignment, gives the ALIGNMENT of a given
989
            SHAPE according to the rules given in the definition. These rules
990
            effectively <type>define</type> the primitive ALIGNMENTs as in the
991
            ALIGNMENT column of table 3. Those for PROC, all OFFSETs and all
992
            POINTERs are constants regardless of any SHAPE qualifiers. Each of
993
            the INTERGER VARIETYs, each of the FLOATING VARIETYs and each of
994
            the BITFIELD VARIETYs have their own ALIGNMENTs. These ALIGNMENTs
995
            will be bound to values apposite to the particular platform at
996
            translate-time. The ALIGNMENT of TOP is conventionally taken to be
997
            the empty set of ALIGNMENTs (corresponding to the minimum
998
            alignment on the platform).</para>
999
 
1000
          <para>The alignment of a procedure parameter clearly has to include
1001
            the alignment of its SHAPE; however, most ABIs will mandate a
1002
            greater alignment for some SHAPEs e.g. the alignment of a byte
1003
            parameter is usually defined to be on a 32-bit rather than an
1004
            8-bit boundary. The constructor, parameter_alignment, gives the
1005
            ALIGNMENT of a parameter of given SHAPE.</para>
1006
        </sect3>
1007
 
1008
        <sect3 id="S28">
1009
          <title>Special alignments</title>
1010
 
1011
          <para>There are several other special ALIGNMENTs.</para>
1012
 
1013
          <para>The alignment of a code address is {<type>code</type>} given
1014
            by code_alignment; this will be the alignment of a pointer given
1015
            by make_local_lv giving the value of a label.</para>
1016
 
1017
          <para>The other special ALIGNMENTs are considered to include all of
1018
            the others, but remain distinct. They are all concerned with
1019
            offsets and pointers relevant to procedure frames, procedure
1020
            parameters and local allocations and are collectively known as
1021
            frame alignments. These frame alignments differ from the normal
1022
            alignments in that their mapping to a given architecture is rather
1023
            more than just saying that it describes some n-bit boundary. For
1024
            example, alloca_alignment describes the alignment of dynamic space
1025
            produced by local_alloc (roughly the C alloca).  Now, an ABI could
1026
            specify that the alloca space is a stack disjoint from the normal
1027
            procedure stack; thus manipulations of space at alloca_alignment
1028
            may involve different code to space generated in other
1029
            ways.</para>
1030
 
1031
          <para>Similar considerations apply to the other special alignments,
1032
            callees_alignment(b), callers_alignment(b) and locals_alignment.
1033
            The first two give the alignments of the bases of the two
1034
            different parameter spaces in procedures (q.v.) and
1035
            locals_alignment gives the alignment of the base of locally
1036
            declared tags within a procedure. The exact interpretation of
1037
            these depends on how the frame stack is defined in the target ABI,
1038
            e.g. does the stack grow downwards or upwards?</para>
1039
 
1040
          <para>The final special alignment is var_param_alignment. This
1041
            describes the alignment of a special kind of parameter to a
1042
            procedure which can be of arbitrary length (see
1043
            <link linkend="vartag-varparam">section 5.1.1</link>).</para>
1044
        </sect3>
1045
 
1046
        <sect3 id="al_tag-make_al_tagdef">
1047
          <title>AL_TAG, make_al_tagdef</title>
1048
 
1049
          <para>Alignments can also be named as AL_TAGs using make_al_tagdef.
1050
            There is no corresponding make_al_tagdec since AL_TAGs are
1051
            implicitly declared by their constructor, make_al_tag. The main
1052
            reason for having names for alignments is to allow one to resolve
1053
            the ALIGNMENTs of recursive data structures. If, for example, we
1054
            have mutually recursive structures, their ALIGNMENTs are best
1055
            named and given as a set of equations formed by AL_TAGDEFs. A
1056
            translator can then solve these equations trivially by
1057
            substitution; this is easy because the only significant operation
1058
            is set-union.</para>
1059
        </sect3>
1060
      </sect2>
1061
 
1062
      <sect2 id="pointer-and-offset-shapes">
1063
        <title>Pointer and offset SHAPEs</title>
1064
 
1065
        <para>A pointer value must have a form which reflects the alignment of
1066
          the object that it points to; for example, in the MIPs processor,
1067
          the bottom two bits of a pointer to an integer must be zero. The TDF
1068
          SHAPE for a pointer is POINTER qualified by the ALIGNMENT of the
1069
          object pointed to.  The constructor pointer uses this alignment to
1070
          make a POINTER SHAPE.</para>
1071
 
1072
        <sect3 id="offset">
1073
          <title>OFFSET</title>
1074
 
1075
          <para>Expressions which give sizes or offsets in TDF have an OFFSET
1076
            SHAPE. These are always described as the difference between two
1077
            pointers.  Since the alignments of the objects pointed to could be
1078
            different, an OFFSET is qualified by these two ALIGNMENTs. Thus an
1079
            EXP OFFSET(X,Y) is the difference between an EXP POINTER(X) and an
1080
            EXP POINTER(Y). In order for the alignment rules to apply, the set
1081
            X of alignments must include Y. The constructor offset uses two
1082
            such alignments to make an OFFSET SHAPE. However, many instances
1083
            of offsets will be produced implicitly by the offset arithmetic,
1084
            e.g., offset_pad:
1085
 
1086
            <programlisting>
1087
a:    ALIGNMENT
1088
arg1: EXP OFFSET(z, t)
1089
      -&gt; EXP OFFSET(z xc8 a, a)
1090
            </programlisting>
1091
 
1092
            This gives the next OFFSET greater or equal to <type>arg1</type>
1093
            at which an object of ALIGNMENT <parameter>a</parameter> can be
1094
            placed. It should be noted that the calculation of shapes and
1095
            alignments are all translate-time activities; only EXPs should
1096
            produce runnable code.  This code, of course, may depend on the
1097
            shapes and alignments involved; for example, offset_pad might
1098
            round up <type>arg1</type> to be a multiple of four bytes if
1099
            <parameter>a</parameter> was an integer ALIGNMENT and
1100
            <parameter>z</parameter> was a character ALIGNMENT. Translators
1101
            also do extensive constant analysis, so if <type>arg1</type> was a
1102
            constant offset, then the round-off would be done at
1103
            translate-time to produce another constant.</para>
1104
        </sect3>
1105
      </sect2>
1106
 
1107
      <sect2>
1108
        <title>Compound SHAPEs</title>
1109
 
1110
        <para>The alignments of compound SHAPEs (i.e. those arising from the
1111
          constructors compound and nof) are derived from the constructions
1112
          which produced the SHAPE. To take the easy one first, the
1113
          constructor nof has signature:
1114
 
1115
          <programlisting>
1116
n: NAT
1117
s: SHAPE
1118
   -&gt; SHAPE
1119
          </programlisting>
1120
 
1121
          This SHAPE describes an array of <type>n</type> values all of SHAPE
1122
          <type>s</type>; note that <type>n</type> is a natural number and
1123
          hence is a constant known to the producer. Throughout the definition
1124
          this is referred to as the SHAPE NOF(n, s). The ALIGNMENT of such a
1125
          value is alignment(s); i.e. the alignment of an array is just the
1126
          alignment of its elements.</para>
1127
 
1128
        <para>The other compound SHAPEs are produced using compound:</para>
1129
 
1130
        <programlisting>
1131
sz: EXP OFFSET(x, y)
1132
    -&gt; SHAPE
1133
        </programlisting>
1134
 
1135
        <para>The <type>sz</type> parameter gives the minimum size which can
1136
          accommodate the SHAPE.</para>
1137
 
1138
        <sect3 id="offset-arithmetic">
1139
          <title>Offset arithmetic with compound shapes</title>
1140
 
1141
          <para>The constructors offset_add, offset_zero and shape_offset are
1142
            used together with offset_pad to implement (<emphasis>inter
1143
              alia</emphasis>) selection from structures represented by
1144
            COMPOUND SHAPEs. Starting from the zero OFFSET given by
1145
            offset_zero, one can construct an EXP which is the offset of a
1146
            field by padding and adding offsets until the required field is
1147
            reached. The value of the field required could then be extracted
1148
            using component or add_to_ptr. Most producers would define a TOKEN
1149
            for the EXP OFFSET of each field of a structure or union used in
1150
            the program simply to reduce the size of the TDF.</para>
1151
 
1152
          <para>The SHAPE of a C structure consisting of an char followed by
1153
            an int would require <varname>x</varname> to be the set consisting
1154
            of two INTEGER VARIETYs, one for int and one for char, and
1155
            <type>sz</type> would probably have been constructed like:</para>
1156
 
1157
          <programlisting>
1158
sz = offset_add(offset_pad(int_al, shape_offset(char)), shape_offset(int))
1159
          </programlisting>
1160
 
1161
          <para>The various rules for the ALIGNMENT qualifiers of the OFFSETs
1162
            give the required SHAPE; these rules also ensure that offset
1163
            arithmetic can be implemented simply using integer arithmetic for
1164
            standard architectures (see
1165
            <link linkend="model-for-32bit-architecture">section 13.1</link>).
1166
            Note that the OFFSET computed here is the minimum size for the
1167
            SHAPE. This would not in general be the same as the difference
1168
            between successive elements of an array of these structures which
1169
            would have SHAPE OFFSET(<parameter>x</parameter>,
1170
            <parameter>x</parameter>) as produced by
1171
            offset_pad(<parameter>x</parameter>, <parameter>sz</parameter>).
1172
            For examples of the use of OFFSETs to access and create
1173
            structures, see
1174
            <link linkend="tdf-expansions-of-offsets">section 12</link>.</para>
1175
        </sect3>
1176
 
1177
        <sect3 id="S34">
1178
          <title>offset_mult</title>
1179
 
1180
          <para>In C, all structures have size known at translate-time. This
1181
            means that OFFSETs for all field selections of structures and
1182
            unions are translate-time constants; there is never any need to
1183
            produce code to compute these sizes and offsets. Other languages
1184
            (notably Ada) do have variable size structures and so sizes and
1185
            offsets within these structures may have to be computed
1186
            dynamically. Indexing in C will require the computation of dynamic
1187
            OFFSETs; this would usually be done by using offset_mult to
1188
            multiply an offset expression representing the stride by an
1189
            integer expression giving the index:
1190
 
1191
            <programlisting>
1192
arg1: EXP OFFSET(x, x)
1193
arg2: EXP INTEGER(v)
1194
      -&gt; EXP OFFSET(x, x)
1195
            </programlisting>
1196
 
1197
            and using add_to_ptr with a pointer expression giving the base of
1198
            the array with the resulting OFFSET.</para>
1199
        </sect3>
1200
 
1201
        <sect3 id="S35">
1202
          <title>OFFSET ordering and representation</title>
1203
 
1204
          <para>There is an ordering defined on OFFSETs with the same
1205
            alignment qualifiers, as given by offset_test and offset_max
1206
            having properties like:
1207
 
1208
            <programlisting>
1209
shape_offset(S) xb3  offset_zero(alignment(S))
1210
A xb3  B        iff offset_max(A,B) = A
1211
offset_add(A, B) xb3  A         where B xb3  offset_zero(some compatible alignment)
1212
            </programlisting>
1213
 
1214
            In most machines, OFFSETs would be represented as single integer
1215
            values with the OFFSET ordering corresponding to simple integer
1216
            ordering. The offset_add constructor just translates to simple
1217
            addition with offset_zero as 0 with similar correspondences for
1218
            the other offset constructors. You might well ask why TDF does not
1219
            simply use integers for offsets, instead of introducing the rather
1220
            complex OFFSET SHAPE. The reasons are two-fold. First, following
1221
            the OFFSET arithmetic rules concerned with the ALIGNMENT
1222
            qualifiers will ensure that one never extracts a value from a
1223
            pointer with the wrong alignment by, for example, applying
1224
            contents to an add_to_pointer. This frees TDF from having to
1225
            define the effect of strange operations like forming a float by
1226
            taking the contents of a pointer to a character which may be
1227
            mis-aligned with respect to floats - a heavy operation on most
1228
            processors. The second reason is quite simple; there are machines
1229
            which cannot represent OFFSETs by a single integer value.</para>
1230
 
1231
          <para>The iAPX-432 is a fairly extreme example of such a machine; it
1232
            is a "capability" machine which must segregate pointer values and
1233
            non-pointer values into different spaces. On this machine a value
1234
            of SHAPE POINTER({<type>pointer</type>, int}) (e.g. a pointer to a
1235
            structure containing both integers and pointers) could have two
1236
            components; one referring to the pointers and another to the
1237
            integers. In general, offsets from this pointer would also have
1238
            two components, one to pick out any pointer values and the other
1239
            the integer values. This would obviously be the case if the
1240
            original POINTER referred to an array of structures containing
1241
            both pointers and integers; an offset to an element of the array
1242
            would have SHAPE OFFSET({<type>pointer</type>, int},
1243
            {<type>pointer</type>, int}); both elements of the offset would
1244
            have to be used as displacements to the corresponding elements of
1245
            the pointer to extract the structure element. The OFFSET ordering
1246
            is now given by the comparison of both displacements. Using this
1247
            method, one finds that pointers in store to non-pointer alignments
1248
            are two words in different blocks and pointers to
1249
            pointer-alignments are four words, two in one block and two in
1250
            another. This sounds a very unwieldy machine compared to normal
1251
            machines with linear addressing. However, who knows what similar
1252
            strange machines will appear in future; the basic conflicts
1253
            between security, integrity and flexibility that the iAPX-432
1254
            sought to resolve are still with us. For more on the modelling of
1255
            pointers and offsets see
1256
            <link linkend="models-of-the-tdf-algebra">section 13</link>.
1257
          </para>
1258
        </sect3>
1259
      </sect2>
1260
 
1261
      <sect2 id="bitfield-alignments">
1262
        <title>BITFIELD alignments</title>
1263
 
1264
        <para>Even in standard machines, one finds that the size of a pointer
1265
          may depend on the alignment of the data pointed at. Most machines do
1266
          not allow one to construct pointers to bits with the same facility
1267
          as other alignments. This usually means that pointers in memory to
1268
          BITFIELD VARIETYs must be implemented as two words with an address
1269
          and bit displacement.  One might imagine that a translator could
1270
          implement BITFIELD alignments so that they are the same as the
1271
          smallest natural alignment of the machine and avoid the bit
1272
          displacement, but this is not the intention of the definition. On
1273
          any machine for which it is meaningful, the alignment of a BITFIELD
1274
          must be one bit; in other words successive BITFIELDs are butted
1275
          together with no padding bits.
1276
 
1277
          <footnote>
1278
            <para>Note that is not generally true for C bitfields; most C ABIs
1279
              have (different) rules for putting in padding bits depending on
1280
              the size of the bitfield and its relation with the natural
1281
              alignments. This is a fruitful source of errors in data exchange
1282
              between different C ABIs For more on similar limitations of
1283
              bitfields in TDF (see
1284
              <link linkend="assigning-and-extracting-bitfields">Assigning and
1285
                extracting bitfields</link>).</para>
1286
          </footnote>
1287
 
1288
          Within the limits of what one can extract from BITFIELDs, namely
1289
          INTEGER VARIETYs, this is how one should implement non-standard
1290
          alignments, perhaps in constructing data, such as protocols, for
1291
          exchange between machines. One could implement some Ada
1292
          representational statements in this way; certainly the most commonly
1293
          used ones.</para>
1294
 
1295
        <para>TDF Version 3.0 does not allow one to construct a pointer of
1296
          SHAPE POINTER(b) where b consists entirely of bitfield alignments;
1297
          this relieves the translators of the burden of doing general
1298
          bit-addressing. Of course, this simply shifts the burden to the
1299
          producer. If the high level language requires to construct a pointer
1300
          to an arbitrary bit position, then the producer is required to
1301
          represent such a pointer as a pair consisting of pointer to some
1302
          alignment including the required bitfield and an offset from this
1303
          alignment to the bitfield. For example, Ada may require the
1304
          construction of a pointer to a boolean for use as the parameter to a
1305
          procedure; the SHAPE of the rep resentation of this Ada pointer
1306
          could be a COMPOUND formed from a POINTER({x,b}) and an OFFSET({x,
1307
          b}, b) where b is the alignment given by a 1 bit alignment. To
1308
          access the boolean, the producer would use the elements of this pair
1309
          as arguements to bitfield_assign and bitfield_contents
1310
          (<link linkend="assigning-and-extracting-bitfields">Assigning and
1311
            extracting bitfields</link>.).</para>
1312
      </sect2>
1313
    </sect1>
1314
 
1315
    <sect1>
1316
      <title>Procedures and Locals</title>
1317
 
1318
      <para>All procedures in TDF are essentially global; the only values
1319
        which are accessible from the body of a procedure are those which are
1320
        derived from global TAGs (introduced by TAGDEFs or TAGDECs), local
1321
        TAGs defined within the procedure and parameter TAGs of the
1322
        procedure.</para>
1323
 
1324
      <para>All executable code in TDF will arise from an EXP PROC made by
1325
        either make_proc or make_general_proc. They differ in their treatment
1326
        of how space for the actual parameters of a call is managed; in
1327
        particular, is it the caller or the callee which deallocates the
1328
        parameter space?</para>
1329
 
1330
      <para>With make_proc, this management is conceptually done by the caller
1331
        at an apply_proc; i.e. the normal C situation. This suffers from the
1332
        limitation that tail-calls of procedures are then only possible in
1333
        restricted circumstances (e.g. the space for the parameters of the
1334
        tail-call must be capable of being included in caller's parameters)
1335
        and could only be implemented as an optimisation within a translator.
1336
        A producer could not predict these circumstances in a machine
1337
        independent manner, whether or not it knew that a tail-call was
1338
        valid.</para>
1339
 
1340
      <para>An alternative would be to make the management of parameter space
1341
        the responsibility of the called procedure. Rather than do this,
1342
        make_general_proc (and apply_general_proc) splits the parameters into
1343
        two sets, one whose allocation is the responsibility of the caller and
1344
        the other whose allocation is dealt with by the callee. This allows an
1345
        explicit tail_call to be made to a procedure with new callee
1346
        parameters; the caller parameters for the tail_call will be the same
1347
        as (or some initial subset of) the caller parameters of the procedure
1348
        containing the tail_call .</para>
1349
 
1350
      <para>A further refinement of make_general_proc is to allow access to
1351
        the caller parameter space in a postlude at the call of the procedure
1352
        using an apply_general_proc. This allows simple implementations of Ada
1353
        out_parameters, or more generally, multiple results of
1354
        procedures.</para>
1355
 
1356
      <sect2 id="S38">
1357
        <title>make_proc and apply_proc</title>
1358
 
1359
        <para>The make_proc constructor has signature:
1360
 
1361
          <programlisting>
1362
result_shape: SHAPE
1363
params_intro: LIST(TAGSHACC)
1364
var_intro:    OPTION(TAGACC)
1365
body:         EXP BOTTOM
1366
              -&gt; EXP PROC
1367
          </programlisting>
1368
 
1369
          The <type>params_intro</type> and <type>var_intro</type> parameters
1370
          introduce the formal parameters of the procedure which may be used
1371
          in <type>body</type>. The procedure result will have SHAPE
1372
          <type>result_shape</type> and will be usually given by some return
1373
          construction within <type>body</type>. The basic model is that space
1374
          will be provided to copy actual parameters (into space supplied by
1375
          some apply_proc) by value into these formals and the body will treat
1376
          this space effectively as local variables.</para>
1377
 
1378
        <para>Each straightforward formal parameter is introduced by an
1379
          auxiliary SORT TAGSHACC using make_tagshacc:
1380
 
1381
          <programlisting>
1382
sha:        SHAPE
1383
opt_access: OPTION(LIST(ACCESS))
1384
tg_intro:   TAG POINTER(alignment(sha))
1385
            -&gt; TAGSHACC
1386
          </programlisting>
1387
        </para>
1388
 
1389
        <para>Within <type>body</type>, the formal will be accessed using
1390
          <type>tg_intro</type>; it is always considered to be a pointer to
1391
          the space of SHAPE <type>sha</type> allocated by apply_proc, hence
1392
          the pointer SHAPE.</para>
1393
 
1394
        <para>For example, if we had a simple procedure with one integer
1395
          parameter, <type>var_intro</type> would be empty and
1396
          <varname>params_intro</varname> might be:
1397
 
1398
        <programlisting>
1399
params_intro = make_tagshacc(integer(v), empty, make_tag(13))
1400
        </programlisting>
1401
 
1402
        Then, TAG 13 from the enclosing UNIT's name-space is identified with
1403
        the formal parameter with SHAPE POINTER(INTEGER(v)). Any use of
1404
        obtain_tag(make_tag(13)) in <type>body</type> will deliver a pointer
1405
        to the integer parameter. I shall return to the meaning of
1406
        <type>opt_access</type> and the ramifications of the scope and extent
1407
        of TAGs involved in conjunction with local declarations in
1408
        <link linkend="identify-variable">section 5.3.1</link>.</para>
1409
 
1410
      <para>Procedures, whether defined by make_proc or make_general_proc,
1411
        will usually terminate and deliver its result with a return:
1412
 
1413
        <programlisting>
1414
arg1: EXP x
1415
      -&gt; EXP BOTTOM
1416
        </programlisting>
1417
 
1418
        Here <varname>x</varname> must be identical to the
1419
        <type>result_shape</type> of the call of the procedure There may be
1420
        several returns in body; and the SHAPE <varname>x</varname> in each
1421
        will be the same. Some languages allow different types to be returned
1422
        depending on the particular call.  The producer must resolve this
1423
        issue. For example, C allows one to deliver void if the resulting
1424
        value is not used.  In TDF a dummy value must be provided at the
1425
        return; for example make_value(<type>result_shape</type>).</para>
1426
 
1427
      <para>Note that the <type>body</type> has SHAPE bottom since all
1428
        possible terminations to a procedure have SHAPE BOTTOM..</para>
1429
 
1430
      <para>Procedures defined by make_proc are called using
1431
        apply_proc:</para>
1432
 
1433
      <programlisting>
1434
result_shape: SHAPE
1435
arg1:         EXP PROC
1436
arg2:         LIST(EXP)
1437
varparam:     OPTION(EXP)
1438
              -&gt; EXP result_shape
1439
      </programlisting>
1440
 
1441
      <para>Here <type>arg1</type> is the procedure to be called and
1442
        <type>arg2</type> gives the actual parameters. There must be at least
1443
        as many actual parameters as given (with the same SHAPE) in the
1444
        <type>params_intro</type> of the corresponding make_proc for arg1.
1445
 
1446
        <footnote>
1447
          <para>The vararg construction in C are implemented by giving more
1448
            actuals than formals; the extra parameters are accessed by offset
1449
            arithmetic with a pointer to a formal, using parameter_alignment
1450
            to pad the offsets.</para>
1451
        </footnote>
1452
 
1453
        The values of <type>arg2</type> will be copied into space managed by
1454
        caller.</para>
1455
 
1456
        <para>The SHAPE of the result of the call is given by
1457
          <type>result_shape</type> which must be identical to the
1458
          <type>result_shape</type> of the make_proc.</para>
1459
 
1460
        <sect3 id="vartag-varparam">
1461
          <title>vartag, varparam</title>
1462
 
1463
          <para>Use of the <type>var_intro</type> OPTION in make_proc and the
1464
            corresponding <type>varparam</type> in apply_proc allows one to
1465
            have a parameter of any SHAPE, possibly differing from call to
1466
            call where the actual SHAPE can be deduced in some way by the
1467
            <type>body</type> of the make_proc . One supplies an extra actual
1468
            parameter, <type>varparam</type>, which usually would be a
1469
            structure grouping some set of values. The body of the procedure
1470
            can then access these values using the pointer given by the TAG
1471
            <type>var_intro</type>, using add_to_ptr with some computed
1472
            offsets to pick out the individual fields.</para>
1473
 
1474
          <para>This is a slightly different method of giving a variable
1475
            number of parameters to a procedure, rather than simply giving
1476
            more actuals than formals. The principle difference is in the
1477
            alignment of the components of <type>varparam</type>; these will
1478
            be laid out according to the default padding defined by the
1479
            component shapes. In most ABIs, this padding is usually different
1480
            to the way parameters are laid out; for example, character
1481
            parameters are generally padded out to a full word.  Thus a
1482
            sequence of parameters of given shape has a different layout in
1483
            store to the same sequence of shapes in a structure. If one wished
1484
            to pass an arbitrary structure to a procedure, one would use the
1485
            <type>varparam</type> option rather passing the fields
1486
            individually as extra actual parameters.</para>
1487
        </sect3>
1488
      </sect2>
1489
 
1490
      <sect2 id="S40">
1491
        <title>make_general_proc and apply_general_proc</title>
1492
 
1493
        <para>A make_general_proc has signature:
1494
 
1495
          <programlisting>
1496
result_shape: SHAPE
1497
prcprops:     OPTION(PROCPROPS)
1498
caller_intro: LIST(TAGSHACC)
1499
callee_intro: LIST(TAGSHACC)
1500
body:         EXP BOTTOM
1501
              -&gt; EXP PROC
1502
          </programlisting>
1503
 
1504
          Here the formal parameters are split into two sets,
1505
          <type>caller_intro</type> and <type>callee_intro</type>, each given
1506
          by a list of TAGSHACCs just as in make_proc. The distinction between
1507
          the two sets is that the make_general_proc is responsible for
1508
          de_allocating any space required for the callee parameter set; this
1509
          really only becomes obvious at uses of tail_call within
1510
          <type>body</type>.</para>
1511
 
1512
        <para>The <type>result_shape</type> and <type>body</type> have the
1513
          same general properties as in make_proc. In addition
1514
          <type>prcprops</type> gives other information both about
1515
          <type>body</type> and the way that that the procedure is called.
1516
          PROCPROPS are a set drawn from check_stack, inline,
1517
          no_long_jump_dest, untidy, var_callees and var_callers. The set is
1518
          composed using add_procprops. The PROCPROPS no_long_jump_dest is a
1519
          property of <type>body</type> only; it indicates that none of the
1520
          labels within <type>body</type> will be the target of a long_jump
1521
          construct. The other properties should also be given consistently at
1522
          all calls of the procedure; theu are discussed in
1523
          <link linkend="procprops">section 5.2.2</link>.</para>
1524
 
1525
        <para>A procedure, <type>p</type>, constructed by make_general_proc is
1526
          called using apply_general_proc:</para>
1527
 
1528
        <programlisting>
1529
result_shape:  SHAPE
1530
prcprops:      OPTION(PROCPROPS)
1531
p:             EXP PROC
1532
caller_params: LIST(OTAGEXP)
1533
callee_params: CALLEES
1534
postlude:      EXP TOP
1535
               -&gt; EXP result_shape
1536
        </programlisting>
1537
 
1538
        <para>The actual caller parameters are given by
1539
          <type>caller_params</type> as a list of OTAGEXPs constructed using
1540
          make_otagexp:</para>
1541
 
1542
        <programlisting>
1543
tgopt: OPTION(TAG x / i>)
1544
e:     EXP x / i>
1545
       -&gt; OTAGEXP
1546
        </programlisting>
1547
 
1548
        <para>Here, <type>e</type> is the value of the parameter and
1549
          <type>tgopt</type>, if present, is a TAG which will bound to the
1550
          final value of the parameter (after <type>body</type> is evaluated)
1551
          in the <type>postlude</type> expression of the apply_general_proc.
1552
 
1553
          <footnote>
1554
            <para>If a formal parameter is to be used in this way, it should
1555
              be marked as having out_par ACCESS in its corresponding TAGSHACC
1556
              in <type>callers_intro</type>.</para>
1557
          </footnote>
1558
 
1559
          Clearly, this allows one to use a caller parameter as an extra
1560
          result of the procedure; for example, as in Ada
1561
          out-parameters.</para>
1562
 
1563
        <para>The actual <type>callee_params</type> may be constructed in
1564
          three different ways. The usual method is to use make_callee_list,
1565
          giving a list of actual EXP parameters, corresponding to the
1566
          <type>caller_intro</type> list in the obvious way.The constructor,
1567
          same_callees allows one to use the callees of the current procedure
1568
          as the callees of the call; this, of course, assumes that the
1569
          formals of the current procedure are compatible with the formals
1570
          required for the call The final method allows one to construct a
1571
          dynamically sized set of CALLEES; make_dynamic_callees takes a
1572
          pointer and a size (expressed as an OFFSET) to make the CALLEES;
1573
          this will be used in conjunction with a var_callees PROCPROPS (see
1574
          <link linkend="procprops">section 5.2.2</link>).</para>
1575
 
1576
        <para>Some procedures can be expressed using either make_proc or
1577
          make_general_proc. For example:</para>
1578
 
1579
        <para>make_proc(S, L, empty, B) = make_general_proc(S, var_callers, L,
1580
          empty, B)</para>
1581
 
1582
        <sect3 id="tail_call">
1583
          <title>tail_call</title>
1584
 
1585
          <para>Often the result of a procedure, <function>f</function>, is
1586
            simply given by the call of another (or the same) procedure,
1587
            <function>g</function>. In appropriate circumstances, the same
1588
            stack space can be used for the call of <function>g</function> as
1589
            the call of <function>f</function>. This can be particularly
1590
            important where heavily recursive routines are involved; some
1591
            languages even use tail recursion as the preferred method of
1592
            looping.</para>
1593
 
1594
          <para>One condition for such a tail call to be applicable is knowing
1595
            that <function>g</function> does not require any pointers to
1596
            locals of <function>f</function>; this is often implicit in the
1597
            language involved.  Equally important is that the action on the
1598
            return from <function>f</function> is indistiguishable from the
1599
            return from <function>g</function>. For example, if it were the
1600
            callers responsibility to pop the the space for the parameters on
1601
            return from a call, then the tail call of <function>g</function>
1602
            would only work if <function>g</function> had the same parameter
1603
            space as <function>f</function>.</para>
1604
 
1605
          <para>This is the justification for splitting the parameter set of a
1606
            general proc; it is (at least conceptually) the caller's
1607
            responsibility for popping the caller-parameters only - the
1608
            callee-parameters are dealt with by the procedure itself. Hence we
1609
            can define tail_call which uses the same caller-parameters, but a
1610
            different set of callee-parameters:
1611
 
1612
            <programlisting>
1613
prcprops:      OPTION(PROCPROPS)
1614
p:             EXP PROC
1615
callee_params: CALLEES
1616
               -&gt; EXP BOTTOM
1617
            </programlisting>
1618
 
1619
            The procedure p will be called with the same caller parameters as
1620
            the current procedure and the new <type>callee_params</type> and
1621
            return to the call site of the current procedure. Semantically, if
1622
            S is the return SHAPE of the current procedure, and L is its
1623
            caller-parameters:</para>
1624
 
1625
          <para>tail_call(P, p, C) = return(apply_general_proc(S, P, p, L, C,
1626
            make_top()))</para>
1627
 
1628
          <para>However an implementation is expected to conserve stack by
1629
            using the same space for the call of p as the current
1630
            procedure.</para>
1631
        </sect3>
1632
 
1633
        <sect3 id="procprops">
1634
          <title>PROCPROPS</title>
1635
 
1636
          <para>The presence of var_callees (or var_callers) means that the
1637
            procedure can be called with more actual callee (or caller)
1638
            parameters than are indicated in <type>callee_intro</type> (or
1639
            <type>caller_intro</type>). These extra parameters would be
1640
            accessed within body using offset calculations with respect to the
1641
            named parameters. The offsets should be calculated using
1642
            parameter_alignment to give the packing of the parameter
1643
            packs.</para>
1644
 
1645
          <para>The presence of untidy means that <type>body</type> may be
1646
            terminated by an untidy_return. This returns the result of the
1647
            procedure as in return, but the lifetime of the local space of the
1648
            procedure is extended (in practice this is performed by not
1649
            returning the stack to its original value at the call). A
1650
            procedure containing an untidy_return is a generalisation of a
1651
            local_alloc(see <link linkend="local_alloc">section 5.3.4</link>).
1652
            For example the procedure could do some complicated local
1653
            allocation (a triangular array, say) and untidily return a pointer
1654
            to it so that the space is still valid in the calling procedure.
1655
            The space will remain valid for the lifetime of the calling
1656
            procedure unless some local_free is called within it, just as if
1657
            the space had been generated by a local_alloc in the calling
1658
            procedure.</para>
1659
 
1660
          <para>The presence of inline is just a hint to the translator that
1661
            the procedure body is a good candidate for inlining at the
1662
            call.</para>
1663
 
1664
          <para>The presence of check_stack means that the static stack
1665
            requirements of the procedure will be checked on entry to see that
1666
            they do not exceed the limits imposed by set_stack_limit; if they
1667
            are exceeded a TDF exception with ERROR_CODE stack_overflow (see
1668
            <link linkend="exceptional-flow">section 6.3</link>) will be
1669
            raised.</para>
1670
        </sect3>
1671
      </sect2>
1672
 
1673
      <sect2 id="defining-and-using-locals">
1674
        <title>Defining and using locals</title>
1675
 
1676
        <sect3 id="identify-variable">
1677
          <title>identify, variable</title>
1678
 
1679
          <para>Local definitions within the <type>body</type> of a procedure
1680
            are given by two EXP constructors which permit one to give names
1681
            to values over a scope given by the definition. Note that this is
1682
            somewhat different to declarations in standard languages where the
1683
            declaration is usually embedded in a larger construct which
1684
            defines the scope of the name; here the scope is explicit in the
1685
            definition. The reason for this will become more obvious in the
1686
            discussion of TDF transformations. The simpler constructor is
1687
            identify:
1688
 
1689
            <programlisting>
1690
opt_access: OPTION(ACCESS)
1691
name_intro: TAG x
1692
definition: EXP x
1693
body:       EXP y
1694
            -&gt; EXP y
1695
            </programlisting>
1696
 
1697
            The <type>definition</type> is evaluated and its result is
1698
            identified with the TAG given by <type>name_intro</type> within
1699
            its scope <type>body</type>.  Hence the use of any
1700
            obtain_tag(<type>name_intro</type>) within <type>body</type> is
1701
            equivalent to using this result. Anywhere else,
1702
            obtain_tag(<type>name_intro</type>) is meaningless, including in
1703
            other procedures.</para>
1704
 
1705
          <para>The other kind of local definition is variable:
1706
 
1707
            <programlisting>
1708
opt_access: OPTION(ACCESS)
1709
name_intro: TAG x
1710
init:       EXP x
1711
body:       EXP y
1712
            -&gt; EXP y
1713
            </programlisting>
1714
 
1715
            Here the <type>init</type> EXP is evaluated and its result serves
1716
            as an initialisation of space of SHAPE <varname>x</varname> local
1717
            to the procedure.  The TAG name_intro is then identified with a
1718
            pointer to that SPACE within body. A use of
1719
            obtain_tag(<type>name_intro</type>) within <type>body</type> is
1720
            equivalent to using this pointer and is meaningless outside
1721
            <type>body</type> or in other procedures. Many variable
1722
            declarations in programs are uninitialised; in this case, the
1723
            <type>init</type> argument could be provided by make_value which
1724
            will produce some value with SHAPE given by its parameter.</para>
1725
        </sect3>
1726
 
1727
        <sect3 id="sort-access">
1728
          <title>ACCESS</title>
1729
 
1730
          <para>The ACCESS SORT given in tag declarations is a way of
1731
            describing a list of properties to be associated with the tag.
1732
            They are basically divided into two classes, one which describes
1733
            global properties of the tag with respect to the model for locals
1734
            and the other which gives "hints" on how the value will be used.
1735
            Any of these can be combined using add_access.</para>
1736
 
1737
          <sect4 id="locals-model">
1738
            <title>Locals model</title>
1739
 
1740
            <para>At the moment there are just three possibilities in the
1741
              first class of ACCESS constructors. They are standard_access
1742
              (the default), visible, out_par and long_jump_access.</para>
1743
 
1744
            <para>The basic model used for the locals and parameters of a
1745
              procedure is a frame within a stack of nested procedure calls.
1746
              One could implement a procedure by allocating space according to
1747
              SHAPEs of all of the parameter and local TAGs so that the
1748
              corresponding values are at fixed offsets either from the start
1749
              of the frame or some pointer within it.</para>
1750
 
1751
            <para>Indeed, if the ACCESS <type>opt_access</type> parameter in a
1752
              TAG definition is produced by visible, then a translator is
1753
              almost bound to do just that for that TAG. This is because it
1754
              allows for the possibility of the value to be accessed in some
1755
              way other than by using obtain_tag which is the standard way of
1756
              recovering the value bound to the TAG. The principal way that
1757
              this could happen within TDF is by the combined use of
1758
              env_offset to give the offset and current_env to give a pointer
1759
              to the current frame (see
1760
              <link linkend="current_env">section 5.3.3</link>).</para>
1761
 
1762
            <para>The out_par ACCESS is only applicable to caller parameters
1763
              of procedures; it indicates that the value of the TAG concerned
1764
              will accessed by the postlude part of an apply_general_proc.
1765
              Hence, the value of the parameter must be accessible after the
1766
              call; usually this will be on the stack in the callers
1767
              frame.</para>
1768
 
1769
            <para>The long_jump_access flag is used to indicate that the tag
1770
              must be available after a long_jump. In practice, if either
1771
              visible or long_jump_access is set, most translators would
1772
              allocate the space for the declaration on the main-store stack
1773
              rather than in an available register. If it is not set, then a
1774
              translator is free to use its own criteria for whether space
1775
              which can fit into a register is allocated on the stack or in a
1776
              register, provided there is no observable difference (other than
1777
              time or program size) between the two possibilities.</para>
1778
 
1779
            <para>Some of these criteria are rather obvious; for example, if a
1780
              pointer to local variable is passed outside the procedure in an
1781
              opaque manner, then it is highly unlikely that one can allocate
1782
              the variable in a register. Some might be less obvious. If the
1783
              only uses of a TAG t was in obtain_tag(t)s which are operands of
1784
              contents or the left-hand operands of assigns, most ABIs would
1785
              allow the tag to be placed in a register. We do not necessarily
1786
              have to generate a pointer value if it can be subsumed by the
1787
              operations available.</para>
1788
          </sect4>
1789
 
1790
          <sect4 id="access-hints">
1791
            <title>Access "hints"</title>
1792
 
1793
            <para>A variable tag with ACCESS constant is a write-once value;
1794
              once it is initialised the variable will always contain the
1795
              initialisation. In other words the tag is a pointer to a
1796
              constant value; translators can use this information to apply
1797
              various optimisations.</para>
1798
 
1799
            <para>A POINTER tag with ACCESS no_other_read or no_other_write is
1800
              asserting that there are no "aliassed" accesses to the contents
1801
              of the pointer. For example, when applied to a parameter of a
1802
              procedure, it is saying that the original pointer of the tag is
1803
              distinct from any other tags used (reading/writing) in the
1804
              lifetime of the tag. These other tags could either be further
1805
              parameters of the procedure or globals. Clearly, this is useful
1806
              for describing the limitations imposed by Fortran parameters,
1807
              for example.</para>
1808
          </sect4>
1809
        </sect3>
1810
 
1811
        <sect3 id="current_env">
1812
          <title>current_env, env_offset</title>
1813
 
1814
          <para>The constructor current_env gives a pointer to the current
1815
            procedure frame of SHAPE POINTER(<type>fa</type>) where
1816
            <type>fa</type> is depends on how the procedure was defined and
1817
            will be some set of the special frame ALIGNMENTs. This set will
1818
            always include locals_alignment - the alignment of any locals
1819
            defined within the procedure. If the procedure has any caller-
1820
            parameters, the set will also include callers_alignment(b) where b
1821
            indicates whether there can be a variable number of them;
1822
            similarly for callee-parameters.</para>
1823
 
1824
          <para>Offsets from the current_env of a procedure to a tag declared
1825
            in the procedure are constructed by env_offset:
1826
 
1827
            <programlisting>
1828
fa: ALIGNMENT
1829
y:  ALIGNMENT
1830
t:  TAG x
1831
    -&gt; EXP OFFSET(fa, y)
1832
            </programlisting>
1833
 
1834
            The frame ALIGNMENT <type>fa</type> will be the appropriate one
1835
            for the TAG <type>t</type>; i.e. if <type>t</type> is a local then
1836
            the <type>fa</type> will be locals_alignment; if <type>t</type> is
1837
            a caller parameter, <type>fa</type> will be callers_alignment(b);
1838
            if <type>t</type> is a callee_parameter, <type>fa</type> will be
1839
            callees_alignment(b). The alignment <type>y</type> will be the
1840
            alignment of the initialisation of <type>t</type>.</para>
1841
 
1842
          <para>The offset arithmetic operations allow one to access the
1843
            values of tags non-locally using values derived from current_env
1844
            and env_offset. They are effectively defined by the following
1845
            identities:
1846
 
1847
            <programlisting>
1848
    If TAG t is derived from a variable definition
1849
            add_to_ptr(current_env(), env_offset(locals_alignment, A, t)) = obtain_tag(t)
1850
    if TAG t is derived from an identify definition:
1851
            contents(S, add_to_ptr(current_env(), env_offset(locals_alignment, A, t))) = obtain_tag(t)
1852
    if TAG t is derived from a caller parameter:
1853
            add_to_ptr(current_env(), env_offset(callers_alignment(b), A, t)) = obtain_tag(t)
1854
    if TAG t is derived from a callee parameter:
1855
            add_to_ptr(current_env(), env_offset(callees_alignment(b), A, t)) = obtain_tag(t)
1856
            </programlisting>
1857
 
1858
            These identities are valid throughout the extent of t, including
1859
            in inner procedure calls. In other words, one can dynamically
1860
            create a pointer to the value by composing current_env and
1861
            env_offset.</para>
1862
 
1863
          <para>The importance of this is that env_offset(t) is a constant
1864
            OFFSET and can be used anywhere within the enclosing UNIT, in
1865
            other procedures or as part of constant TAGDEF; remember that the
1866
            TDFINT underlying t is unique within the UNIT. The result of a
1867
            current_env could be passed to another procedure (as a parameter,
1868
            say) and this new procedure could then access a local of the
1869
            original by using its env_offset. This would be the method one
1870
            would use to access non-local, non-global identifiers in a
1871
            language which allowed one to define procedures within procedures
1872
            such as Pascal or Algol. Of course, given the stack-based model,
1873
            the value given by current_env becomes meaningless once the
1874
            procedure in which it is invoked is exited.</para>
1875
        </sect3>
1876
 
1877
        <sect3 id="local_alloc">
1878
          <title>local_alloc, local_free_all, last_local</title>
1879
 
1880
          <para>The size of stack frame produced by variable and identify
1881
            definitions is a translate-time constant since the frame is
1882
            composed of values whose SHAPEs are known. TDF also allows one to
1883
            produce dynamically sized local objects which are conceptually
1884
            part of the frame. These are produced by local_alloc:
1885
 
1886
            <programlisting>
1887
arg1: EXP OFFSET(x, y)
1888
      -&gt; EXP POINTER(alloca_alignment)
1889
            </programlisting>
1890
 
1891
            The operand <token>arg1</token> gives the size of the new object
1892
            required and the result is a pointer to the space for this object
1893
            "on top of the stack" as part of the frame. The quotation marks
1894
            indicate that a translator writer might prefer to maintain a
1895
            dynamic stack as well as static one. There are some disadvantages
1896
            in putting everything into one stack which may well out-weigh the
1897
            trouble of maintaining another stack which is relatively
1898
            infrequently used. If a frame has a known size, then all
1899
            addressing of locals can be done using a stack-front register; if
1900
            it is dynamically sized, then another frame-pointer register must
1901
            be used - some ABIs make this easy but not all. The majority of
1902
            procedures contain no local_allocs, so their addressing of locals
1903
            can always be done relative to a stack-front; only the others have
1904
            to use another register for a frame pointer.</para>
1905
 
1906
          <para>The alignment of pointer result is alloca_alignment which must
1907
            include all SHAPE alignments.</para>
1908
 
1909
          <para>There are two constructors for releasing space generated by
1910
            local_alloc. To release all such space generated in the current
1911
            procedure one does local_free_all(); this reduces the size of the
1912
            current frame to its static size.</para>
1913
 
1914
          <para>The other constructor is local_free whch is effectively a
1915
            "pop" to local_alloc's "push":
1916
 
1917
            <programlisting>
1918
a: EXP OFFSET(x, y)
1919
p: EXP POINTER(alloca_alignment)
1920
   -&gt; EXP TOP
1921
            </programlisting>
1922
 
1923
            Here <type>p</type> must evaluate to a pointer generated either by
1924
            local_alloc or last_local . The effect is to free all of the space
1925
            locally allocated after p. The usual implementation (with a
1926
            downward growing stack) of this is that p becomes the "top of
1927
            stack" pointer.</para>
1928
 
1929
          <para>The use of a procedure with an untidy_return is just a
1930
            generalisation of the idea of local_alloc and the space made
1931
            available by its use can be freed in the same way as normal local
1932
            allocations. Of course, given that it could be the result of the
1933
            procedure it can be structured in an arbitrarily complicated
1934
            way.</para>
1935
        </sect3>
1936
      </sect2>
1937
 
1938
      <sect2 id="heap-storage">
1939
        <title>Heap storage</title>
1940
 
1941
        <para>At the moment, there are no explicit constructors of creating
1942
          dynamic off-stack storage in TDF. Any off-stack storage requirements
1943
          must be met by the API in which the system is embedded, using the
1944
          standard procedural interface.  For example, the ANSI C API allows
1945
          the creation of heap space using standard library procedures like
1946
          malloc.</para>
1947
      </sect2>
1948
    </sect1>
1949
 
1950
    <sect1>
1951
      <title>Control Flow within procedures</title>
1952
 
1953
      <sect2>
1954
        <title>Unconditional flow</title>
1955
 
1956
        <sect3 id="sequence">
1957
          <title>sequence</title>
1958
 
1959
          <para>To perform a sequential set of operations in TDF, one uses the
1960
            constructor sequence:
1961
 
1962
            <programlisting>
1963
statements: LIST(EXP)
1964
result:     EXP x
1965
            -&gt; EXP x
1966
            </programlisting>
1967
 
1968
            Each of the <type>statements</type> are evaluated in order,
1969
            throwing away their results. Then, <type>result</type> is
1970
            evaluated and its result is the result of the sequence.</para>
1971
 
1972
          <para>A translator is free to rearrange the order of evaluation if
1973
            there is no observable difference other than in time or space.
1974
            This applies anywhere I say "something is evaluated and then ...".
1975
            We find this kind of statement in definitions of local variables
1976
            in <link linkend="defining-and-using-locals">section 5.3</link>,
1977
            and in the controlling parts of the conditional constructions
1978
            below.</para>
1979
 
1980
          <para>For a more precise discussion of allowable reorderings see
1981
            (<fix>S7.14</fix>) .</para>
1982
        </sect3>
1983
      </sect2>
1984
 
1985
      <sect2 id="conditional-flow">
1986
        <title>Conditional flow</title>
1987
 
1988
        <sect3 id="labelled">
1989
          <title>labelled, make_label</title>
1990
 
1991
          <para>All simple changes of flow of control within a TDF procedure
1992
            are done by jumps or branches to LABELs, mirroring what actually
1993
            happens in most computers. There are three constructors which
1994
            introduce LABELs; the most general is labelled which allows
1995
            arbitrary jumping between its component EXPs:
1996
 
1997
            <programlisting>
1998
placelabs_intro: LIST(LABEL)
1999
starter:         EXP x
2000
places:          LIST(EXP)
2001
                 -&gt; EXP w
2002
            </programlisting>
2003
 
2004
            Each of the EXPs in <type>places</type> is labelled by the
2005
            corresponding LABEL in <type>placelabs_intro</type>; these LABELs
2006
            are constructed by make_label applied to a TDFINT uniquely drawn
2007
            from the LABEL name-space introduced by the enclosing PROPS. The
2008
            evaluation starts by evaluating <type>starter</type>; if this runs
2009
            to completion the result of the labelled is the result of
2010
            <type>starter.</type> If there is some jump to a LABEL in
2011
            <type>placelabs_intro</type> then control passes to the
2012
            corresponding EXP in <type>places</type> and so on.  If any of
2013
            these EXPS runs to completion then its result is the result of the
2014
            labelled; hence the SHAPE of the result, w, is the LUB of the
2015
            SHAPEs of the component EXPs.</para>
2016
 
2017
          <para>Note that control does not automatically pass from one EXP to
2018
            the next; if this is required the appropriate EXP must end with an
2019
            explicit goto.</para>
2020
        </sect3>
2021
 
2022
        <sect3 id="goto">
2023
          <title>goto, make_local_lv, goto_local_lv, long_jump,
2024
            return_to_label</title>
2025
 
2026
          <para>The unconditional goto is the simplest method of jumping. In
2027
            common with all the methods of jumping using LABELs, its LABEL
2028
            parameter must have been introduced in an enclosing construction,
2029
            like labelled, which scopes it.</para>
2030
 
2031
          <para>One can also pick up a label value of SHAPE POINTER {code}
2032
            (usually implemented as a program address) using make_local_lv for
2033
            later use by an "indirect jump" such as goto_local_lv . Here the
2034
            same prohibition holds - the construction which introduced the
2035
            LABEL must still be active.</para>
2036
 
2037
          <para>The construction goto_local_lv only permits one to jump within
2038
            the current procedure; if one wished to do a jump out of a
2039
            procedure into a calling one, one uses long_jump which requires a
2040
            pointer to the destination frame (produced by current_env in the
2041
            destination procedure) as well as the label value. If a long_jump
2042
            is made to a label, only those local TAGs which have been defined
2043
            with a visible ACCESS are guaranteed to have preserved their
2044
            values; the translator could allocate the other TAGs in scope as
2045
            registers whose values are not necessarily preserved.</para>
2046
 
2047
          <para>A slightly "shorter" long jump is given by return_to_label.
2048
            Here the destination of the jump must a label value in the calling
2049
            procedure. Usually this value would be passed as parameter of the
2050
            call to provide an alternative exit to the procedure.</para>
2051
        </sect3>
2052
 
2053
        <sect3 id="integer_test">
2054
          <title>integer_test, NTEST</title>
2055
 
2056
          <para>Conditional branching is provided by the various _test
2057
            constructors, one for each primitive SHAPE except BITFIELD.  I
2058
            shall use integer_test as the paradigm for them all:
2059
 
2060
            <programlisting>
2061
nt:   NTEST
2062
dest: LABEL
2063
arg1: EXP INTEGER(v)
2064
arg2: EXP INTEGER(v)
2065
      -&gt; EXP TOP
2066
            </programlisting>
2067
 
2068
            The NTEST <type>nt</type> chooses a dyadic test (e.g. =, &gt;=,
2069
            &lt;, etc.) that is to be applied to the results of evaluating
2070
            <type>arg1</type> and <type>arg2</type>. If <type>arg1</type>
2071
            <type>nt</type> <type>arg2</type> then the result is TOP;
2072
            otherwise control passes to the LABEL <type>dest</type>.  In other
2073
            words, integer_test acts like an assertion where if the assertion
2074
            is false, control passes to the LABEL instead of continuing in the
2075
            normal fashion.</para>
2076
 
2077
          <para>Some of the constructors for NTESTs are disallowed for some
2078
            _tests (e.g. proc_test) while others are redundant for some
2079
            _tests; for example, not_greater_than is the same as
2080
            less_than_or_equal for all except possibly floating_test. where
2081
            the use of NaNs (in the IEEE sense) as operands may give different
2082
            results.</para>
2083
        </sect3>
2084
 
2085
        <sect3 id="case">
2086
          <title>case</title>
2087
 
2088
          <para>There are only two other ways of changing flow of control
2089
            using LABELs. One arises in ERROR_TREATMENTs which will be dealt
2090
            with in the arithmetic operations. The other is case:
2091
 
2092
            <programlisting>
2093
exhaustive: BOOL
2094
control:    EXP INTEGER(v)
2095
branches:   LIST(CASELIM)
2096
            -&gt; EXP (exhaustive ? BOTTOM : TOP)
2097
            </programlisting>
2098
 
2099
            Each CASELIM is constructed using make_caselim:
2100
 
2101
            <programlisting>
2102
branch: LABEL
2103
lower:  SIGNED_NAT
2104
upper:  SIGNED_NAT
2105
        -&gt; CASELIM
2106
            </programlisting>
2107
 
2108
            In the case construction, the <type>control</type> EXP is
2109
            evaluated and tested to see whether its value lies inclusively
2110
            between some <type>lower</type> and <type>upper</type> in the list
2111
            of CASELIMs. If so, control passes to the corresponding
2112
            <type>branch</type>. The order in which these tests are performed
2113
            is undefined, so it is probably best if the tests are exclusive.
2114
            The exhaustive flag being true asserts that one of the branches
2115
            will be taken and so the SHAPE of the result is BOTTOM.
2116
            Otherwise, if none of the branches are taken, its SHAPE is TOP and
2117
            execution carries on normally.</para>
2118
        </sect3>
2119
 
2120
        <sect3 id="conditional">
2121
          <title>conditional, repeat</title>
2122
 
2123
          <para>Besides labelled, two other constructors, conditional and
2124
            repeat, introduce LABELs which can be used with the various jump
2125
            instructions.  Both can be expressed as labelled, but have extra
2126
            constraints which make assertions about the use of the LABELs
2127
            introduced and could usually be translated more efficiently; hence
2128
            producers are advised to use them where possible. A conditional
2129
            expression or statement would usually be done using conditional:
2130
 
2131
            <programlisting>
2132
alt_label_intro: LABEL
2133
first:           EXP x
2134
alt:             EXP y
2135
                 -&gt; EXP(LUB(x, y))
2136
            </programlisting>
2137
 
2138
            Here <type>first</type> is evaluated; if it terminates normally,
2139
            its result is the result of the conditional. If a jump to
2140
            <type>alt_label_intro</type> occurs then <type>alt</type> is
2141
            evaluated and its result is the result of the conditional.
2142
            Clearly, this, so far, is just the same as
2143
            labelled((<type>alt_label_intro</type>), <type>first</type>,
2144
            (<type>alt</type>)). However, conditional imposes the constraint
2145
            that <type>alt</type> cannot use <type>alt_label_intro</type>. All
2146
            jumps to <type>alt_label_intro</type> are "forward jumps" - a
2147
            useful property to know in a translator.</para>
2148
 
2149
          <para>Obviously, this kind of conditional is rather different to
2150
            those found in traditional high-level languages which usually have
2151
            three components, a boolean expression, a then-part and an
2152
            else-part. Here, the <type>first</type> component includes both
2153
            the boolean expression and the then-part; usually we find that it
2154
            is a sequence of the tests (branching to
2155
            <type>alt_label_intro</type>) forming the boolean expression
2156
            followed by the else-part. This formulation means that HLL
2157
            constructions like "andif" and "orelse" do not require special
2158
            constructions in TDF.</para>
2159
 
2160
          <para>A simple loop can be expressed using repeat:
2161
 
2162
            <programlisting>
2163
repeat_label_intro: LABEL
2164
start:              EXP TOP
2165
body:               EXP y
2166
                    -&gt; EXP y
2167
            </programlisting>
2168
 
2169
            The EXP <type>start</type> is evaluated, followed by
2170
            <type>body</type> which is labelled by
2171
            <type>repeat_label_intro</type>. If a jump to
2172
            <type>repeat_label_intro</type> occurs in <type>body</type>, then
2173
            <type>body</type> is re-evaluated. If <type>body</type> terminates
2174
            normally then its result is the result of the repeat. This is just
2175
            the same as:
2176
 
2177
            <programlisting>
2178
labelled((repeat_label_intro), sequence((start), goto(repeat_label_intro)), (body))
2179
            </programlisting>
2180
 
2181
            except that no jumps to <type>repeat_label_intro</type> are
2182
            allowed in <type>start</type> - a useful place to do
2183
            initialisations for the loop.</para>
2184
 
2185
          <para>Just as with conditionals, the tests for the continuing or
2186
            breaking the loop are included in <type>body</type> and require no
2187
            special constructions.</para>
2188
        </sect3>
2189
      </sect2>
2190
 
2191
      <sect2 id="exceptional-flow">
2192
        <title>Exceptional flow</title>
2193
 
2194
        <para>A further way of changing the flow of control in a TDF program
2195
          is by means of exceptions. TDF exceptions currently arise from three
2196
          sources - overflow in arithmetic operations with trap
2197
          ERROR_TREATMENT(see
2198
          <link linkend="error_treatment">section 8.1.1</link>), an attempt to
2199
          access a value via a nil pointer using assign_with_mode,
2200
          contents_with_mode or move_some(see
2201
          <link linkend="transfer_mode-operations">section 7.3</link>) or a
2202
          stack overflow on procedure entry with PROCPROPS check_stack(see
2203
          <link linkend="procprops">section 5.2.2</link>) or a
2204
          local_alloc_check.</para>
2205
 
2206
        <para>Each of these exceptions have an ERROR_CODE ascribed to them,
2207
          namely overflow, nil_access and stack_overflow. Each ERROR_CODE can
2208
          be made into a distinct NAT by means of the constructor error_val;
2209
          these NATs could be used, for example, to discriminate the different
2210
          kinds of errors using a case construction.</para>
2211
 
2212
        <para>When one of these exceptions is raised, the translator will
2213
          arrange that a TOKEN, ~Throw, is called with the appropriate
2214
          ERROR_CODE as its (sole) parameter. Given that every language has a
2215
          different way of both characterising and handling exceptions, the
2216
          exact expansion of ~Throw must be given by the producer for that
2217
          language - usually it will involve doing a long_jump to some label
2218
          specifying a signal handler and translating the ERROR_CODE into its
2219
          language-specific representation.</para>
2220
 
2221
        <para>The expansion of ~Throw forms part of the language specific
2222
          environment required for the translation of TDF derived from the
2223
          language, just as the exact shape of FILE must be given for the
2224
          translation of C.</para>
2225
      </sect2>
2226
    </sect1>
2227
 
2228
    <sect1>
2229
      <title>Values, variables and assignments.</title>
2230
 
2231
      <para>TAGs in TDF fulfil the role played by identifiers in most
2232
        programming languages. One can apply obtain_tag to find the value
2233
        bound to the TAG. This value is always a constant over the scope of a
2234
        particular definition of the TAG. This may sound rather strange to
2235
        those used to the concepts of left-hand and right-hand values in C,
2236
        for example, but is quite easily explained as follows.</para>
2237
 
2238
      <para>If a TAG, id, is introduced by an identify, then the value bound
2239
        is fixed by its <type>definition</type> argument. If, on the other
2240
        hand, v was a TAG introduced by a variable definition, then the value
2241
        bound to v is a pointer to fixed space in the procedure frame (i.e.
2242
        the left-hand value in C).</para>
2243
 
2244
      <sect2 id="S62">
2245
        <title>contents</title>
2246
 
2247
        <para>In order to get the contents of this space (the right-hand value
2248
          in C), one must apply the contents operator to the pointer:
2249
 
2250
          <programlisting>
2251
contents(shape(v), obtain_tag(v))
2252
          </programlisting>
2253
 
2254
          In general, the contents constructor takes a SHAPE and an
2255
          expression delivering pointer:
2256
 
2257
          <programlisting>
2258
s:    SHAPE
2259
arg1: EXP POINTER(x)
2260
      -&gt; EXP s
2261
          </programlisting>
2262
 
2263
          It delivers the value of SHAPE <type>s</type>, pointed at by the
2264
          evaluation of <type>arg1</type>. The alignment of <type>s</type>
2265
          need not be identical to <varname>x</varname>. It only needs to be
2266
          included in it; this would allow one, for example, to pick out the
2267
          first field of a structure from a pointer to it.</para>
2268
      </sect2>
2269
 
2270
      <sect2 id="S63">
2271
        <title>assign</title>
2272
 
2273
        <para>A simple assignment in TDF is done using assign:
2274
 
2275
        <programlisting>
2276
arg1: EXP POINTER(x)
2277
arg2: EXP y
2278
      -&gt; EXP TOP
2279
        </programlisting>
2280
 
2281
        The EXPs <type>arg1</type> and <type>arg2</type> are evaluated (no
2282
        ordering implied) and the value of SHAPE <varname>y</varname> given by
2283
        <type>arg2</type> is put into the space pointed at by
2284
        <type>arg1</type>. Once again, the alignment of <varname>y</varname>
2285
        need only be included in <varname>x</varname>, allowing the assignment
2286
        to the first field of a structure using a pointer to the structure. An
2287
        assignment has no obvious result so its SHAPE is TOP.</para>
2288
 
2289
      <para>Some languages give results to assignments. For example, C defines
2290
        the result of an assignment to be its right-hand expression, so that
2291
        if the result of (v = exp) was required, it would probably be
2292
        expressed as:
2293
 
2294
        <programlisting>
2295
identify(empty, newtag, exp, sequence((assign(obtain_tag(v), obtain_tag(newtag))), obtain_tag(newtag)))
2296
        </programlisting>
2297
 
2298
        From the definition of assign, the destination argument,
2299
        <type>arg1</type>, must have a POINTER shape. This means that given
2300
        the TAG id above, assign(obtain_tag(id), lhs) is only legal if the
2301
        <type>definition</type> of its identify had a POINTER SHAPE. A trivial
2302
        example would be if id was defined:
2303
 
2304
        <programlisting>
2305
identify(empty, id, obtain_tag(v), assign(obtain_tag(id), lhs))
2306
        </programlisting>
2307
 
2308
        This identifies id with the variable v which has a POINTER SHAPE, and
2309
        assigns lhs to this pointer. Given that id does not occur in lhs, this
2310
        is identical to:
2311
 
2312
        <programlisting>
2313
assign(obtain_tag(v), lhs).
2314
        </programlisting>
2315
 
2316
        Equivalences like this are widely used for transforming TDF in
2317
        translators and other tools (see
2318
        <link linkend= "tdf-transformations">section 11</link>).</para>
2319
      </sect2>
2320
 
2321
      <sect2 id="transfer_mode-operations">
2322
        <title>TRANSFER_MODE operations</title>
2323
 
2324
        <para>The TRANSFER_MODE operations allow one to do assignment and
2325
          contents operations with various qualifiers to control how the
2326
          access is done in a more detailed manner than the standard contents
2327
          and assign operations.</para>
2328
 
2329
        <para>For example, the value assigned in assign has some fixed SHAPE;
2330
          its size is known at translate-time. Variable sized objects can be
2331
          moved by move_some:
2332
 
2333
          <programlisting>
2334
md:   TRANSFER_MODE
2335
arg1: EXP POINTER x
2336
arg2: EXP POINTER y
2337
arg3: EXP OFFSET(z, t)
2338
      -&gt; EXP TOP
2339
          </programlisting>
2340
 
2341
          The EXP <type>arg1</type> is the destination pointer, and
2342
          <type>arg2</type> is a source pointer. The amount moved is given by
2343
          the OFFSET <type>arg3</type>.</para>
2344
 
2345
        <para>The TRANSFER_MODE <type>md</type> parameter controls the way
2346
          that the move will be performed. If overlap is present, then the
2347
          translator will ensure that the move is equivalent to moving the
2348
          source into new space and then copying it to the destination; it
2349
          would probably do this by choosing a good direction in which to step
2350
          through the value.  The alternative, standard_transfer_mode,
2351
          indicates that it does not matter.</para>
2352
 
2353
        <para>If the TRANSFER_MODE trap_on_nil is present and
2354
          <type>arg1</type> is a nil pointer, a TDF exception with ERROR_CODE
2355
          nil_access is raised.</para>
2356
 
2357
        <para>There are variants of both the contents and assign constructors.
2358
          The signature of contents_with_mode is:
2359
 
2360
          <programlisting>
2361
md:   TRANSFER_MODE
2362
s:    SHAPE
2363
arg1: EXP POINTER(x)
2364
      -&gt; EXP s
2365
          </programlisting>
2366
 
2367
          Here, the only significant TRANSFER_MODE constructors
2368
          <type>md</type> are trap_on_nil and volatile. The latter is
2369
          principally intended to implement the C volatile construction; it
2370
          certainly means that the contents_with_mode operation will never be
2371
          "optimised" away.</para>
2372
 
2373
        <para>Similar considerations apply to assign_with_mode; here the
2374
          overlap TRANSFER_MODE is also possible with the same meaning as in
2375
          move_some.</para>
2376
      </sect2>
2377
 
2378
      <sect2 id="assigning-and-extracting-bitfields">
2379
        <title>Assigning and extracting bitfields</title>
2380
 
2381
        <para>Since pointers to bits are forbidden, two special operations are
2382
          provided to extract and assign bitfields.  These require the use of
2383
          a pointer value and a bitfield offset from the pointer. The
2384
          signature of bitfield_contents which extracts a bitfield in this
2385
          manner is:
2386
 
2387
          <programlisting>
2388
v:    BITFIELD_VARIETY
2389
arg1: EXP POINTER(x)
2390
arg2: EXP OFFSET(y,z)
2391
      -&gt; EXP bitfield(v)
2392
          </programlisting>
2393
 
2394
          Here <type>arg1</type> is a pointer to an alignment
2395
          <parameter>x</parameter> which includes <type>v</type>, the required
2396
          bitfield alignment. In practice, <parameter>x</parameter> must
2397
          include an INTEGER VARIETY whose representation can contain the
2398
          entire bitfield. Thus on a standard architecture, if v is a 15 bit
2399
          bitfield, x must include at least a 16 bit integer variety; a 27
2400
          bitfield would require a 32 bit integer variety and so on. Indeed
2401
          the constraint is stronger than this since there must be an integer
2402
          variety, accessible from arg1, which entirely contains the
2403
          bitfield.</para>
2404
 
2405
        <para>This constraint means that producers cannot expect that
2406
          arbitrary bitfield lengths can be accomodated without extra padding;
2407
          clearly it also means that the maximum bitfield length possible is
2408
          the maximum size of integer variety that can be implemented on the
2409
          translator concerned (this is defined to be at least 32). On
2410
          standard architectures, the producer can expect that an array of
2411
          bitfields of lenth 2<emphasis>n</emphasis> will be packed without
2412
          padding; this, of course, includes arrays of booleans.  For
2413
          structures of several different bitfields, he can be sure of no
2414
          extra padding bits if the total number of bits involved is less than
2415
          or equal to 32; similarly if he can subdivide the bitfields so that
2416
          each of the subdivisions (except the last) is exactly equal to 32
2417
          and the last is less than or equal to 32.  This could be relaxed to
2418
          64 bits if the translator deals with 64 bit integer varieties, but
2419
          would require that conditional TDF is produced to maintain
2420
          portability to 32 bit platforms - and on these platforms the
2421
          assurance of close packing would be lost.</para>
2422
 
2423
        <para>Since a producer is ignorant of the exact representational
2424
          varieties used by a translator, the onus is on the translator writer
2425
          to provide standard tokens which can be used by a producer to
2426
          achieve both optimum packing of bitfields and minimum alignments for
2427
          COMPOUNDs containing them(see
2428
          <link linkend="bitfield-offsets">Bitfield offsets</link>). These
2429
          tokens would allow one to construct an offset of the form OFFSET(x,
2430
          b) (where b is some bitfield alignment and x is the `minimum'
2431
          alignment which could contain it) in a manner analogous to the
2432
          normal padding operations for offsets. This offset could then used
2433
          both in the construction of a compound shape and in the extraction
2434
          and assignment constructors.</para>
2435
 
2436
        <para>The assignment of bitfields follows the same pattern with the
2437
          same constraints using bitfield_assign:
2438
 
2439
          <programlisting>
2440
arg1: EXP POINTER(x)
2441
arg2: EXP OFFSET(y,z)
2442
arg3: EXP BITFIELD_VARIETY(v)
2443
      -&gt; EXP TOP
2444
          </programlisting>
2445
        </para>
2446
      </sect2>
2447
    </sect1>
2448
 
2449
    <sect1>
2450
      <title>Operations</title>
2451
 
2452
      <para>Most of the arithmetic operations of TDF have familiar analogues
2453
        in standard languages and processors.  They differ principally in how
2454
        error conditions (e.g. numeric overflow) are handled. There is a wide
2455
        diversity in error handling in both languages and processors, so TDF
2456
        tries to reduce it to the simplest primitive level compatible with
2457
        their desired operation in languages and their implementation on
2458
        processors.  Before delving into the details of error handling, it is
2459
        worthwhile revisiting the SHAPEs and ranges in arithmetic
2460
        VARIETYs.</para>
2461
 
2462
      <sect2 id="S67">
2463
        <title>VARIETY and overflow</title>
2464
 
2465
        <para>An INTEGER VARIETY, for example, is defined by some range of
2466
          signed natural numbers. A translator will fit this range into some
2467
          possibly larger range which is convenient for the processor in
2468
          question. For example, the integers with variety(1,10) would
2469
          probably be represented as unsigned characters with range (0..255),
2470
          a convenient representation for both storage and arithmetic.</para>
2471
 
2472
        <para>The question then arises of what is meant by overflow in an
2473
          operation which is meant to deliver an integer of this VARIETY - is
2474
          it when the integer result is outside the range (1..10) or outside
2475
          the range (0..255)? For purely pragmatic reasons, TDF chooses the
2476
          latter - the result is overflowed when it is outside its
2477
          representational range (0..255). If the program insists that it must
2478
          be within (1..10), then it can always test for it. If the program
2479
          uses the error handling mechanism and the result is outside (1..10)
2480
          but still within the representational limits, then, in order for the
2481
          program to be portable, then the error handling actions must in some
2482
          sense be "continuous" with the normal action. This would not be the
2483
          case if, for example, the value was used to index an array with
2484
          bounds (1..10), but will usually be the case where the value is used
2485
          in further arithmetic operations which have similar error handling.
2486
          The arithmetic will continue to give the mathematically correct
2487
          result provided the representational bounds are not exceeded.</para>
2488
 
2489
        <para>The limits in a VARIETY are there to provide a guide to its
2490
          representation, and not to give hard limits to its possible values.
2491
          This choice is consistent with the general TDF philosophy of how
2492
          exceptions are to be treated. If, for example, one wishes to do
2493
          array-bound checking, then it must be done by explicit tests on the
2494
          indices and jumping to some exception action if they fail.
2495
          Similarly, explicit tests can be made on an integer value, provided
2496
          its representational limits are not exceeded. It is unlikely that a
2497
          translator could produce any more efficient code, in general, if the
2498
          tests were implicit. The representational limits can be exceeded in
2499
          arithmetic operations, so facilities are provided to either to
2500
          ignore it, to allow one to jump to a label, or to obey a TDF
2501
          exception handler if it happens.</para>
2502
 
2503
        <sect3 id="error_treatment">
2504
          <title>ERROR_TREATMENT</title>
2505
 
2506
          <para>Taking integer addition as an example, plus has signature:
2507
 
2508
            <programlisting>
2509
ov_err: ERROR_TREATMENT
2510
arg1:   EXP INTEGER(v)
2511
arg2:   EXP INTEGER(v)
2512
        -&gt; EXP INTEGER(v)
2513
            </programlisting>
2514
 
2515
            The result of the addition has the same integer VARIETY as its
2516
            parameters.  If the representational bounds of
2517
            <parameter>v</parameter> are exceeded, then the action taken
2518
            depends on the ERROR_TREATMENT <type>ov_err</type>.</para>
2519
 
2520
          <para>The ERROR_TREATMENT, impossible, is an assertion by the
2521
            producer that overflow will not occur; on its head be it if it
2522
            does.</para>
2523
 
2524
          <para>The ERROR_TREATMENTS continue and wrap give "fixup" values for
2525
            the result. For continue the fixup value is undefined. For wrap,
2526
            the the answer will be modulo 2 to the power of the number of bits
2527
            in the representational variety.Thus, integer arithmetic with byte
2528
            representational variety is done modulo 256. This just corresponds
2529
            to what happens in most processors and, incidentally, the
2530
            definition of C.</para>
2531
 
2532
          <para>The ERROR_TREATMENT that one would use if one wished to jump
2533
            to a label is error_jump:
2534
 
2535
            <programlisting>
2536
lab: LABEL
2537
     -&gt; ERROR_TREATMENT
2538
            </programlisting>
2539
 
2540
            A branch to <type>lab</type> will occur if the result
2541
            overflows.</para>
2542
 
2543
          <para>The ERROR_TREATMENT, trap(overflow) will raise a TDF
2544
            exception(see <link linkend="exceptional-flow">section 6.3</link>)
2545
            with ERROR_CODE overflow if overflow occurs.</para>
2546
        </sect3>
2547
      </sect2>
2548
 
2549
      <sect2 id="S69">
2550
        <title>Division and remainder</title>
2551
 
2552
        <para>The various constructors in involving integer division (e.g.
2553
          div1, rem1) have two ERROR_TREATMENT parameters, one for overflow
2554
          and one for divide-by-zero e.g. div1 is:
2555
 
2556
          <programlisting>
2557
div_by_zero_error: ERROR_TREATMENT
2558
ov_err:            ERROR_TREATMENT
2559
arg1:              EXP INTEGER(v)
2560
arg2:              EXP INTEGER(v)
2561
                   -&gt; EXP INTEGER(v)
2562
          </programlisting>
2563
 
2564
          There are two different kinds of division operators (with
2565
          corresponding remainder operators) defined. The operators div2 and
2566
          rem2 are those generally implemented directly by processor
2567
          instructions giving the sign of the remainder the same as the sign
2568
          of the quotient. The other pair, div1 and rem1, is less commonly
2569
          implemented in hardware, but have rather more consistent
2570
          mathematical properties; here the sign of remainder is the same as
2571
          the sign of divisor. Thus, div1(x, 2) is the same as shift_right(x,
2572
          1) which is only true for div2 if x is positive. The two pairs of
2573
          operations give the same results if both operands have the same
2574
          sign. The constructors div0 and rem0 allow the translator to choose
2575
          whichever of the two forms of division is convenient - the producer
2576
          is saying that he does not care which is used, as long as they are
2577
          pairwise consistent. The precise definition of the divide operations
2578
          is given in (<fix>S7.4</fix>).</para>
2579
      </sect2>
2580
 
2581
      <sect2 id="change_variety">
2582
        <title>change_variety</title>
2583
 
2584
        <para>Conversions between the various INTEGER varieties are provided
2585
          for by change_variety:
2586
 
2587
          <programlisting>
2588
ov_err: ERROR_TREATMENT
2589
r:      VARIETY
2590
arg1:   EXP INTEGER(v)
2591
        -&gt; EXP INTEGER(r)
2592
          </programlisting>
2593
 
2594
          If the value <type>arg1</type> is outside the limits of the
2595
          representational variety of <type>r</type>, then the ERROR_TREATMENT
2596
          <type>ov_err</type> will be invoked.</para>
2597
      </sect2>
2598
 
2599
      <sect2 id="S71">
2600
        <title>and, or, not, xor</title>
2601
 
2602
        <para>The standard logical operations, and, not, or and xor are
2603
          provided for all integer varieties.  Since integer varieties are
2604
          defined to be represented in twos-complement the result of these
2605
          operations are well defined.</para>
2606
      </sect2>
2607
 
2608
      <sect2 id="S72">
2609
        <title>Floating-point operations, ROUNDING_MODE</title>
2610
 
2611
        <para>All of the floating-point (including complex) operations include
2612
          ERROR-TREATMENTs. If the result of a floating-point operation cannot
2613
          be represented in the desired FLOATING_VARIETY, the error treatment
2614
          is invoked. If the ERROR_TREATMENT is wrap or impossible, the result
2615
          is undefined; otherwise the jump operates in the same way as for
2616
          integer operations. Both floating_plus and floating_mult are defined
2617
          as n-ary operations. In general, floating addition and
2618
          multiplication are not associative, but a producer may not care
2619
          about the order in which they are to be performed. Making them
2620
          appear as though they were associative allows the translator to
2621
          choose an order which is convenient to the hardware.</para>
2622
 
2623
        <para>Conversions from integer to floating are done by float_int and
2624
          from floating to integers by round_with_mode . This latter
2625
          constructor has a parameter of SORT ROUNDING_MODE which effectively
2626
          gives the IEEE rounding mode to be applied to the float to produce
2627
          its integer result.</para>
2628
 
2629
        <para>One can extract the real and imaginary parts of a complex
2630
          FLOATING using real_part and imaginary_part. A complex FLOATING can
2631
          be constructed using make_complex. Normal complex arithmetic applies
2632
          to all the other FLOATING constructors except for those explicitly
2633
          excluded (eg floating_abs, floating_max etc.)</para>
2634
      </sect2>
2635
 
2636
      <sect2 id="change_bitfield_to_int">
2637
        <title>change_bitfield_to_int, change_int_to_bitfield</title>
2638
 
2639
        <para>There are two bit-field operation, change_bitfield_to_int and
2640
          change_int_to_bitfield to transform between bit-fields and integers.
2641
          If the varieties do not fit the result is undefined; the producer
2642
          can always get it right.</para>
2643
      </sect2>
2644
 
2645
      <sect2 id="make_compound">
2646
        <title>make_compound, make_nof, n_copies</title>
2647
 
2648
        <para>There is one operation to make values of COMPOUND SHAPE,
2649
          make_compound:
2650
 
2651
          <programlisting>
2652
arg1: EXP OFFSET(base, y)
2653
arg2: LIST(EXP)
2654
      -&gt; EXP COMPOUND(sz)
2655
          </programlisting>
2656
 
2657
          The OFFSET <type>arg1</type> is evaluated as a translate-time
2658
          constant to give <parameter>sz</parameter>, the size of the compound
2659
          object.  The EXPs of arg2 are alternately OFFSETs (also
2660
          translate-time constants) and values which will be placed at those
2661
          offsets. This constructor is used to construct values given by
2662
          structure displays; in C, these only occur with constant
2663
          <constant>val[i]</constant> in global definitions. It is also used
2664
          to provide union injectors; here <parameter>sz</parameter> would be
2665
          the size of the union and the list would probably two elements with
2666
          the first being an offset_zero.</para>
2667
 
2668
        <para>Constant sized array values may be constructed using make_nof,
2669
          make_nof_int, and n_copies. Again, they only occur in C as constants
2670
          in global definitions.</para>
2671
      </sect2>
2672
    </sect1>
2673
 
2674
    <sect1 id="constants">
2675
      <title>Constants</title>
2676
 
2677
      <para>The representation of constants clearly has peculiar difficulties
2678
        in any architecture neutral format. Leaving aside any problems of how
2679
        numbers are to be represented, we also have the situation where a
2680
        "constant" can have different values on different platforms. An
2681
        obvious example would be the size of a structure which, although it is
2682
        a constant of any particular run of a program, may have different
2683
        values on different machines. Further, this constant is in general the
2684
        result of some computation involving the sizes of its components which
2685
        are not known until the platform is chosen. In TDF, sizes are always
2686
        derived from some EXP OFFSET constructed using the various OFFSET
2687
        arithmetic operations on primitives like shape_offset and offset_zero.
2688
        Most such EXP OFFSETs produced are in fact constants of the platform;
2689
        they include field displacements of structure as well as their sizes.
2690
        TDF assumes that, if these EXPs can be evaluated at translate-time
2691
        (i.e. when the sizes and alignments of primitive objects are known),
2692
        then they must be evaluated there. An example of why this is so arises
2693
        in make_compound; the SHAPE of its result EXP depends on its
2694
        <type>arg1</type> EXP OFFSET parameter and all SHAPEs must be
2695
        translate-time values.</para>
2696
 
2697
      <para>An initialisation of a TAGDEF is a constant in this sense
2698
 
2699
        <footnote>
2700
          <para>However see also initial_value in
2701
            <link linkend="definitions-and-declarations">section 3.2</link>.
2702
          </para>
2703
        </footnote>
2704
 
2705
        ; this allows one to ignore any difficulties about their order of
2706
        evaluation in the UNIT and consequently the order of evaluation of
2707
        UNITs.  Once again all the EXPs which are initialisations must be
2708
        evaluated before the program is run; this obviously includes any
2709
        make_proc or make_general_proc. . The limitation on an initialisation
2710
        EXP to ensure this is basically that one cannot take the contents of a
2711
        variable declared outside the EXP after all tokens and conditional
2712
        evaluation is taken into account. In other words, each TDF translator
2713
        effectively has an TDF interpreter which can do evaluation of
2714
        expressions (including conditionals etc.) involving only constants
2715
        such as numbers, sizes and addresses of globals. This corresponds very
2716
        roughly to the kind of initialisations of globals that are permissible
2717
        in C; for a more precise definition, see (<fix>S7.3</fix>).</para>
2718
 
2719
      <sect2 id="cond-constructors">
2720
        <title>_cond constructors</title>
2721
 
2722
        <para>Another place where translate-time evaluation of constants is
2723
          mandated is in the various _cond constructors which give a kind of
2724
          "conditional compilation" facility; every SORT which has a SORTNAME,
2725
          other that TAG, TOKEN and LABEL, has one of these constructors e.g.
2726
          exp_cond:
2727
 
2728
          <programlisting>
2729
control: EXP INTEGER(v)
2730
e1:      BITSTREAM EXP x
2731
e2:      BITSTREAM EXP y
2732
         -&gt; EXP x or EXP y
2733
          </programlisting>
2734
 
2735
          The constant, <type>control</type>, is evaluated at translate time.
2736
          If it is not zero the entire construction is replaced by the EXP in
2737
          <type>e1</type>; otherwise it is replaced by the one in
2738
          <type>e2</type>. In either case, the other BITSTREAM is totally
2739
          ignored; it even does not need to be sensible TDF. This kind of
2740
          construction is use extensively in C pre-processing directives e.g.:
2741
 
2742
          <programlisting>
2743
#if (sizeof(int) == sizeof(long)) ...
2744
          </programlisting>
2745
        </para>
2746
      </sect2>
2747
 
2748
      <sect2 id="primitive-constant-constructors">
2749
        <title>Primitive constant constructors</title>
2750
 
2751
        <para>Integer constants are constructed using make_int:
2752
 
2753
          <programlisting>
2754
v:     VARIETY
2755
value: SIGNED_NAT
2756
       -&gt; EXP INTEGER(v)
2757
          </programlisting>
2758
 
2759
          The SIGNED_NAT <type>value</type> is an encoding of the binary value
2760
          required for the integer; this value must lie within the limits
2761
          given by <type>v</type>. I have been rather slip-shod in writing
2762
          down examples of integer constants earlier in this document; where I
2763
          have written 1 as an integer EXP, for example, I should have written
2764
          make_int(v, 1) where v is some appropriate VARIETY.</para>
2765
 
2766
        <para>Constants for both floats and strings use STRINGs. A constant
2767
          string is just an particular example of make_nof_int:
2768
 
2769
          <programlisting>
2770
v:   VARIETY
2771
str: STRING(k, n)
2772
     -&gt; EXP NOF(n, INTEGER(v))
2773
          </programlisting>
2774
 
2775
          Each unsigned integer in <type>str</type> must lie in the variety
2776
          <type>v</type> and the result is the constant array whose elements
2777
          are the integers considered to be of VARIETY <type>v</type>. An
2778
          ASCI-C constant string might have <type>v</type> = variety(-128,127)
2779
          and <parameter>k</parameter> = 7; however, make_nof_int can be used
2780
          to make strings of any INTEGER VARIETY; a the elements of a Unicode
2781
          string would be integers of size 16 bits.</para>
2782
 
2783
        <para>A floating constant uses a STRING which contains the ASCI
2784
          characters of a expansion of the number to some base in
2785
          make_floating:
2786
 
2787
          <programlisting>
2788
f:        FLOATING_VARIETY
2789
rm:       ROUNDING_MODE
2790
sign:     BOOL
2791
mantissa: STRING(k, n)
2792
base:     NAT
2793
exponent: SIGNED_NAT
2794
          -&gt; EXP FLOATING(f)
2795
          </programlisting>
2796
 
2797
          For a normal floating point number, each integer in
2798
          <type>mantissa</type> is either the ASCII `.'-symbol or the ASCII
2799
          representation of a digit of the representation in the given
2800
          <type>base</type>; i.e. if c is the ASCII symbol, the digit value is
2801
          c-'0'. The resulting floating point number has SHAPE FLOATING(f) and
2802
          value <type>mantissa</type> * <type>base</type>
2803
          <type>exponent</type> rounded according to <type>rm</type>. Usually
2804
          the base will be 10 (sometimes 2) and the rounding mode to_nearest.
2805
          Any floating-point evaluation of expressions done at translate-time
2806
          will be done to an accuracy greater that implied by the
2807
          FLOATING_VARIETY involved, so that floating constants will be as
2808
          accurate as the platform permits.</para>
2809
 
2810
        <para>The make_floating construct does not apply apply to a complex
2811
          FLOATING_VARIETY <type>f</type>; to construct a complex constant use
2812
          make_complex with two make_floating arguments.</para>
2813
 
2814
        <para>Constants are also provided to give unique null values for
2815
          pointers, label values and procs i.e.: make_null_ptr,
2816
          make_null_local_lv and make_null_proc. Any significant use of these
2817
          values (e.g. taking the contents of a null pointer) is undefined,
2818
          but they can be assigned and used in tests in the normal way.</para>
2819
      </sect2>
2820
    </sect1>
2821
 
2822
    <sect1 id="tokens-and-apis">
2823
      <title>Tokens and APIs</title>
2824
 
2825
      <para>All of the examples of the use of TOKENs so far given have really
2826
        been as abbreviations for commonly used constructs, e.g. the EXP
2827
        OFFSETS for fields of structures. However, the real justification for
2828
        TOKENs are their use as abstractions for things defined in libraries
2829
        or application program interfaces (APIs).</para>
2830
 
2831
      <sect2 id="S79">
2832
        <title>Application programming interfaces</title>
2833
 
2834
        <para>APIs usually do not give complete language definitions of the
2835
          operations and values that they contain; generally, they are defined
2836
          informally in English giving relationships between the entities
2837
          within them.  An API designer should allow implementors the
2838
          opportunity of choosing actual definitions which fit their hardware
2839
          and the possibility of changing them as better algorithms or
2840
          representations become available.</para>
2841
 
2842
        <para>The most commonly quoted example is the representation of the
2843
          type FILE and its related operations in C. The ANSI C definition
2844
          gives no common representation for FILE; its implementation is
2845
          defined to be platform-dependent. A TDF producer can assume nothing
2846
          about FILE; not even that it is a structure. The only things that
2847
          can alter or create FILEs are also entities in the Ansi-C API and
2848
          they will always refer to FILEs via a C pointer.  Thus TDF abstracts
2849
          FILE as a SHAPE TOKEN with no parameters, make_tok(T_FILE) say. Any
2850
          program that uses FILE would have to include a TOKDEC introducing
2851
          T_FILE:
2852
 
2853
          <programlisting>
2854
make_tokdec(T_FILE, empty, shape())
2855
          </programlisting>
2856
 
2857
          and anywhere that it wished to refer to the SHAPE of FILE it would
2858
          do:
2859
 
2860
          <programlisting>
2861
shape_apply_token(make_tok(T_FILE), ())
2862
          </programlisting>
2863
 
2864
          Before this program is translated on a given platform, the actual
2865
          SHAPE of FILE must be supplied. This would be done by linking a TDF
2866
          CAPSULE which supplies the TOKDEF for the SHAPE of FILE which is
2867
          particular to the target platform.</para>
2868
 
2869
        <para>Many of the C operations which use FILEs are explicitly allowed
2870
          to be expanded as either procedure calls or as macros.  For example,
2871
          putc(c,f) may be implemented either as a procedure call or as the
2872
          expansion of macro which uses the fields of f directly. Thus, it is
2873
          quite natural for putc(c, f) to be represented in TDF as an EXP
2874
          TOKEN with two EXP parameters which allows it to be expanded in
2875
          either way. Of course, this would be quite distinct from the use of
2876
          putc as a value (as a proc parameter of a procedure for example)
2877
          which would require some other representation. One such
2878
          representation that comes to mind might be to simply to make a
2879
          TAGDEC for the putc value, supplying its TAGDEF in the Ansi API
2880
          CAPSULE for the platform. This might prove to be rather
2881
          short-sighted, since it denies us the possibility that the putc
2882
          value itself might be expanded from other values and hence it would
2883
          be better as another parameterless TOKEN. I have not come across an
2884
          actual API expansion for the putc value as other than a simple TAG;
2885
          however the FILE* value stdin is sometimes expressed as:
2886
 
2887
          <programlisting>
2888
#define stdin &amp;_iob[0]
2889
          </programlisting>
2890
 
2891
          which illustrates the point. It is better to have all of the
2892
          interface of an API expressed as TOKENs to give both generality and
2893
          flexibility across different platforms.</para>
2894
      </sect2>
2895
 
2896
      <sect2 id="S80">
2897
        <title>Linking to APIs</title>
2898
 
2899
        <para>In general, each API requires platform-dependent definitions to
2900
          be supplied by a combination of TDF linking and system linking for
2901
          that platform. This is illustrated in the following diagram giving
2902
          the various phases involved in producing a runnable program.</para>
2903
 
2904
        <mediaobject>
2905
          <imageobject>
2906
            <imagedata fileref="images/guide3.png" format="PNG"/>
2907
          </imageobject>
2908
          <textobject>
2909
            <phrase>TBD</phrase>
2910
          </textobject>
2911
          <caption>
2912
            <para>TBD</para>
2913
          </caption>
2914
        </mediaobject>
2915
 
2916
        <para>There will be CAPSULEs for each API on each platform giving the
2917
          expansions for the TOKENs involved, usually as uses of identifiers
2918
          which will be supplied by system linking from some libraries. These
2919
          CAPSULEs would be derived from the header files on the platform for
2920
          the API in question, usually using some automatic tools. For
2921
          example, there will be a TDF CAPSULE (derived from &lt;stdio.h&gt;)
2922
          which defines the TOKEN T_FILE as the SHAPE for FILE, together with
2923
          definitions for the TOKENs for putc, stdin, etc., in terms of
2924
          identifiers which will be found in the library libc.a.</para>
2925
 
2926
        <sect3 id="S81">
2927
          <title>Target independent headers, unique_extern</title>
2928
 
2929
          <para>Any producer which uses an API will use system independent
2930
            information to give the common interface TOKENs for this API. In
2931
            the C producer, this is provided by header files using pragmas,
2932
            which tell the producer which TOKENs to use for the particular
2933
            constructs of the API . In any target-independent CAPSULE which
2934
            uses the API, these TOKENs would be introduced as TOKDECs and made
2935
            globally accessible by using make_linkextern. For a world-wide
2936
            standard API, the EXTERNAL "name" for a TOKEN used by
2937
            make_linkextern should be provided by an application of
2938
            unique_extern on a UNIQUE drawn from a central repository of names
2939
            for entities in standard APIs; this repository would form a kind
2940
            of super-standard for naming conventions in all possible APIs. The
2941
            mechanism for controlling this super-standard has yet to be set
2942
            up, so at the moment all EXTERN names are created by
2943
            string_extern.</para>
2944
 
2945
          <para>An interesting example in the use of TOKENs comes in
2946
            abstracting field names. Often, an API will say something like
2947
            "the type Widget is a structure with fields alpha, beta ..."
2948
            without specifying the order of the fields or whether the list of
2949
            fields is complete. The field selection operations for Widget
2950
            should then be expressed using EXP OFFSET TOKENs; each field would
2951
            have its own TOKEN giving its offset which will be filled in when
2952
            the target is known. This gives implementors on a particular
2953
            platform the opportunity to reorder fields or add to them as they
2954
            like; it also allows for extension of the standard in the same
2955
            way.</para>
2956
 
2957
          <para>The most common SORTs of TOKENs used for APIs are SHAPEs to
2958
            represent types, and EXPs to represent values, including
2959
            procedures and constants. NATs and VARIETYs are also sometimes
2960
            used where the API does not specify the types of integers
2961
            involved. The other SORTs are rarely used in APIs; indeed it is
2962
            difficult to imagine <emphasis>any</emphasis> realistic use of
2963
            TOKENs of SORT BOOL. However, the criterion for choosing which
2964
            SORTs are available for TOKENisation is not their immediate
2965
            utility, but that the structural integrity and simplicity of TDF
2966
            is maintained. It is fairly obvious that having BOOL TOKENs will
2967
            cause no problems, so we may as well allow them.</para>
2968
        </sect3>
2969
      </sect2>
2970
 
2971
      <sect2 id="S82">
2972
        <title>Language programming interfaces</title>
2973
 
2974
          <para>So far, I have been speaking as though a TOKENised API could
2975
            only be some library interface, built on top of some language,
2976
            like xpg3, posix, X etc. on top of C. However, it is possible to
2977
            consider the constructions of the language itself as ideal
2978
            candidates for TOKENisation. For example, the C for-statement
2979
            could be expressed as TOKEN with four parameters. This TOKEN could
2980
            be expanded in TDF in several different ways, all giving the
2981
            correct semantics of a for-statement. A translator (or other
2982
            tools) could choose the expansion it wants depending on context
2983
            and the properties of the parameters. The C producer could give a
2984
            default expansion which a lazy translator writer could use, but
2985
            others might use expansions which might be more advantageous. This
2986
            idea could be extended to virtually all the constructions of the
2987
            language, giving what is in effect a C-language API; perhaps this
2988
            might be called more properly a language programming interface
2989
            (LPI). Thus, we would have TOKENs for C for-statements, C
2990
            conditionals, C procedure calls, C procedure definitions etc.
2991
 
2992
            <footnote>
2993
              <para>Exercise for the reader: what are the SORTs of these
2994
                parameters?</para>
2995
 
2996
              <para>The current C producer does this for some of the
2997
                constructs, but not in any systematic manner; perhaps it will
2998
                change.</para>
2999
            </footnote>
3000
          </para>
3001
 
3002
          <para>The notion of a producer for any language working to an LPI
3003
            specific to the constructs of the language is very attractive. It
3004
            could use different TOKENs to reflect the subtle differences
3005
            between uses of similar constructs in different languages which
3006
            might be difficult or impossible to detect from their expansions,
3007
            but which could allow better optimisations in the object code.
3008
            For example, Fortran procedures are slightly different from C
3009
            procedures in that they do not allow aliasing between parameters
3010
            and globals. While application of the standard TDF procedure calls
3011
            would be semantically correct, knowledge of that the non-aliasing
3012
            rule applies would allow some procedures to be translated to more
3013
            efficient code. A translator without knowledge of the semantics
3014
            implicit in the TOKENs involved would still produce correct code,
3015
            but one which knew about them could take advantage of that
3016
            knowledge.</para>
3017
 
3018
          <para>I also think that LPIs would be a very useful tool for
3019
            crystalising ideas on how languages should be translated, allowing
3020
            one to experiment with expansions not thought of by the producer
3021
            writer. This decoupling is also an escape clause allowing the
3022
            producer writer to defer the implementation of a construct
3023
            completely to translate-time or link-time, as is done at the
3024
            moment in C for off-stack allocation. As such it also serves as a
3025
            useful test-bed for TOKEN constructions which may in future become
3026
            new constructors of core TDF.</para>
3027
      </sect2>
3028
    </sect1>
3029
 
3030
    <sect1 id="tdf-transformations">
3031
      <title>TDF transformations</title>
3032
 
3033
      <para>TDF to TDF transformations form the basis of most of the tools of
3034
        TDF, including translators. TDF has a rich set of easily performed
3035
        transformations; this is mainly due to its algebraic nature, the
3036
        liberality of its sequencing rules, and the ease with which one can
3037
        introduce new names over limited scopes. For example, a translator is
3038
        always free to transform:
3039
 
3040
        <programlisting>
3041
assign(e1, e2)
3042
        </programlisting>
3043
 
3044
  to:
3045
 
3046
        <programlisting>
3047
identify(empty, new_tag, e1, assign(obtain_tag(new_tag), e2))
3048
        </programlisting>
3049
 
3050
        i.e. identify the evaluation of the left-hand side of the assignment
3051
        with a new TAG and use that TAG as the left-hand operand of a new
3052
        assignment in the body of the identification. Note that the reverse
3053
        transformation is only valid if the evaluation of e1 does not
3054
        side-effect the evaluation of e2. A producer would have had to use the
3055
        second form if it wished to evaluate e1 before e2.  The definition of
3056
        assign allows its operands to be evaluated in any order while identify
3057
        insists that the evaluation of its <type>definition</type> is
3058
        conceptually complete before starting on its <type>body</type>.</para>
3059
 
3060
      <para>Why would a translator wish to make the more complicated form from
3061
        the simpler one? This would usually depend on the particular forms of
3062
        e1 and e2 as well as the machine idioms available for implementing the
3063
        assignment. If, for example, the joint evaluation of e1 and e2 used
3064
        more evaluation registers than is available, the transformation is
3065
        probably a good idea. It would not necessarily commit one to putting
3066
        the new tag value into the stack; some other more global criteria
3067
        might lead one to allocate it into a register disjoint from the
3068
        evaluation registers. In general, this kind of transformation is used
3069
        to modify the operands of TDF constructions so that the
3070
        code-production phase of the translator can just "churn the handle"
3071
        knowing that the operands are already in the correct form for the
3072
        machine idioms.</para>
3073
 
3074
      <para>Transformations like this are also used to give optimisations
3075
        which are largely independent of the target architecture. In general,
3076
        provided that the sequencing rules are not violated, any EXP
3077
        construction, F(X), say, where X is some inner EXP, can be replaced
3078
        by:
3079
 
3080
        <programlisting>
3081
identify(empty, new_tag, X, F(obtain_tag(new_tag))).
3082
        </programlisting>
3083
 
3084
        This includes the extraction of expressions which are constant over a
3085
        loop; if F was some repeat construction and one can show that the EXP
3086
        X is invariant over the repeat, the transformation does the constant
3087
        extraction.</para>
3088
 
3089
      <para>Most of the transformations performed by translators are of the
3090
        above form, but there are many others. Particular machine idioms might
3091
        lead to transformations like changing a test (i&gt;=1) to (i&gt;0)
3092
        because the test against zero is faster; replacing multiplication by a
3093
        constant integer by shifts and adds because multiplication is slow;
3094
        and so on. Target independent transformations include things like
3095
        procedure inlining and loop unrolling. Often these target independent
3096
        transformations can be profitably done in terms of the TOKENs of an
3097
        LPI; loop unrolling is an obvious example.</para>
3098
 
3099
      <sect2 id="transformations-as-definitions">
3100
        <title>Transformations as definitions</title>
3101
 
3102
        <para>As well being a vehicle for expressing optimisation, TDF
3103
          transformations can be used as the basis for defining TDF. In
3104
          principle, if we were to define all of the allowable transformations
3105
          of the TDF constructions, we would have a complete definition of TDF
3106
          as the initial model of the TDF algebra. This would be a fairly
3107
          impracticable project, since the totality of the rules including all
3108
          the simple constructs would be very unwieldy, difficult to check for
3109
          inconsistencies and would not add much illumination to the
3110
          definition. However, knowledge of allowable transformations of TDF
3111
          can often answer questions of the meaning of diverse constructs by
3112
          relating them to a single construct. What follows is an alphabet of
3113
          generic transformations which can often help to answer knotty
3114
          questions.  Here, E[X \ Y] denotes an EXP E with all internal
3115
          occurrences of X replaced by Y.</para>
3116
 
3117
        <para>If F is any non order-specifying
3118
 
3119
          <footnote>
3120
            <para>The order-specifying constructors are conditional,
3121
              identify, repeat, labelled, sequence and
3122
              variable.</para>
3123
          </footnote>
3124
 
3125
          EXP constructor and E is one of the EXP operands of F, then:
3126
 
3127
          <programlisting>
3128
F(..., E, ...) xde identify(empty, newtag, E, F(..., obtain_tag(newtag), ...))
3129
          </programlisting>
3130
        </para>
3131
 
3132
        <para>If E is a non side-effecting
3133
          <footnote>
3134
            <para>A sufficient condition for not side-effecting in this sense
3135
              is that there are no apply_procs or local_allocs in E; that any
3136
              assignments in E are to variables defined in E; and that any
3137
              branches in E are to labels defined in conditionals in
3138
              E.</para>
3139
          </footnote>
3140
 
3141
          EXP and none of the variables used in E are assigned to in B:
3142
          identify(v, tag, E, B) xde B[obtain_tag(tag) \ E]</para>
3143
 
3144
        <para>If all uses of tg in B are of the form contents(shape(E),
3145
          obtain_tag(tg)):
3146
 
3147
          <programlisting>
3148
variable(v, tg, E, B) xde identify(v, nt, E, B[contents(shape(E), obtain_tag(tg)) \ obtain_tag(nt)])
3149
sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>),
3150
         sequence((P<subscript>1</subscript>, ..., P<subscript>m</subscript>), R) xdb
3151
         sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>, P<subscript>1</subscript>, ..., P<subscript>m</subscript>), R)
3152
          </programlisting>
3153
        </para>
3154
 
3155
        <para>If S<subscript>i</subscript> =
3156
          sequence((P<subscript>1</subscript>, ...,
3157
          P<subscript>m</subscript>), R):
3158
 
3159
          <programlisting>
3160
sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>), T) xdb
3161
         sequence((S<subscript>1</subscript>, ...,
3162
                  S<subscript>i-1</subscript>, P<subscript>1</subscript>, ..., P<subscript>m</subscript>, R,
3163
                  S<subscript>i+1</subscript>, ..., S<subscript>n</subscript>), T) E xdb sequence((), E)
3164
          </programlisting>
3165
        </para>
3166
 
3167
        <para>If D is either identify or variable:
3168
 
3169
          <programlisting>
3170
D(v, tag, sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>), R), B) xde
3171
                   sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>), D(v, tag, R, B))
3172
          </programlisting>
3173
        </para>
3174
 
3175
        <para>If S<subscript>i</subscript> is an EXP BOTTOM, then:
3176
 
3177
          <programlisting>
3178
sequence((S<subscript>1</subscript>, S<subscript>2</subscript>, ... S<subscript>n</subscript>), R) xde
3179
sequence((S<subscript>1</subscript>, ...  S<subscript>i - 1</subscript>), S<subscript>i</subscript>)
3180
          </programlisting>
3181
        </para>
3182
 
3183
 
3184
        <para>If E is an EXP BOTTOM, and if D is either identify or variable:
3185
 
3186
          <programlisting>
3187
D(v, tag, E, B) xde E
3188
          </programlisting>
3189
        </para>
3190
 
3191
        <para>If S<subscript>i</subscript> is make_top(), then:
3192
 
3193
          <programlisting>
3194
sequence((S<subscript>1</subscript>,
3195
S<subscript>2</subscript>, ...  S<subscript>n</subscript>), R) xdb
3196
sequence((S<subscript>1</subscript>, ...  S<subscript>i - 1</subscript>,
3197
S<subscript>i + 1</subscript>, ...S<subscript>n</subscript>), R)
3198
          </programlisting>
3199
        </para>
3200
 
3201
        <para>If S<subscript>n</subscript> is an EXP TOP:
3202
          <programlisting>
3203
sequence((S<subscript>1</subscript>, ...
3204
          S<subscript>n</subscript>), make_top()) xdb sequence((S<subscript>1</subscript>, ...,
3205
          S<subscript>n - 1</subscript>), S<subscript>n</subscript>)
3206
          </programlisting>
3207
        </para>
3208
 
3209
        <para>If E is an EXP TOP and E is not side-effecting then:
3210
          <programlisting>
3211
E xde make_top()
3212
          </programlisting>
3213
        </para>
3214
 
3215
        <para>If C is some non order-specifying and non side-effecting
3216
          constructor, and S<subscript>i</subscript> is C(P<subscript>1</subscript>,...,
3217
          P<subscript>m</subscript>) where P<subscript>1..m</subscript> are the EXP operands of C:
3218
          <programlisting>
3219
sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>), R) xde
3220
sequence((S<subscript>1</subscript>, ..., S<subscript>i - 1</subscript>, P<subscript>1</subscript>,
3221
          ..., P<subscript>m</subscript>, S<subscript>i + 1</subscript>, ..., S<subscript>n</subscript>), R)
3222
          </programlisting>
3223
        </para>
3224
 
3225
        <para>If none of the S<subscript>i</subscript> use the label L:
3226
          <programlisting>
3227
conditional(L,
3228
            sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>), R), A) xde
3229
            sequence((S<subscript>1</subscript>, ..., S<subscript>n</subscript>), conditional(L, R, A))
3230
          </programlisting>
3231
        </para>
3232
 
3233
        <para>If there are no uses of L in X
3234
 
3235
          <footnote>
3236
            <para>There are analogous rules for labelled and repeat with
3237
              unused LABELs.</para>
3238
          </footnote>
3239
 
3240
          :
3241
          <programlisting>
3242
conditional(L, X, Y) xde X conditional(L, E, goto(Z)) xde E[L \ Z]
3243
          </programlisting>
3244
        </para>
3245
 
3246
        <para>If EXP X contains no use of the LABEL L:
3247
          <programlisting>
3248
conditional(L, conditional(M, X, Y), Z) xde conditional(M, X, conditional(L, Y, Z))
3249
repeat(L, I, E) xde sequence((I), repeat(L, make_top(), E))
3250
repeat(L, make_top(), E) xde conditional(Z, E[L \ Z], repeat(L, make_top(), E))
3251
          </programlisting>
3252
        </para>
3253
 
3254
        <para>If there are no uses of L in E:
3255
          <programlisting>
3256
repeat(L, make_top(), sequence((S, E), make_top())) xde
3257
    conditional(Z, S[L \ Z], repeat(L, make_top(), sequence((E, S), make_top())))
3258
          </programlisting>
3259
        </para>
3260
 
3261
        <para>If f is a procedure defined
3262
 
3263
          <footnote>
3264
            <para>This has to be modified if B contains any uses of
3265
              local_free_all or last_local.</para>
3266
          </footnote>
3267
 
3268
          as:
3269
 
3270
          <programlisting>
3271
make_proc(rshape, formal<subscript>1 .. n</subscript>, vtg,
3272
          B(return R<subscript>1</subscript>, ..., return R<subscript>m</subscript>))
3273
          </programlisting>
3274
 
3275
          where:
3276
 
3277
          <programlisting>
3278
formal<subscript>i</subscript> = make_tagshacc(s<subscript>i</subscript>, vi, tgi)
3279
          </programlisting>
3280
 
3281
          and B is an EXP with all of its internal return constructors indicated
3282
          parametrically then:
3283
 
3284
          <programlisting>
3285
if Ai has SHAPE si apply_proc(rshape, f, (A1, ..., An), V) xde
3286
variable(empty, newtag, make_value((rshape=BOTTOM) ?  TOP : rshape),
3287
         labelled((L), variable(v1, tg1, A1, ...,
3288
                  variable(vn, tgn, An,
3289
                  variable(empty, vtg, V,
3290
                  B(sequence(assign(obtain_tag(newtag), R1), goto(L)),
3291
                    ...,
3292
                    sequence(assign(obtain_tag(newtag), Rm), goto(L))))),
3293
                  contents(rshape, obtain_tag(newtag))))
3294
assign(E, make_top()) xde sequence((E), make_top())
3295
contents(TOP, E) xde sequence((E), make_top())
3296
make_value(TOP) xde make_top()
3297
component(s, contents(COMPOUND(S), E), D) xde contents(s, add_to_ptr(E, D))
3298
make_compound(S, ((E1, D1), ..., (En, Dn))) xde variable(empty, nt, make_value(COMPOUND(S)),
3299
    sequence((assign(add_to_ptr(obtain_tag(nt), D1), E1), ...,
3300
              assign(add_to_ptr(obtain_tag(nt), Dn), En)),
3301
             contents(S, obtain_tag(nt))))
3302
          </programlisting>
3303
        </para>
3304
 
3305
        <sect3 id="examples-of-transformations">
3306
          <title>Examples of transformations</title>
3307
 
3308
          <para>Any of these transformations may be performed by the TDF
3309
            translators. The most important is probably {A} which allows one
3310
            to reduce all of the EXP operands of suitable constructors to
3311
            obtain_tags. The expansion rules for identification, {G}, {H} and
3312
            {I}, gives definition to complicated operands as well as strangely
3313
            formed ones, e.g. return(...  return(X)...). Rule {A} also
3314
            illustrates neatly the lack of ordering constraints on the
3315
            evaluation of operands. For example, mult(et, exp1, exp2) could be
3316
            expanded by applications of {A} to either:
3317
 
3318
            <programlisting>
3319
identify(empty, t1, exp1, identify(empty, t2, exp2, mult(et, obtain_tag(t1), obtain_tag(t2))))
3320
            </programlisting>
3321
 
3322
            or:
3323
 
3324
            <programlisting>
3325
identify(empty, t2, exp2, identify(empty, t1, exp1, mult(et, obtain_tag(t1), obtain_tag(t2))))
3326
            </programlisting>
3327
 
3328
            Both orderings of the evaluations of exp1 and exp2 are acceptable,
3329
            regardless of any side-effects in them. There is no requirement
3330
            that both expansions should produce the same answer for the
3331
            multiplications; the only person who can say whether either result
3332
            is "wrong" is the person who specified the program.</para>
3333
 
3334
          <para>Many of these transformations often only come into play when
3335
            some previous transformation reveals some otherwise hidden
3336
            information. For example, after procedure in-lining given by {U}
3337
            or loop un-rolling given by {S}, a translator can often deduce the
3338
            behaviour of a _test constructor, replacing it by either a
3339
            make_top or a goto. This may allow one to apply either {J} or {H}
3340
            to eliminate dead code in sequences and in turn {N} or {P} to
3341
            eliminate entire conditions and so on.</para>
3342
 
3343
          <para>Application of transformations can also give expansions which
3344
            are rather pathological in the sense that a producer is very
3345
            unlikely to form them. For example, a procedure which returns no
3346
            result would have result statements of the form
3347
            return(make_top()). In-lining such a procedure by {U} would have a
3348
            form like:
3349
 
3350
            <programlisting>
3351
variable(empty, nt, make_shape(TOP),
3352
         labelled((L),
3353
         ... sequence((assign(obtain_tag(nt), make_top())),
3354
goto(L)) ...
3355
         contents(TOP, obtain_tag(nt))
3356
         )
3357
)
3358
            </programlisting>
3359
 
3360
            The rules {V}, {W} and {X} allow this to be replaced by:
3361
 
3362
            <programlisting>
3363
variable(empty, nt, make_top(),
3364
         labelled((L),
3365
         ... sequence((obtain_tag(nt)), goto(L)) ...
3366
         sequence((obtain_tag(nt)), make_top())
3367
         )
3368
)
3369
            </programlisting>
3370
 
3371
            The obtain_tags can be eliminated by rule {M} and then the
3372
            sequences by {F}. Sucessive applications of {C} and {B} then give:
3373
 
3374
            <programlisting>
3375
labelled((L),
3376
         ... goto(L) ...
3377
         make_top()
3378
        )
3379
            </programlisting>
3380
          </para>
3381
        </sect3>
3382
 
3383
        <sect3 id="S86">
3384
          <title>Programs with undefined values</title>
3385
 
3386
          <para>The definitions of most of the constructors in the TDF
3387
            specification are predicated by some conditions; if these
3388
            conditions are not met the effect and result of the constructor is
3389
            not defined for all possible platforms.
3390
 
3391
            <footnote>
3392
              <para>However, we may find that the mapping of a constraint
3393
                allows extra relationships for a class of architectures which
3394
                do not hold in all generality; this may mean that some
3395
                constructions are defined on this class while still being
3396
                undefined in others (see
3397
                <link linkend="models-of-the-tdf-algebra">section 13</link>).
3398
              </para>
3399
            </footnote>
3400
 
3401
            Any value which is dependent on the effect or result of an
3402
            undefined construction is also undefined. This is not the same as
3403
            saying that a program is undefined if it can construct an
3404
            undefined value - the dynamics of the program might be such that
3405
            the offending construction is never obeyed.</para>
3406
        </sect3>
3407
      </sect2>
3408
    </sect1>
3409
 
3410
    <sect1 id="tdf-expansions-of-offsets">
3411
      <title>TDF expansions of offsets</title>
3412
 
3413
      <para>Consider the C structure defined by:
3414
 
3415
        <programlisting>
3416
typedef struct{ int i; double d; char c;} mystruct;
3417
        </programlisting>
3418
 
3419
        Given that sh_int, sh_char and sh_double are the SHAPEs for int, char
3420
        and double, the SHAPE of <type>mystruct</type> is constructed by:
3421
 
3422
        <programlisting>
3423
SH_mystruct = compound(S_c)
3424
        </programlisting>
3425
 
3426
        where:
3427
 
3428
        <programlisting>
3429
S_c = offset_add(O_c, shape_offset(sh_char))
3430
        </programlisting>
3431
 
3432
        where:
3433
 
3434
        <programlisting>
3435
O_c = offset_pad(alignment(sh_char), S_d)
3436
        </programlisting>
3437
 
3438
        where:
3439
 
3440
        <programlisting>
3441
S_d = offset_add(O_d, shape_offset(sh_double))
3442
        </programlisting>
3443
 
3444
        where:
3445
 
3446
        <programlisting>
3447
O_d = offset_pad(alignment(sh_double), S_i)
3448
        </programlisting>
3449
 
3450
        where
3451
 
3452
        <footnote>
3453
          <para>I could equally have given simply shape_offset(sh_int) for
3454
            S_i, but the above formulation is more uniform with respect to
3455
            selection OFFSETs.</para>
3456
        </footnote>:
3457
 
3458
        <programlisting>
3459
        S_i = offset_add(O_i, shape_offset(sh_int))
3460
        </programlisting>
3461
 
3462
        and:
3463
 
3464
        <programlisting>
3465
        O_i = offset_zero(alignment(sh_int))
3466
        </programlisting>
3467
 
3468
        Each of S_c, S_d and S_i gives the minimum "size" of the space
3469
        required to upto and including the field c, d and i respectively. Each
3470
        of O_c, O_d and O_i gives the OFFSET "displacement" from a pointer to
3471
        a <type>mystruct</type> required to select the fields c, d and i
3472
        respectively. The C program fragment:
3473
 
3474
        <programlisting>
3475
mystruct s;
3476
.... s.d = 1.0; ...
3477
        </programlisting>
3478
 
3479
        would translate to something like:
3480
 
3481
        <programlisting>
3482
variable(empty, tag_s, make_value(compound(S_c)),
3483
         sequence(...,
3484
         assign(add_to_ptr(obtain_tag(tag_s), O_d), 1.0),
3485
         ...
3486
         )
3487
        )
3488
        </programlisting>
3489
 
3490
        Each of the OFFSET expressions above are ideal candidates for
3491
        tokenisation; a producer would probably define tokens for each of them
3492
        and use exp_apply_token to expand them at each of their uses.</para>
3493
 
3494
        <para>From the definition, we find that:
3495
 
3496
        <programlisting>
3497
S_c = shape_offset(SH_mystruct)
3498
i.e. an OFFSET(alignment(sh_int) xc8  alignment(sh_char) xc8  alignment(sh_double), {})
3499
        </programlisting>
3500
 
3501
        This would not be the OFFSET required to describe
3502
        <type>sizeof(mystruct)</type> in C, since this is defined to be the
3503
        difference between successive elements an array of
3504
        <type>mystruct</type>s. The <type>sizeof</type> OFFSET would have to
3505
        pad S_c to the alignment of SH_mystruct:
3506
 
3507
        <programlisting>
3508
offset_pad(alignment(SH_mystruct), S_c)
3509
        </programlisting>
3510
 
3511
        This is the OFFSET that one would use to compute the displacement of
3512
        an element of an array of <type>mystruct</type>s using offset_mult
3513
        with the index.</para>
3514
 
3515
      <para>The most common use of OFFSETs is in add_to_ptr to compute the
3516
        address of a structure or array element. Looking again at its
3517
        signature in a slightly different form:
3518
 
3519
        <programlisting>
3520
arg1: EXP POINTER(y xc8 A)
3521
arg2: EXP OFFSET(y, z)
3522
      -&gt; EXP POINTER(z)
3523
      ... for any ALIGNMENT A
3524
        </programlisting>
3525
 
3526
        one sees that <type>arg2</type> can measure an OFFSET from a value of
3527
        a "smaller" alignment than the value pointed at by <type>arg1</type>.
3528
        If <type>arg2</type> were O_d, for example, then <type>arg1</type>
3529
        could be a pointer to any structure of the form struct {int i, double
3530
        d,...} not just <type>mystruct</type>. The general principle is that
3531
        an OFFSET to a field constructed in this manner is independent of any
3532
        fields after it, corresponding to normal usage in both languages and
3533
        machines. A producer for a language which conflicts with this would
3534
        have to produce less obvious TDF, perhaps by re-ordering the fields,
3535
        padding the offsets by later alignments or taking maxima of the sizes
3536
        of the fields.</para>
3537
 
3538
      <sect2 id="bitfield-offsets">
3539
        <title>Bitfield offsets</title>
3540
 
3541
        <para>Bitfield offsets are governed by rather stricter rules. In order
3542
          to extract or assign a bitfield, we have to find an integer variety
3543
          which entirely contains the bitfield. Suppose that we wished to
3544
          extract a bitfield by:
3545
 
3546
          <programlisting>
3547
bitfield_contents(v, p:POINTER(X), b:OFFSET(Y, B))
3548
          </programlisting>
3549
 
3550
          Y must be an alignment(I) where I is some integer SHAPE, contained
3551
          in X. Further, this has to be equivalent to:
3552
 
3553
          <programlisting>
3554
bitfield_contents(v, add_ptr(p, d:OFFSET(Y,Y)), b':OFFSET(Y, B))
3555
          </programlisting>
3556
 
3557
          for some d and b' such that:
3558
 
3559
          <programlisting>
3560
offset_pad(v, shape_offset(I)) &gt;= b'
3561
          </programlisting>
3562
 
3563
          and
3564
 
3565
          <programlisting>
3566
offset_add(offset_pad(v, offset_mult(d, sizeof(I)), b') = b
3567
          </programlisting>
3568
 
3569
          Clearly, we have a limitation on the length of bitfields to the
3570
          maximum integer variety available; in addition, we cannot have a
3571
          bitfield which overlaps two such varieties.</para>
3572
 
3573
        <para>The difficulties inherent in this may be illustrated by
3574
          attempting to construct an array of bitfields using the nof
3575
          constructor. Assuming a standard architecture, we find that we
3576
          cannot usefully define an object of SHAPE nof(N,
3577
          bitfield(bfvar_bits(b, M))) without padding this shape out to some
3578
          integer variety which can contain M bits. In addition, they can only
3579
          be usefully indexed (using bitfield_contents)either if M is some
3580
          power of 2 or M*N is less than the length of the maximum integer
3581
          variety. Thus a producer must be sure that these conditions hold if
3582
          he is to generate and index this object simply. Even here he is in
3583
          some dificulty, since he does not know the representational
3584
          varieties of the integers available to him; also it is difficult for
3585
          him to ensure that the alignment of the entire array is in some
3586
          sense minimal. Similar difficulties occur with bitfields in
3587
          structures - they are not restricted to arrays.</para>
3588
 
3589
        <para>The solution to this conundrum in its full generality requires
3590
          knowledge of the available representational varieties. Particular
3591
          languages have restrictions which means that sub-optimal solutions
3592
          will satisfy its specification on the use of bitfields.  For
3593
          example, C is satisfied with bitfields of maximum length 32 and
3594
          simple alignment constraints. However, for the general optimal
3595
          solution, I can see no reasonable alternative to the installer
3596
          defining some tokens to produce bitfield offsets which are
3597
          guaranteed to obey the alignment rules and also give optimal packing
3598
          of fields and alignments of the total object for the platform in
3599
          question. I believe that three tokens are sufficient to do this;
3600
          these are analogous to the constructors offset_zero, offset_pad and
3601
          offset_mult with ordinary alignments and their signatures could be:
3602
 
3603
          <programlisting>
3604
~Bitfield_offset_zero:
3605
n:        NAT
3606
issigned: BOOL
3607
          -&gt; EXP OFFSET(A, bfvar_bits(issigned, n))
3608
          </programlisting>
3609
 
3610
          Here the result is a zero offset to the bitfield with `minimum'
3611
          integer variety alignment A.
3612
 
3613
          <programlisting>
3614
~Bitfield_offset_pad:
3615
n:        NAT
3616
issigned: BOOL
3617
sh:       SHAPE
3618
          -&gt; EXP OFFSET(alignment(sh) xc8  A, bfvar_bits(issigned, n))
3619
          </programlisting>
3620
 
3621
          Here the result is the shape_offset of <type>sh</type> padded with
3622
          the `minimum' alignment A so that it can accomodate the bitfield.
3623
          Note that this may involve padding <type>sh</type> with the
3624
          alignment of the maximum integer variety if there are not enough
3625
          bits left at the end of <type>sh.</type>
3626
 
3627
          <programlisting>
3628
~Bitfield_offset_mult:
3629
n:        NAT
3630
issigned: BOOL
3631
ind:      EXP INTEGER(v)
3632
          -&gt; EXP OFFSET(A, bfvar_bits(issigned, n))
3633
          </programlisting>
3634
 
3635
          Here the result is an offset which gives the displacement of
3636
          <type>ind</type>th element of an array of <type>n</type>-bit
3637
          bitfields with `minimum' alignment A. Note that this will correspond
3638
          to a normal multiplication only if <type>n</type> is a power of 2 or
3639
          <type>ind</type> * <type>n</type> &lt;= length of the maximum
3640
          integer variety.</para>
3641
 
3642
        <para>These tokens can be expressed in TDF if the lengths of the
3643
          available varieties are known, i.e., they are installer defined.
3644
 
3645
          <footnote>
3646
            <para>For most architectures, these definition are dependent only
3647
              on a few constants such as the maximum length of bitfield.,
3648
              expessed as tokens for the target. The precise specification of
3649
              such target dependent tokens is of current interest outside the
3650
              scope of this document.</para>
3651
          </footnote>
3652
 
3653
          They ought to be used in place of offset_zero, offset_pad and
3654
          offset_mult whereever the alignment or shape (required to construct
3655
          a SHAPE or an argument to the bitfield constructs) is a pure
3656
          bitfield.  The constructor nof should never be used on a pure
3657
          bitfield; instead it should be replaced by:
3658
 
3659
          <programlisting>
3660
S = compound(~Bitfield_offset_mult(M, b, N))
3661
          </programlisting>
3662
 
3663
          to give a shape, S, representing an array of N M-bit bitfields. This
3664
          may not be just N*M bits; for example ~Bitfield_offset_mult may be
3665
          implemented to pack an array of 3-bit bitfields as 10 fields to a
3666
          32-bit word. In any case, one would expect that normal rules for
3667
          offset arithmetic are preserved, e.g.
3668
 
3669
          <programlisting>
3670
offset_add(~Bitfield_offset_pad(M, b, S), size(bitfield(bfvar_bits(b, N)))) =
3671
    ~Bitfield_offset_mult(M, b, N + 1)
3672
 
3673
where size(X) = offset_pad(alignment(X), shape_offset(X))
3674
          </programlisting>
3675
        </para>
3676
      </sect2>
3677
    </sect1>
3678
 
3679
    <sect1 id="models-of-the-tdf-algebra">
3680
      <title>Models of the TDF algebra</title>
3681
 
3682
      <para>TDF is a multi-sorted abstract algebra. Any implementation of TDF
3683
        is a model of this algebra, formed by a mapping of the algebra into a
3684
        concrete machine. An algebraic mapping gives a concrete representation
3685
        to each of the SORTs in such a way that the representation of any
3686
        construction of TDF is independent of context; it is a homomorphism.
3687
        In other words if we define the mapping of a TDF constructor, C, as
3688
        MAP[C] and the representation of a SORT, S, as REPR[S] then:
3689
 
3690
        <programlisting>
3691
REPR[C(P1, ..., Pn)] = MAP[C](REPR(P1), ..., REPR(Pn))
3692
        </programlisting>
3693
 
3694
        Any mapping has to preserve the equivalences of the abstract algebra,
3695
        such as those exemplified by the transformations {A} - {Z} in
3696
        <link linkend="transformations-as-definitions">section 11.1</link>.
3697
        Similarly, the mappings of any predicates on the constructions, such
3698
        as those giving "well-formed" conditions, must be satisfied in terms
3699
        of the mapped representations.</para>
3700
 
3701
      <para>In common with most homomorphisms, the mappings of constructions
3702
        can exhibit more equivalences than are given by the abstract algebra.
3703
        The use of these extra equivalences is the basis of most of the
3704
        target-dependent optimisations in a TDF translator; it can make use of
3705
        "idioms" of the target architecture to produce equivalent
3706
        constructions which may work faster than the "obvious" translation. In
3707
        addition, we may find that may find that more predicates are satisfied
3708
        in a mapping than would be in the abstract algebra. a particular
3709
        concrete mapping might allow more constructions to be well-formed than
3710
        are permitted in the abstract; a producer can use this fact to target
3711
        its output to a particular class of architectures. in this case, the
3712
        producer should produce tdf so that any translator not targeted to
3713
        this class can fail gracefully.</para>
3714
 
3715
      <para>giving a complete mapping for a particular architecture here is
3716
        tantamount to writing a complete translator. however, the mappings for
3717
        the small but important sub-algebra concerned with offsets and
3718
        alignments illustrates many of the main principles.  what follows is
3719
        two sets of mappings for disparate architectures; the first gives a
3720
        more or less standard meaning to alignments but the second may be less
3721
        familiar.</para>
3722
 
3723
      <sect2 id="model-for-32bit-architecture">
3724
        <title>Model for a 32-bit standard architecture</title>
3725
 
3726
        <para>Almost all current architectures use a "flat-store" model of
3727
          memory. There is no enforced segregation of one kind of data from
3728
          another - in general, one can access one unit of memory as a linear
3729
          offset from any other. Here, TDF ALIGNMENTs are a reflection of
3730
          constraints for the efficient access of different kinds of data
3731
          objects - usually one finds that 32-bit integers are most
3732
          efficiently accessed if they start at 32 bit boundaries and so
3733
          on.</para>
3734
 
3735
        <sect3 id="S91">
3736
          <title>Alignment model</title>
3737
 
3738
          <para>The representation of ALIGNMENT in a typical standard
3739
            architecture is a single integer where:
3740
 
3741
            <programlisting>
3742
REPR[{ }] = 1
3743
REPR[{bitfield}] = 1
3744
REPR[{char_variety}] = 8
3745
REPR[{short_variety}] = 16
3746
            </programlisting>
3747
 
3748
            Otherwise, for all other primitive ALIGNMENTS a:
3749
 
3750
            <programlisting>
3751
REPR[{a}] = 32
3752
            </programlisting>
3753
 
3754
            The representation of a compound ALIGNMENT is given by:
3755
 
3756
            <programlisting>
3757
REPR[A xc8 B] = Max(REPR[A], REPR[B])
3758
i.e. MAP[unite_alignment] = Max
3759
            </programlisting>
3760
 
3761
            while the ALIGNMENT inclusion predicate is given by:
3762
 
3763
            <programlisting>
3764
REPR[A ... B]= REPR[A] xb3 REPR[B]
3765
            </programlisting>
3766
 
3767
            All the constructions which make ALIGNMENTs are represented here
3768
            and they will always reduce to an integer known at translate-time.
3769
            Note that the mappings for xc8 and ... must preserve the basic
3770
            algebraic properties derived from sets; for example the mapping of
3771
            xc8 must be idempotent, commutative and associative, which is true
3772
            for Max.</para>
3773
        </sect3>
3774
 
3775
        <sect3 id="offset-and-pointer-model">
3776
          <title>Offset and pointer model</title>
3777
 
3778
          <para>Most standard architectures use byte addressing; to address
3779
            bits requires more complication. Hence, a value with SHAPE
3780
            POINTER(A) where REPR[A)]xb9 1 is represented by a 32-bit byte
3781
            address.</para>
3782
 
3783
          <para>We are not allowed to construct pointers where REPR[A] = 1,
3784
            but we still have offsets whose second alignment is a bitfield.
3785
            Thus a offsets to bitfield are represented differently to offsets
3786
            to other alignments:</para>
3787
 
3788
          <para>A value with SHAPE OFFSET(A, B) where REPR(B) xb9 1 is
3789
            represented by a 32-bit byte-offset.</para>
3790
 
3791
          <para>A value with SHAPE OFFSET(A, B) where REPR(B) = 1 is
3792
            represented by a 32-bit bit-offset.</para>
3793
        </sect3>
3794
 
3795
        <sect3 id="size-model">
3796
          <title>Size model</title>
3797
 
3798
          <para>In principle, the representation of a SHAPE is a pair of an
3799
            ALIGNMENT and a size, given by shape_offset applied to the SHAPE.
3800
            This pair is constant which can be evaluated at translate time.
3801
            The construction, shape_offset(S), has SHAPE OFFSET(alignment(s),
3802
            { }) and hence is represented by a bit-offset:
3803
 
3804
            <programlisting>
3805
REPR[shape_offset(top())] = 0
3806
REPR[shape_offset(integer(char_variety))] = 8
3807
REPR[shape_offset(integer(short_variety))] = 16
3808
.... etc. for other numeric varieties
3809
REPR[shape_offset(pointer(A))]= 32
3810
REPR[shape_offset(compound(E))] = REPR[E]
3811
REPR[shape_offset(bitfield(bfvar_bits(b, N)))] = N
3812
REPR[shape_offset(nof(N, S))] = N * REPR[offset_pad(alignment(S), shape_offset(S))]
3813
        where S is not a bitfield shape
3814
            </programlisting>
3815
 
3816
            Similar considerations apply to the other offset-arithmetic
3817
            constructors. In general, we have:
3818
 
3819
            <programlisting>
3820
REPR[offset_zero(A)] = 0             for all A
3821
 
3822
REPR[offset_pad(A, X:OFFSET(C, D)) =
3823
    ((REPR[X] + REPR[A] - 1) / (REPR[A])) * REPR[A] / 8
3824
        if REPR[A] xb9 1 xd9 REPR[D] =1
3825
            </programlisting>
3826
 
3827
            Otherwise:
3828
 
3829
            <programlisting>
3830
REPR[offset_pad(A, X:OFFSET(C, D)) =
3831
    ((REPR[X] + REPR[A] - 1) / (REPR[A])) * REPR[A]
3832
 
3833
REPR[offset_add(X:OFFSET(A, B), Y:OFFSET(C, D))] = REPR[X] * 8 + REPR[Y]
3834
        if REPR[B] xb9 1 xd9 REPR[D] = 1
3835
            </programlisting>
3836
 
3837
            Otherwise:
3838
 
3839
            <programlisting>
3840
REPR[offset_add(X, Y)] = REPR[X] + REPR[Y]
3841
 
3842
REPR[offset_max(X:OFFSET(A, B), Y:OFFSET(C, D))] = Max(REPR[X], 8 * REPR[Y]
3843
        if REPR[B] = 1 xd9 REPR[D] xb9 1
3844
REPR[offset_max(X:OFFSET(A, B), Y:OFFSET(C, D))] = Max(8 * REPR[X], REPR[Y]
3845
        if REPR[D] = 1 xd9 REPR[B] xb9 1
3846
            </programlisting>
3847
 
3848
            Otherwise:
3849
 
3850
            <programlisting>
3851
REPR[offset_max(X, Y)] = Max(REPR[X], REPR[Y])
3852
 
3853
REPR[offset_mult(X, E)] = REPR[X] * REPR[E]
3854
            </programlisting>
3855
 
3856
            IA translator working to this model maps ALIGNMENTs into the
3857
            integers and their inclusion constraints into numerical
3858
            comparisons.  As a result, it will correctly allow many OFFSETs
3859
            which are disallowed in general; for example, OFFSET({pointer},
3860
            {char_variety}) is allowed since REPR[{pointer}] xb3
3861
            REPR[{char_variety}]. Rather fewer of these extra relationships
3862
            are allowed in the next model considered.</para>
3863
        </sect3>
3864
      </sect2>
3865
 
3866
      <sect2>
3867
        <title>Model for machines like the iAPX-432</title>
3868
 
3869
        <para>The iAPX-432 does not have a linear model of store. The address
3870
          of a word in store is a pair consisting of a block-address and a
3871
          displacement within that block. In order to take full advantage of
3872
          the protection facilities of the machine, block-addresses are
3873
          strictly segregated from scalar data like integers, floats,
3874
          displacements etc. There are at least two different kind of blocks,
3875
          one which can only contain block-addresses and the other which
3876
          contains only scalar data. There are clearly difficulties here in
3877
          describing data-structures which contain both pointers and scalar
3878
          data.</para>
3879
 
3880
        <para>Let us assume that the machine has bit-addressing to avoid the
3881
          bit complications already covered in the first model. Also assume
3882
          that instruction blocks are just scalar blocks and that block
3883
          addresses are aligned on 32-bit boundaries.</para>
3884
 
3885
        <sect3 id="S95">
3886
          <title>Alignment model</title>
3887
 
3888
          <para>An ALIGNMENT is represented by a pair consisting of an
3889
            integer, giving the natural alignments for scalar data, and
3890
            boolean to indicate the presence of a block-address. Denote this
3891
            by:
3892
 
3893
            <programlisting>
3894
(s:alignment_of_scalars, b:has_blocks)
3895
            </programlisting>
3896
 
3897
            We then have:
3898
 
3899
            <programlisting>
3900
REPR[alignment({ })] = (s:1, b:FALSE)
3901
REPR[alignment({char_variety}) = (s:8, b:FALSE)
3902
... etc. for other numerical and bitfield varieties.
3903
REPR[alignment({pointer})] = (s:32, b:TRUE)
3904
REPR[alignment({proc})] = (s:32, b:TRUE)
3905
REPR[alignment({local_label_value})] = (s:32, b:TRUE)
3906
            </programlisting>
3907
 
3908
            The representation of a compound ALIGNMENT is given by:
3909
 
3910
            <programlisting>
3911
REPR[A xc8 B] = (s:Max(REPR[A].s, REPR[B].s), b:REPR[A].b xda REPR[B].b)
3912
            </programlisting>
3913
 
3914
            and their inclusion relationship is given by:
3915
 
3916
            <programlisting>
3917
REPR[A ... B] = (REPR[A].s xb3 REPR[B].s) xd9 (REPR[A].b xda REPR[B].b)
3918
            </programlisting>
3919
          </para>
3920
        </sect3>
3921
 
3922
        <sect3 id="S96">
3923
          <title>Offset and pointer model</title>
3924
 
3925
          <para>A value with SHAPE POINTER A where REPR[A].b is represented
3926
            by a pair consisting of a block-address of a scalar block and an
3927
            integer bit-displacement within that block.  Denote this by:
3928
 
3929
            <programlisting>
3930
(sb:scalar_block_address, sd:bit_displacement)
3931
            </programlisting>
3932
 
3933
            A value with SHAPE POINTER A where REPR[A].b is represented by a
3934
            quad word consisting of two block-addresses and two
3935
            bit-displacements within these blocks. One of these block
3936
            addresses will contain the scalar information pointed at by one of
3937
            the bit-displacements; similarly, the other pair will point at the
3938
            block addresses in the data are held. Denote this by:
3939
 
3940
            <programlisting>
3941
(sb:scalar_block_address, ab:address_block_address,
3942
 sd:scalar_displacement, ad:address_displacement)
3943
            </programlisting>
3944
          </para>
3945
 
3946
          <para>A value with SHAPE OFFSET(A, B) where REPR[A].b is represented
3947
            by an integer bit-displacement.</para>
3948
 
3949
          <para>A value with SHAPE OFFSET(A, B) where REPR[A].b is represented
3950
            by a pair of bit-displacements, one relative to a scalar-block and
3951
            the other to an address-block. Denote this by:
3952
 
3953
            <programlisting>
3954
(sd:scalar_displacement, ad:address_displacement)
3955
            </programlisting>
3956
          </para>
3957
        </sect3>
3958
 
3959
        <sect3 id="S97">
3960
          <title>Size model</title>
3961
 
3962
          <para>The sizes given by shape_offset are now:
3963
 
3964
            <programlisting>
3965
REPR[shape_offset(integer(char_variety))] = 8
3966
... etc. for other numerical and bitfield varieties.
3967
REPR[shape_offset(pointer(A))] = (REPR[A].b) ? (sd:64, ad:64) : (sd:32, ad:32)
3968
REPR[shape_offset(offset(A, B))] = (REPR[A].b) ? 64 : 32)
3969
REPR[shape_offset(proc)] = (sd:32, ad:32)
3970
REPR[shape_offset(compound(E))] = REPR[E]
3971
REPR[shape_offset(nof(N, S))] = N * REPR[offset_pad(alignment(S)), shape_offset(S))]
3972
REPR[shape_offset(top)] = 0
3973
            </programlisting>
3974
          </para>
3975
        </sect3>
3976
 
3977
        <sect3 id="S98">
3978
          <title>Offset arithmetic</title>
3979
 
3980
          <para>The other OFFSET constructors are given by:</para>
3981
 
3982
          <programlisting>
3983
REPR[offset_zero(A)] = 0              if REPR[A].b
3984
REPR[offset_zero(A)] = (sd: 0, ad: 0) if REPR[A].b
3985
 
3986
REPR[offset_add(X:OFFSET(A, B), Y:OFFSET(C, D))] = REPR[X] + REPR[Y]
3987
        if REPR[A].b xd9 REPR[C].b
3988
REPR[offset_add(X:OFFSET(A, B), Y:OFFSET(C, D))] =
3989
    (sd:REPR[X].sd + REPR[Y].sd, ad: REPR[X].ad + REPR[Y].ad)
3990
        if REPR[A].b xd9 REPR[C].b
3991
REPR[offset_add(X:OFFSET(A, B), Y:OFFSET(C, D))] =
3992
    (sd:REPR[X].sd + REPR[Y], ad:REPR[X].ad)
3993
        if REPR[A].b xd9 REPR[C].b
3994
 
3995
REPR[offset_pad(A, Y:OFFSET(C, D))] = (REPR[Y] + REPR[A].s - 1) / REPR[A].s
3996
        if REPR[A].b xd9 REPR[C].b
3997
REPR[offset_pad(A, Y:OFFSET(C, D))] =
3998
    (sd:(REPR[Y] + REPR[A].s - 1) / REPR[A].s, ad:REPR[Y].ad)
3999
        if REPR[C].b
4000
REPR[offset_pad(A, Y: OFFSET(C, D))] =
4001
    (sd:(REPR[Y] + REPR[A].s - 1) / REPR[A].s, ad:0)
4002
        if REPR[A].b xd9 REPR[C].b
4003
REPR[offset_max(X:OFFSET(A, B), Y:OFFSET(C, D))] = Max(REPR[X], REPR[Y])
4004
        if REPR[A].b xd9 REPR[C].b
4005
REPR[offset_max(X:OFFSET(A, B), Y:OFFSET(C, D))] =
4006
    (sd:Max(REPR[X].sd, REPR[Y].sd), ad:Max(REPR[X].a, REPR[Y].ad))
4007
        if REPR[A].b xd9 REPR[C].b
4008
REPR[offset_max(X:OFFSET(A, B), Y:OFFSET(C, D))] =
4009
    (sd:Max(REPR[X].sd, REPR[Y]), ad:REPR[X].ad)
4010
        if REPR[A].b xd9 REPR[C].b
4011
REPR[offset_max(X:OFFSET(A, B), Y:OFFSET(C, D))] =
4012
    (sd:Max(REPR[Y].sd, REPR[X]), ad: REPR[Y].ad)
4013
        if REPR[C].b xd9 REPR[A].b
4014
 
4015
REPR[offset_subtract(X:OFFSET(A, B), Y:OFFSET(C, D))] = REPR[X] - REPR[Y]
4016
        if REPR[A].b xd9 REPR[C].b
4017
REPR[offset_subtract(X:OFFSET(A,B), Y:OFFSET(C, D))] =
4018
    (sd:REPR[X].sd - REPR[Y].sd, ad:REPR[X].ad - REPR[Y].ad)
4019
        if REPR[A].b xd9 REPR[C].b
4020
REPR[offset_add(X:OFFSET(A, B), Y:OFFSET(C, D))] = REPR[X].sd - REPR[Y]
4021
        if REPR[A].b xd9 REPR[C].b
4022
.... and so on.
4023
          </programlisting>
4024
 
4025
          <para>Unlike the previous one, this model of ALIGNMENTs would reject
4026
            OFFSETs such as OFFSET({long_variety}, {pointer}) but not
4027
            OFFSET({pointer}, {long_variety}) since:</para>
4028
 
4029
          <programlisting>
4030
REPR[{long_variety} ... {pointer}] = FALSE
4031
          </programlisting>
4032
 
4033
          <para>but:</para>
4034
 
4035
          <programlisting>
4036
REPR[{pointer} ... {long_variety}] = TRUE
4037
          </programlisting>
4038
 
4039
          <para>This just reflects the fact that there is no way that one can
4040
            extract a block-address necessary for a pointer from a
4041
            scalar-block, but since the representation of a pointer includes a
4042
            scalar displacement, one can always retrieve a scalar from a
4043
            pointer to a pointer.</para>
4044
        </sect3>
4045
      </sect2>
4046
    </sect1>
4047
 
4048
    <sect1>
4049
      <title>Conclusion</title>
4050
 
4051
      <para>This commentary is not complete. I have tended to go into
4052
        considerable detail into aspects which I consider might be unfamiliar
4053
        and skip over others which occur in most compiling systems. I also
4054
        have a tendency to say things more than once, albeit in different
4055
        words; however if something is worth saying, it is worth saying
4056
        twice.</para>
4057
 
4058
      <para>I shall continue tracking further revisions of the TDF
4059
        specification in later releases or appendices to this document.</para>
4060
    </sect1>
4061
  </chapter>
4062
</book>