Subversion Repositories tendra.SVN

Rev

Rev 2 | Details | Compare with Previous | Last modification | View Log | RSS feed

Rev Author Line No. Line
2 7u83 1
<!-- Crown Copyright (c) 1998 -->
2
<HTML>
3
<HEAD>
4
<TITLE>The bit encoding of TDF</TITLE>
5
</HEAD>
6
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
7
<H1><A NAME=S417>TDF Specification, Issue 4.0</A></H1>
8
<H3>January 1998</H3>
9
<A HREF="spec12.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
10
<A HREF="spec10.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
11
<A HREF="spec1.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
12
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
13
</A>
14
<A HREF="spec12.html"><IMG SRC="../images/index.gif" ALT="document index"></A>
15
<P>
16
<HR>
17
<DL>
18
<DT><A HREF="#S418"><B>8.1</B> - The Basic Encoding</A><DD>
19
<DT><A HREF="#S419"><B>8.2</B> - Fundamental encodings</A><DD>
20
<DL>
21
<DT><A HREF="#S420"><B>8.2.1</B> - TDFINT</A><DD>
22
<DT><A HREF="#S421"><B>8.2.2</B> - TDFBOOL</A><DD>
23
<DT><A HREF="#S422"><B>8.2.3</B> - TDFSTRING</A><DD>
24
<DT><A HREF="#S423"><B>8.2.4</B> - TDFIDENT</A><DD>
25
</DL>
26
<DT><A HREF="#S424"><B>8.3</B> -  BITSTREAM</A><DD>
27
<DL>
28
<DT><A HREF="#S425"><B>8.3.1</B> - BYTESTREAM</A><DD>
29
<DT><A HREF="#S426"><B>8.3.2</B> - BYTE_ALIGN</A><DD>
30
<DT><A HREF="#S427"><B>8.3.3</B> - Extendable integer encoding</A><DD>
31
</DL>
32
<DT><A HREF="#S428"><B>8.4</B> - The TDF encoding</A><DD>
33
<DT><A HREF="#S429"><B>8.5</B> - File Formats</A><DD>
34
</DL>
35
<HR>
36
<H1>8. <A NAME=0>The bit encoding of TDF</A></H1>
37
This is a description of the encoding used for TDF. 
38
<P>
39
<A HREF="#1">Section 8.1</A> defines the basic level of encoding,
40
in which integers consisting of a specified number of bits are appended
41
to the sequence of bytes. <A HREF="#4">Section 8.2</A> defines the
42
second level of encoding, in which fundamental kinds of value are
43
encoded in terms of integers of specified numbers of bits. <A HREF="#17">Section
44
8.4</A> defines the third level, in which TDF is encoded using the
45
previously defined concepts. 
46
<P>
47
<A NAME=S418>
48
 
49
<HR>
50
<H2>8.1. <A NAME=1>The Basic Encoding</A></H2>
51
<A NAME=M2><A NAME=M3>TDF consists of a sequence of 8-bit bytes used
52
to encode integers of a varying number of bits, from 1 to 32. These
53
integers will be called basic integers. 
54
<P>
55
TDF is encoded into bytes in increasing byte index, and within the
56
byte the most significant end is filled before the least significant.
57
Let the bits within a byte be numbered from 0 to 7, 0 denoting the
58
least significant bit and 7 the most significant. Suppose that the
59
bytes up to <I>n</I>-1 have been filled and that the next free bit
60
in byte <I>n</I> is bit <I>k</I>. Then bits <I>k</I>+1 to 7 are full
61
and bits 0 to <I>k</I> remain to be used. Now an integer of <I>d</I>
62
bits is to be appended. 
63
<P>
64
If <I>d</I> is less than or equal to <I>k</I>, the <I>d</I> bits will
65
occupy bits <I>k</I>-<I>d</I>+1 to <I>k</I> of byte <I>n</I>, and
66
the next free bit will be at bit <I>k-d</I>. Bit 0 of the integer
67
will be at bit <I>k</I>-<I>d</I>+1 of the byte, and bit <I>d</I>-1
68
of the integer will be at bit <I>k</I>. 
69
<P>
70
If <I>d</I> is equal to <I>k</I>+1, the <I>d</I> bits will occupy
71
bits 0 to <I>k</I> of byte <I>n</I> and the next free bit will be
72
bit 7 of byte <I>n</I>+1. Bit <I>d</I>-1 of the integer will be at
73
bit <I>k</I> of the byte. 
74
<P>
75
If <I>d</I> is greater than <I>k</I>+1, the most significant <I>k</I>+1
76
bits of the integer will be in byte <I>n</I>, with bit <I>d</I>-1
77
at bit <I>k</I> of the byte. The remaining <I>d</I>-<I>k</I>-1 least
78
significant bits are then encoded into the bytes, starting at byte
79
<I>n</I>+1, bit 7, using the same algorithm (i.e. recursively). 
80
<P>
81
<A NAME=S419>
82
 
83
<HR>
84
<H2>8.2. <A NAME=4>Fundamental encodings</A></H2>
85
This section describes the encoding of <CODE>TDFINT</CODE>, 
86
<CODE>TDFBOOL</CODE>, <CODE>TDFSTRING</CODE>, <CODE>TDFIDENT</CODE>,
87
<CODE>BITSTREAM</CODE>, <CODE>BYTESTREAM</CODE>, <CODE>BYTE_ALIGN</CODE>
88
and extendable integers. 
89
<P>
90
<A NAME=S420>
91
<H3>8.2.1. TDFINT</H3>
92
<A NAME=M5><CODE>TDFINT</CODE> encodes non-negative integers of unbounded
93
size. The encoding uses octal digits encoded in 4-bit basic integers.
94
The most significant octal digit is encoded first, the least significant
95
last. For all digits except the last the 4-bit integer is the value
96
of the octal digit. For the last digit the 4-bit integer is the value
97
of the octal digit plus 8. 
98
<P>
99
<A NAME=S421>
100
<H3>8.2.2. TDFBOOL</H3>
101
<A NAME=M6><CODE>TDFBOOL</CODE> encodes a boolean, true or false.
102
The encoding uses a 1-bit basic integer, with 1 encoding true and
103
 
104
<P>
105
<A NAME=S422>
106
<H3>8.2.3. TDFSTRING</H3>
107
<CODE>TDFSTRING</CODE> encodes a sequence containing <I>n</I> non-negative
108
integers, each of <I>k</I> bits. The encoding consists of, first a
109
<CODE>TDFINT</CODE> giving the number of bits, second a <CODE>TDFINT</CODE>
110
giving the number of integers, which may be zero. Thirdly it contains
111
<I>n</I> <I>k</I>-bit basic integers, giving the sequence of integers
112
required, the first integer being first in this sequence. 
113
<P>
114
<A NAME=S423>
115
<H3>8.2.4. TDFIDENT</H3>
116
<A NAME=M7><CODE>TDFIDENT</CODE> also encodes a sequence containing
117
<I>n</I> non-negative integers. These integers will all consist of
118
the same number of bits, which will be a multiple of 8. It is a property
119
of the encoding of the other constructions that TDFIDENTS will start
120
on either bit 7 or bit 3 of a byte and end on bit 7 or bit 3 of a
121
byte. It thus has some alignment properties which are useful to permit
122
fast copying of sections of TDF. 
123
<P>
124
The encoding consists of, first a <CODE>TDFINT</CODE> giving the number
125
of bits, second a <CODE>TDFINT</CODE> giving the number of integers,
126
which may be zero. If the next free bit is not bit 7 of some byte,
127
it is moved on to bit 7 of the next byte. 
128
<P>
129
Thirdly it contains <I>n</I> <I>k</I>-bit integers. 
130
<P>
131
If the next free bit is not bit 7 of some byte, it is moved on to
132
bit 7 of the next byte. 
133
<P>
134
<A NAME=S424>
135
 
136
<HR>
137
<H2>8.3. <A NAME=8>BITSTREAM</A></H2>
138
<A NAME=M9>It can be useful to be able to skip a TDF construction
139
without reading through it. <CODE>BITSTREAM</CODE> provides a means
140
of doing this. 
141
<P>
142
A <CODE>BITSTREAM</CODE> encoding of <I>X</I> consists of a <CODE>TDFINT</CODE>
143
giving the number of bits of encoding which are occupied by the <I>X</I>.
144
Hence to skip over a <CODE>BITSTREAM</CODE> while decoding, one should
145
read the <CODE>TDFINT</CODE> and then advance the bit index by that
146
number of bits. To read the contents of a <CODE>BITSTREAM</CODE> encoding
147
of <I>X</I>, one should read and ignore a <CODE>TDFINT</CODE> and
148
then decode an <I>X</I>. There will be no spare bits at the end of
149
the <I>X</I>, so reading can continue directly. 
150
<P>
151
<A NAME=S425>
152
<H3>8.3.1. BYTESTREAM</H3>
153
<A NAME=M10>It can be useful to be able to skip a TDF construction
154
without reading through it. <CODE>BYTESTREAM</CODE> provides a means
155
of doing this while remaining byte aligned, so facilitating copying
156
the TDF. A <CODE>BYTESTREAM</CODE> will always start when the bit
157
position is 3 or 7. 
158
<P>
159
A <CODE>BYTESTREAM</CODE> encoding of <I>X</I> starts with a <CODE>TDFINT</CODE>
160
giving a number, <I>n</I>. After this, if the current bit position
161
is not bit 7 of some byte, it is moved to bit 7 of the next byte.
162
The next <I>n</I> bytes are an encoding of <I>X</I>. There may be
163
some spare bits left over at the end of <I>X</I>. 
164
<P>
165
Hence to skip over a <CODE>BYTESTREAM</CODE> while decoding one should
166
read a <CODE>TDFINT</CODE>, <I>n</I>, move to the next byte alignment
167
(if the bit position is not 7) and advance the bit index over <I>n</I>
168
bytes. To read a <CODE>BYTESTREAM</CODE> encoding of <I>X</I> one
169
should read a <CODE>TDFINT</CODE>, <I>n</I>, and move to the next
170
byte, <I>b</I> (if the bit position is not 7), and then decode an
171
<I>X</I>. Finally the bit position should be moved to <I>n</I> bytes
172
after <I>b</I>. 
173
<P>
174
<A NAME=S426>
175
<H3>8.3.2. BYTE_ALIGN</H3>
176
<A NAME=M11><A NAME=M12><CODE>BYTE_ALIGN</CODE> leaves the bit position
177
alone if it is 7, and otherwise moves to bit 7 of the next byte. 
178
<P>
179
<A NAME=S427>
180
<H3>8.3.3. <A NAME=13>Extendable integer encoding</A></H3>
181
<A NAME=M14><A NAME=M15><A NAME=M16>A <I>d</I>-bit extendable integer
182
encoding enables an integer greater than zero to be encoded given
183
<I>d</I>, a number of bits. 
184
<P>
185
If the integer is between 1 and 2<SUP><I>d</I></SUP> - 1 inclusive,
186
a 
187
<I>d</I>-bit basic integer is encoded. 
188
<P>
189
If the integer, <I>i</I>, is greater than or equal to 2<SUP><I>d</I></SUP>,
190
a <I>d</I>-bit basic integer encoding of zero is inserted and then
191
<I>i</I> - 2<SUP><I>d</I></SUP> + 1 is encoded as a <I>d</I>-bit extendable
192
encodin, and so on, recursively. 
193
<P>
194
<A NAME=S428>
195
 
196
<HR>
197
<H2>8.4. <A NAME=17>The TDF encoding</A></H2>
198
The descriptions of SORTS and constructors contain encoding information
199
which is interpreted as follows to define the TDF encoding. 
200
<P>
201
<UL>
202
<LI>A TDF CAPSULE is an encoding of the <CODE>SORT CAPSULE</CODE>.<P>
203
<LI>For each <CODE>SORT</CODE> a number of encoding bits, <I>b,</I>
204
is specified. If this is zero, there will only be one construction
205
for the class, and its encoding will consist of the encodings of its
206
components, in the given order.<P>
207
<LI><A NAME=M18>If the number of encoding bits, <I>b</I>, is not zero
208
the <CODE>SORT</CODE> is described as extendable or as not extendable.
209
For each construction there is an encoding number given. If the <CODE>SORT
210
</CODE> is extendable, this number is output as an extendable integer.
211
If the <CODE>SORT</CODE> is described as not extendable, the number
212
is output as a basic integer. This is followed by the encodings of
213
the components of the construction in the order given in the description
214
of the construct.<P>
215
<LI><A NAME=M21>For the classes which are named <CODE>SLIST</CODE>(<I>x</I>)
216
- e.g. <CODE>SLIST</CODE>(<CODE>UNIT</CODE>) - the encoding consists
217
of a <CODE>TDFINT</CODE>, <I>n</I>, followed by <I>n</I> encodings
218
of <I>x</I>.<P>
219
<LI><A NAME=M22>For the classes which are named <CODE>LIST</CODE>(<I>x</I>)
220
- e.g. <CODE>LIST</CODE>(<CODE>EXP</CODE>) - the encoding consists
221
of a 1-bit integer which will be 0, followed by an <CODE>SLIST</CODE>(<I>x</I>).
222
The 1-bit integer is to allow for extensions to other representations
223
of 
224
<CODE>LIST</CODE>s.<P>
225
<LI><A NAME=M23><A NAME=M24>For the classes which are named <CODE>OPTION</CODE>(
226
<I>x</I>) the encoding consists of a 1-bit basic integer. If this
227
is zero, the option is absent and there is no more encoding. If the
228
integer is 1, the option is present and an encoding of <I>x</I> follows.<P>
229
<LI><CODE>BITSTREAMS</CODE> occur in only two kinds of place. One
230
is the constructions with the form <I>x_cond</I>, which are the install-time
231
conditionals. For each of these the class encoded in the <CODE>BITSTREAM</CODE>
232
is the same as the class which is the result of the <I>x_cond</I>
233
construction. The other kind of place is as the <I>token_args</I>
234
component of a construction with the form <I>x_apply_token</I>. This
235
component always gives the parameters of the <CODE>TOKEN</CODE>. It
236
can only be decoded if there is a token definition or a token declaration
237
for the particular token being applied, i.e. for the <I>token_value</I>
238
component of the construction. In this case the <CODE>SORTS</CODE>
239
and hence the classes of the actual token arguments are given by the
240
declaration or definition, and encodings of these classes are placed
241
in sequence after the number of bits. If the declaration or definition
242
are not available, the <CODE>BITSTREAM</CODE> can only be skipped.<P>
243
<LI><CODE>BYTESTREAM</CODE> <I>X</I> occurs in only one place, the
244
encoding of the <CODE>SORT UNIT</CODE>. The <CODE>SORT</CODE>
245
<I>X</I> is determined by the <CODE>UNIT</CODE> identification which
246
is given for each of the relevant <CODE>SORTS</CODE>.<P>
247
<LI><A NAME=M25>The <I>tld</I> <CODE>UNIT</CODE> is encoded specially.
248
It is always the first <CODE>UNIT</CODE> in a Capsule and is used
249
to pass information to the TDF linker. The first entry in a <I>tld</I>
250
<CODE>UNIT</CODE> is a <CODE>TDFINT</CODE> giving the format of the
251
remainder of the <CODE>UNIT</CODE>. Currently, the linker supports
252
formats 0 and 1, but others may be added to give greater functionality
253
while retaining compatibility. With format 0, the remainder of <CODE>UNIT</CODE>
254
is identical to a now obsolete 
255
<I>tld2</I> <CODE>UNIT</CODE>. With format 1, the remainder of the
256
<CODE>UNIT</CODE> is as follows: If <I>n</I> is the number of <CODE>EXTERN_LINK
257
</CODE>s in the <I>external_linkage</I>
258
argument of <I>make_capsule</I>, the remainder consists of <I>n</I>
259
sequences. These sequences are in the order given by <I>external_linkage</I>.
260
Each element of a sequence consist of one <CODE>TDFINT</CODE> per
261
linkable entity in the corresponding <I>el</I> of the <I>make_extern_link</I>
262
in the same order. These integers will describe properties of the
263
correponding external links. (In format 0, there are only two sequences,
264
the first describing the token external links and the second describing
265
the tag external links.) 
266
<P>
267
<UL>
268
<LI>Bit 0: 1 means &quot;used in this capsule&quot;, 0 means &quot;not
269
used in this capsule&quot;. 
270
<LI>Bit 1: 1 means &quot;declared in this capsule&quot;, 0 means &quot;not
271
declared in this capsule&quot;. 
272
<LI>Bit 2: 0 means &quot;not defined in this capsule, 1 means &quot;defined
273
in this capsule&quot;. 
274
<LI>Bit 3: 0 means &quot;is uniquely defined&quot;, 1 means &quot;
275
may be multiply defined&quot;.  
276
</UL>
277
</UL>
278
<P>
279
<A NAME=S429>
280
 
281
<HR>
282
<H2>8.5. File Formats</H2>
283
There may be various kinds of files which contain TDF bitstream information.
284
Each will start with a 4-byte &quot;magic-number&quot; identifying
285
the kind of file followed by two <CODE>TDFINT</CODE>s giving the major
286
and minor version numbers of the TDF involved. 
287
<P>
288
A CAPSULE file will have a magic-number <I>&quot;TDFC&quot;</I>. The
289
encoding of the CAPSULE will be byte-aligned following the version
290
numbers. 
291
<P>
292
A TDF library file will have a magic-number <I>&quot;TDFL&quot;</I>.
293
These files are constructed by the TDF linker. 
294
<P>
295
A TDF archive file will have a magic-number <I>&quot;TDFA&quot;</I>.
296
<P>
297
Other file formats introduced should follow a similar pattern. 
298
<P>
299
<I>The TDF linker will refuse to link TDF files with different major
300
version numbers. The resulting minor version number is the maximum
301
of component minor version numbers.</I>
302
<P>
303
<P>
304
<HR>
305
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
306
Copyright &copy; 1998.</I></P>
307
</BODY>
308
</HTML>