2 |
7u83 |
1 |
<!-- Crown Copyright (c) 1998 -->
|
|
|
2 |
<HTML>
|
|
|
3 |
<HEAD>
|
|
|
4 |
<TITLE>The bit encoding of TDF</TITLE>
|
|
|
5 |
</HEAD>
|
|
|
6 |
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
|
|
|
7 |
<H1><A NAME=S417>TDF Specification, Issue 4.0</A></H1>
|
|
|
8 |
<H3>January 1998</H3>
|
|
|
9 |
<A HREF="spec12.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
|
|
|
10 |
<A HREF="spec10.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
|
|
|
11 |
<A HREF="spec1.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
|
|
|
12 |
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
|
|
|
13 |
</A>
|
|
|
14 |
<A HREF="spec12.html"><IMG SRC="../images/index.gif" ALT="document index"></A>
|
|
|
15 |
<P>
|
|
|
16 |
<HR>
|
|
|
17 |
<DL>
|
|
|
18 |
<DT><A HREF="#S418"><B>8.1</B> - The Basic Encoding</A><DD>
|
|
|
19 |
<DT><A HREF="#S419"><B>8.2</B> - Fundamental encodings</A><DD>
|
|
|
20 |
<DL>
|
|
|
21 |
<DT><A HREF="#S420"><B>8.2.1</B> - TDFINT</A><DD>
|
|
|
22 |
<DT><A HREF="#S421"><B>8.2.2</B> - TDFBOOL</A><DD>
|
|
|
23 |
<DT><A HREF="#S422"><B>8.2.3</B> - TDFSTRING</A><DD>
|
|
|
24 |
<DT><A HREF="#S423"><B>8.2.4</B> - TDFIDENT</A><DD>
|
|
|
25 |
</DL>
|
|
|
26 |
<DT><A HREF="#S424"><B>8.3</B> - BITSTREAM</A><DD>
|
|
|
27 |
<DL>
|
|
|
28 |
<DT><A HREF="#S425"><B>8.3.1</B> - BYTESTREAM</A><DD>
|
|
|
29 |
<DT><A HREF="#S426"><B>8.3.2</B> - BYTE_ALIGN</A><DD>
|
|
|
30 |
<DT><A HREF="#S427"><B>8.3.3</B> - Extendable integer encoding</A><DD>
|
|
|
31 |
</DL>
|
|
|
32 |
<DT><A HREF="#S428"><B>8.4</B> - The TDF encoding</A><DD>
|
|
|
33 |
<DT><A HREF="#S429"><B>8.5</B> - File Formats</A><DD>
|
|
|
34 |
</DL>
|
|
|
35 |
<HR>
|
|
|
36 |
<H1>8. <A NAME=0>The bit encoding of TDF</A></H1>
|
|
|
37 |
This is a description of the encoding used for TDF.
|
|
|
38 |
<P>
|
|
|
39 |
<A HREF="#1">Section 8.1</A> defines the basic level of encoding,
|
|
|
40 |
in which integers consisting of a specified number of bits are appended
|
|
|
41 |
to the sequence of bytes. <A HREF="#4">Section 8.2</A> defines the
|
|
|
42 |
second level of encoding, in which fundamental kinds of value are
|
|
|
43 |
encoded in terms of integers of specified numbers of bits. <A HREF="#17">Section
|
|
|
44 |
8.4</A> defines the third level, in which TDF is encoded using the
|
|
|
45 |
previously defined concepts.
|
|
|
46 |
<P>
|
|
|
47 |
<A NAME=S418>
|
|
|
48 |
|
|
|
49 |
<HR>
|
|
|
50 |
<H2>8.1. <A NAME=1>The Basic Encoding</A></H2>
|
|
|
51 |
<A NAME=M2><A NAME=M3>TDF consists of a sequence of 8-bit bytes used
|
|
|
52 |
to encode integers of a varying number of bits, from 1 to 32. These
|
|
|
53 |
integers will be called basic integers.
|
|
|
54 |
<P>
|
|
|
55 |
TDF is encoded into bytes in increasing byte index, and within the
|
|
|
56 |
byte the most significant end is filled before the least significant.
|
|
|
57 |
Let the bits within a byte be numbered from 0 to 7, 0 denoting the
|
|
|
58 |
least significant bit and 7 the most significant. Suppose that the
|
|
|
59 |
bytes up to <I>n</I>-1 have been filled and that the next free bit
|
|
|
60 |
in byte <I>n</I> is bit <I>k</I>. Then bits <I>k</I>+1 to 7 are full
|
|
|
61 |
and bits 0 to <I>k</I> remain to be used. Now an integer of <I>d</I>
|
|
|
62 |
bits is to be appended.
|
|
|
63 |
<P>
|
|
|
64 |
If <I>d</I> is less than or equal to <I>k</I>, the <I>d</I> bits will
|
|
|
65 |
occupy bits <I>k</I>-<I>d</I>+1 to <I>k</I> of byte <I>n</I>, and
|
|
|
66 |
the next free bit will be at bit <I>k-d</I>. Bit 0 of the integer
|
|
|
67 |
will be at bit <I>k</I>-<I>d</I>+1 of the byte, and bit <I>d</I>-1
|
|
|
68 |
of the integer will be at bit <I>k</I>.
|
|
|
69 |
<P>
|
|
|
70 |
If <I>d</I> is equal to <I>k</I>+1, the <I>d</I> bits will occupy
|
|
|
71 |
bits 0 to <I>k</I> of byte <I>n</I> and the next free bit will be
|
|
|
72 |
bit 7 of byte <I>n</I>+1. Bit <I>d</I>-1 of the integer will be at
|
|
|
73 |
bit <I>k</I> of the byte.
|
|
|
74 |
<P>
|
|
|
75 |
If <I>d</I> is greater than <I>k</I>+1, the most significant <I>k</I>+1
|
|
|
76 |
bits of the integer will be in byte <I>n</I>, with bit <I>d</I>-1
|
|
|
77 |
at bit <I>k</I> of the byte. The remaining <I>d</I>-<I>k</I>-1 least
|
|
|
78 |
significant bits are then encoded into the bytes, starting at byte
|
|
|
79 |
<I>n</I>+1, bit 7, using the same algorithm (i.e. recursively).
|
|
|
80 |
<P>
|
|
|
81 |
<A NAME=S419>
|
|
|
82 |
|
|
|
83 |
<HR>
|
|
|
84 |
<H2>8.2. <A NAME=4>Fundamental encodings</A></H2>
|
|
|
85 |
This section describes the encoding of <CODE>TDFINT</CODE>,
|
|
|
86 |
<CODE>TDFBOOL</CODE>, <CODE>TDFSTRING</CODE>, <CODE>TDFIDENT</CODE>,
|
|
|
87 |
<CODE>BITSTREAM</CODE>, <CODE>BYTESTREAM</CODE>, <CODE>BYTE_ALIGN</CODE>
|
|
|
88 |
and extendable integers.
|
|
|
89 |
<P>
|
|
|
90 |
<A NAME=S420>
|
|
|
91 |
<H3>8.2.1. TDFINT</H3>
|
|
|
92 |
<A NAME=M5><CODE>TDFINT</CODE> encodes non-negative integers of unbounded
|
|
|
93 |
size. The encoding uses octal digits encoded in 4-bit basic integers.
|
|
|
94 |
The most significant octal digit is encoded first, the least significant
|
|
|
95 |
last. For all digits except the last the 4-bit integer is the value
|
|
|
96 |
of the octal digit. For the last digit the 4-bit integer is the value
|
|
|
97 |
of the octal digit plus 8.
|
|
|
98 |
<P>
|
|
|
99 |
<A NAME=S421>
|
|
|
100 |
<H3>8.2.2. TDFBOOL</H3>
|
|
|
101 |
<A NAME=M6><CODE>TDFBOOL</CODE> encodes a boolean, true or false.
|
|
|
102 |
The encoding uses a 1-bit basic integer, with 1 encoding true and
|
|
|
103 |
|
|
|
104 |
<P>
|
|
|
105 |
<A NAME=S422>
|
|
|
106 |
<H3>8.2.3. TDFSTRING</H3>
|
|
|
107 |
<CODE>TDFSTRING</CODE> encodes a sequence containing <I>n</I> non-negative
|
|
|
108 |
integers, each of <I>k</I> bits. The encoding consists of, first a
|
|
|
109 |
<CODE>TDFINT</CODE> giving the number of bits, second a <CODE>TDFINT</CODE>
|
|
|
110 |
giving the number of integers, which may be zero. Thirdly it contains
|
|
|
111 |
<I>n</I> <I>k</I>-bit basic integers, giving the sequence of integers
|
|
|
112 |
required, the first integer being first in this sequence.
|
|
|
113 |
<P>
|
|
|
114 |
<A NAME=S423>
|
|
|
115 |
<H3>8.2.4. TDFIDENT</H3>
|
|
|
116 |
<A NAME=M7><CODE>TDFIDENT</CODE> also encodes a sequence containing
|
|
|
117 |
<I>n</I> non-negative integers. These integers will all consist of
|
|
|
118 |
the same number of bits, which will be a multiple of 8. It is a property
|
|
|
119 |
of the encoding of the other constructions that TDFIDENTS will start
|
|
|
120 |
on either bit 7 or bit 3 of a byte and end on bit 7 or bit 3 of a
|
|
|
121 |
byte. It thus has some alignment properties which are useful to permit
|
|
|
122 |
fast copying of sections of TDF.
|
|
|
123 |
<P>
|
|
|
124 |
The encoding consists of, first a <CODE>TDFINT</CODE> giving the number
|
|
|
125 |
of bits, second a <CODE>TDFINT</CODE> giving the number of integers,
|
|
|
126 |
which may be zero. If the next free bit is not bit 7 of some byte,
|
|
|
127 |
it is moved on to bit 7 of the next byte.
|
|
|
128 |
<P>
|
|
|
129 |
Thirdly it contains <I>n</I> <I>k</I>-bit integers.
|
|
|
130 |
<P>
|
|
|
131 |
If the next free bit is not bit 7 of some byte, it is moved on to
|
|
|
132 |
bit 7 of the next byte.
|
|
|
133 |
<P>
|
|
|
134 |
<A NAME=S424>
|
|
|
135 |
|
|
|
136 |
<HR>
|
|
|
137 |
<H2>8.3. <A NAME=8>BITSTREAM</A></H2>
|
|
|
138 |
<A NAME=M9>It can be useful to be able to skip a TDF construction
|
|
|
139 |
without reading through it. <CODE>BITSTREAM</CODE> provides a means
|
|
|
140 |
of doing this.
|
|
|
141 |
<P>
|
|
|
142 |
A <CODE>BITSTREAM</CODE> encoding of <I>X</I> consists of a <CODE>TDFINT</CODE>
|
|
|
143 |
giving the number of bits of encoding which are occupied by the <I>X</I>.
|
|
|
144 |
Hence to skip over a <CODE>BITSTREAM</CODE> while decoding, one should
|
|
|
145 |
read the <CODE>TDFINT</CODE> and then advance the bit index by that
|
|
|
146 |
number of bits. To read the contents of a <CODE>BITSTREAM</CODE> encoding
|
|
|
147 |
of <I>X</I>, one should read and ignore a <CODE>TDFINT</CODE> and
|
|
|
148 |
then decode an <I>X</I>. There will be no spare bits at the end of
|
|
|
149 |
the <I>X</I>, so reading can continue directly.
|
|
|
150 |
<P>
|
|
|
151 |
<A NAME=S425>
|
|
|
152 |
<H3>8.3.1. BYTESTREAM</H3>
|
|
|
153 |
<A NAME=M10>It can be useful to be able to skip a TDF construction
|
|
|
154 |
without reading through it. <CODE>BYTESTREAM</CODE> provides a means
|
|
|
155 |
of doing this while remaining byte aligned, so facilitating copying
|
|
|
156 |
the TDF. A <CODE>BYTESTREAM</CODE> will always start when the bit
|
|
|
157 |
position is 3 or 7.
|
|
|
158 |
<P>
|
|
|
159 |
A <CODE>BYTESTREAM</CODE> encoding of <I>X</I> starts with a <CODE>TDFINT</CODE>
|
|
|
160 |
giving a number, <I>n</I>. After this, if the current bit position
|
|
|
161 |
is not bit 7 of some byte, it is moved to bit 7 of the next byte.
|
|
|
162 |
The next <I>n</I> bytes are an encoding of <I>X</I>. There may be
|
|
|
163 |
some spare bits left over at the end of <I>X</I>.
|
|
|
164 |
<P>
|
|
|
165 |
Hence to skip over a <CODE>BYTESTREAM</CODE> while decoding one should
|
|
|
166 |
read a <CODE>TDFINT</CODE>, <I>n</I>, move to the next byte alignment
|
|
|
167 |
(if the bit position is not 7) and advance the bit index over <I>n</I>
|
|
|
168 |
bytes. To read a <CODE>BYTESTREAM</CODE> encoding of <I>X</I> one
|
|
|
169 |
should read a <CODE>TDFINT</CODE>, <I>n</I>, and move to the next
|
|
|
170 |
byte, <I>b</I> (if the bit position is not 7), and then decode an
|
|
|
171 |
<I>X</I>. Finally the bit position should be moved to <I>n</I> bytes
|
|
|
172 |
after <I>b</I>.
|
|
|
173 |
<P>
|
|
|
174 |
<A NAME=S426>
|
|
|
175 |
<H3>8.3.2. BYTE_ALIGN</H3>
|
|
|
176 |
<A NAME=M11><A NAME=M12><CODE>BYTE_ALIGN</CODE> leaves the bit position
|
|
|
177 |
alone if it is 7, and otherwise moves to bit 7 of the next byte.
|
|
|
178 |
<P>
|
|
|
179 |
<A NAME=S427>
|
|
|
180 |
<H3>8.3.3. <A NAME=13>Extendable integer encoding</A></H3>
|
|
|
181 |
<A NAME=M14><A NAME=M15><A NAME=M16>A <I>d</I>-bit extendable integer
|
|
|
182 |
encoding enables an integer greater than zero to be encoded given
|
|
|
183 |
<I>d</I>, a number of bits.
|
|
|
184 |
<P>
|
|
|
185 |
If the integer is between 1 and 2<SUP><I>d</I></SUP> - 1 inclusive,
|
|
|
186 |
a
|
|
|
187 |
<I>d</I>-bit basic integer is encoded.
|
|
|
188 |
<P>
|
|
|
189 |
If the integer, <I>i</I>, is greater than or equal to 2<SUP><I>d</I></SUP>,
|
|
|
190 |
a <I>d</I>-bit basic integer encoding of zero is inserted and then
|
|
|
191 |
<I>i</I> - 2<SUP><I>d</I></SUP> + 1 is encoded as a <I>d</I>-bit extendable
|
|
|
192 |
encodin, and so on, recursively.
|
|
|
193 |
<P>
|
|
|
194 |
<A NAME=S428>
|
|
|
195 |
|
|
|
196 |
<HR>
|
|
|
197 |
<H2>8.4. <A NAME=17>The TDF encoding</A></H2>
|
|
|
198 |
The descriptions of SORTS and constructors contain encoding information
|
|
|
199 |
which is interpreted as follows to define the TDF encoding.
|
|
|
200 |
<P>
|
|
|
201 |
<UL>
|
|
|
202 |
<LI>A TDF CAPSULE is an encoding of the <CODE>SORT CAPSULE</CODE>.<P>
|
|
|
203 |
<LI>For each <CODE>SORT</CODE> a number of encoding bits, <I>b,</I>
|
|
|
204 |
is specified. If this is zero, there will only be one construction
|
|
|
205 |
for the class, and its encoding will consist of the encodings of its
|
|
|
206 |
components, in the given order.<P>
|
|
|
207 |
<LI><A NAME=M18>If the number of encoding bits, <I>b</I>, is not zero
|
|
|
208 |
the <CODE>SORT</CODE> is described as extendable or as not extendable.
|
|
|
209 |
For each construction there is an encoding number given. If the <CODE>SORT
|
|
|
210 |
</CODE> is extendable, this number is output as an extendable integer.
|
|
|
211 |
If the <CODE>SORT</CODE> is described as not extendable, the number
|
|
|
212 |
is output as a basic integer. This is followed by the encodings of
|
|
|
213 |
the components of the construction in the order given in the description
|
|
|
214 |
of the construct.<P>
|
|
|
215 |
<LI><A NAME=M21>For the classes which are named <CODE>SLIST</CODE>(<I>x</I>)
|
|
|
216 |
- e.g. <CODE>SLIST</CODE>(<CODE>UNIT</CODE>) - the encoding consists
|
|
|
217 |
of a <CODE>TDFINT</CODE>, <I>n</I>, followed by <I>n</I> encodings
|
|
|
218 |
of <I>x</I>.<P>
|
|
|
219 |
<LI><A NAME=M22>For the classes which are named <CODE>LIST</CODE>(<I>x</I>)
|
|
|
220 |
- e.g. <CODE>LIST</CODE>(<CODE>EXP</CODE>) - the encoding consists
|
|
|
221 |
of a 1-bit integer which will be 0, followed by an <CODE>SLIST</CODE>(<I>x</I>).
|
|
|
222 |
The 1-bit integer is to allow for extensions to other representations
|
|
|
223 |
of
|
|
|
224 |
<CODE>LIST</CODE>s.<P>
|
|
|
225 |
<LI><A NAME=M23><A NAME=M24>For the classes which are named <CODE>OPTION</CODE>(
|
|
|
226 |
<I>x</I>) the encoding consists of a 1-bit basic integer. If this
|
|
|
227 |
is zero, the option is absent and there is no more encoding. If the
|
|
|
228 |
integer is 1, the option is present and an encoding of <I>x</I> follows.<P>
|
|
|
229 |
<LI><CODE>BITSTREAMS</CODE> occur in only two kinds of place. One
|
|
|
230 |
is the constructions with the form <I>x_cond</I>, which are the install-time
|
|
|
231 |
conditionals. For each of these the class encoded in the <CODE>BITSTREAM</CODE>
|
|
|
232 |
is the same as the class which is the result of the <I>x_cond</I>
|
|
|
233 |
construction. The other kind of place is as the <I>token_args</I>
|
|
|
234 |
component of a construction with the form <I>x_apply_token</I>. This
|
|
|
235 |
component always gives the parameters of the <CODE>TOKEN</CODE>. It
|
|
|
236 |
can only be decoded if there is a token definition or a token declaration
|
|
|
237 |
for the particular token being applied, i.e. for the <I>token_value</I>
|
|
|
238 |
component of the construction. In this case the <CODE>SORTS</CODE>
|
|
|
239 |
and hence the classes of the actual token arguments are given by the
|
|
|
240 |
declaration or definition, and encodings of these classes are placed
|
|
|
241 |
in sequence after the number of bits. If the declaration or definition
|
|
|
242 |
are not available, the <CODE>BITSTREAM</CODE> can only be skipped.<P>
|
|
|
243 |
<LI><CODE>BYTESTREAM</CODE> <I>X</I> occurs in only one place, the
|
|
|
244 |
encoding of the <CODE>SORT UNIT</CODE>. The <CODE>SORT</CODE>
|
|
|
245 |
<I>X</I> is determined by the <CODE>UNIT</CODE> identification which
|
|
|
246 |
is given for each of the relevant <CODE>SORTS</CODE>.<P>
|
|
|
247 |
<LI><A NAME=M25>The <I>tld</I> <CODE>UNIT</CODE> is encoded specially.
|
|
|
248 |
It is always the first <CODE>UNIT</CODE> in a Capsule and is used
|
|
|
249 |
to pass information to the TDF linker. The first entry in a <I>tld</I>
|
|
|
250 |
<CODE>UNIT</CODE> is a <CODE>TDFINT</CODE> giving the format of the
|
|
|
251 |
remainder of the <CODE>UNIT</CODE>. Currently, the linker supports
|
|
|
252 |
formats 0 and 1, but others may be added to give greater functionality
|
|
|
253 |
while retaining compatibility. With format 0, the remainder of <CODE>UNIT</CODE>
|
|
|
254 |
is identical to a now obsolete
|
|
|
255 |
<I>tld2</I> <CODE>UNIT</CODE>. With format 1, the remainder of the
|
|
|
256 |
<CODE>UNIT</CODE> is as follows: If <I>n</I> is the number of <CODE>EXTERN_LINK
|
|
|
257 |
</CODE>s in the <I>external_linkage</I>
|
|
|
258 |
argument of <I>make_capsule</I>, the remainder consists of <I>n</I>
|
|
|
259 |
sequences. These sequences are in the order given by <I>external_linkage</I>.
|
|
|
260 |
Each element of a sequence consist of one <CODE>TDFINT</CODE> per
|
|
|
261 |
linkable entity in the corresponding <I>el</I> of the <I>make_extern_link</I>
|
|
|
262 |
in the same order. These integers will describe properties of the
|
|
|
263 |
correponding external links. (In format 0, there are only two sequences,
|
|
|
264 |
the first describing the token external links and the second describing
|
|
|
265 |
the tag external links.)
|
|
|
266 |
<P>
|
|
|
267 |
<UL>
|
|
|
268 |
<LI>Bit 0: 1 means "used in this capsule", 0 means "not
|
|
|
269 |
used in this capsule".
|
|
|
270 |
<LI>Bit 1: 1 means "declared in this capsule", 0 means "not
|
|
|
271 |
declared in this capsule".
|
|
|
272 |
<LI>Bit 2: 0 means "not defined in this capsule, 1 means "defined
|
|
|
273 |
in this capsule".
|
|
|
274 |
<LI>Bit 3: 0 means "is uniquely defined", 1 means "
|
|
|
275 |
may be multiply defined".
|
|
|
276 |
</UL>
|
|
|
277 |
</UL>
|
|
|
278 |
<P>
|
|
|
279 |
<A NAME=S429>
|
|
|
280 |
|
|
|
281 |
<HR>
|
|
|
282 |
<H2>8.5. File Formats</H2>
|
|
|
283 |
There may be various kinds of files which contain TDF bitstream information.
|
|
|
284 |
Each will start with a 4-byte "magic-number" identifying
|
|
|
285 |
the kind of file followed by two <CODE>TDFINT</CODE>s giving the major
|
|
|
286 |
and minor version numbers of the TDF involved.
|
|
|
287 |
<P>
|
|
|
288 |
A CAPSULE file will have a magic-number <I>"TDFC"</I>. The
|
|
|
289 |
encoding of the CAPSULE will be byte-aligned following the version
|
|
|
290 |
numbers.
|
|
|
291 |
<P>
|
|
|
292 |
A TDF library file will have a magic-number <I>"TDFL"</I>.
|
|
|
293 |
These files are constructed by the TDF linker.
|
|
|
294 |
<P>
|
|
|
295 |
A TDF archive file will have a magic-number <I>"TDFA"</I>.
|
|
|
296 |
<P>
|
|
|
297 |
Other file formats introduced should follow a similar pattern.
|
|
|
298 |
<P>
|
|
|
299 |
<I>The TDF linker will refuse to link TDF files with different major
|
|
|
300 |
version numbers. The resulting minor version number is the maximum
|
|
|
301 |
of component minor version numbers.</I>
|
|
|
302 |
<P>
|
|
|
303 |
<P>
|
|
|
304 |
<HR>
|
|
|
305 |
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
|
|
|
306 |
Copyright © 1998.</I></P>
|
|
|
307 |
</BODY>
|
|
|
308 |
</HTML>
|