Rev 2 | Blame | Compare with Previous | Last modification | View Log | RSS feed
<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>The bit encoding of TDF</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
<H1><A NAME=S417>TDF Specification, Issue 4.0</A></H1>
<H3>January 1998</H3>
<A HREF="spec12.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="spec10.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="spec1.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<A HREF="spec12.html"><IMG SRC="../images/index.gif" ALT="document index"></A>
<P>
<HR>
<DL>
<DT><A HREF="#S418"><B>8.1</B> - The Basic Encoding</A><DD>
<DT><A HREF="#S419"><B>8.2</B> - Fundamental encodings</A><DD>
<DL>
<DT><A HREF="#S420"><B>8.2.1</B> - TDFINT</A><DD>
<DT><A HREF="#S421"><B>8.2.2</B> - TDFBOOL</A><DD>
<DT><A HREF="#S422"><B>8.2.3</B> - TDFSTRING</A><DD>
<DT><A HREF="#S423"><B>8.2.4</B> - TDFIDENT</A><DD>
</DL>
<DT><A HREF="#S424"><B>8.3</B> - BITSTREAM</A><DD>
<DL>
<DT><A HREF="#S425"><B>8.3.1</B> - BYTESTREAM</A><DD>
<DT><A HREF="#S426"><B>8.3.2</B> - BYTE_ALIGN</A><DD>
<DT><A HREF="#S427"><B>8.3.3</B> - Extendable integer encoding</A><DD>
</DL>
<DT><A HREF="#S428"><B>8.4</B> - The TDF encoding</A><DD>
<DT><A HREF="#S429"><B>8.5</B> - File Formats</A><DD>
</DL>
<HR>
<H1>8. <A NAME=0>The bit encoding of TDF</A></H1>
This is a description of the encoding used for TDF.
<P>
<A HREF="#1">Section 8.1</A> defines the basic level of encoding,
in which integers consisting of a specified number of bits are appended
to the sequence of bytes. <A HREF="#4">Section 8.2</A> defines the
second level of encoding, in which fundamental kinds of value are
encoded in terms of integers of specified numbers of bits. <A HREF="#17">Section
8.4</A> defines the third level, in which TDF is encoded using the
previously defined concepts.
<P>
<A NAME=S418>
<HR>
<H2>8.1. <A NAME=1>The Basic Encoding</A></H2>
<A NAME=M2><A NAME=M3>TDF consists of a sequence of 8-bit bytes used
to encode integers of a varying number of bits, from 1 to 32. These
integers will be called basic integers.
<P>
TDF is encoded into bytes in increasing byte index, and within the
byte the most significant end is filled before the least significant.
Let the bits within a byte be numbered from 0 to 7, 0 denoting the
least significant bit and 7 the most significant. Suppose that the
bytes up to <I>n</I>-1 have been filled and that the next free bit
in byte <I>n</I> is bit <I>k</I>. Then bits <I>k</I>+1 to 7 are full
and bits 0 to <I>k</I> remain to be used. Now an integer of <I>d</I>
bits is to be appended.
<P>
If <I>d</I> is less than or equal to <I>k</I>, the <I>d</I> bits will
occupy bits <I>k</I>-<I>d</I>+1 to <I>k</I> of byte <I>n</I>, and
the next free bit will be at bit <I>k-d</I>. Bit 0 of the integer
will be at bit <I>k</I>-<I>d</I>+1 of the byte, and bit <I>d</I>-1
of the integer will be at bit <I>k</I>.
<P>
If <I>d</I> is equal to <I>k</I>+1, the <I>d</I> bits will occupy
bits 0 to <I>k</I> of byte <I>n</I> and the next free bit will be
bit 7 of byte <I>n</I>+1. Bit <I>d</I>-1 of the integer will be at
bit <I>k</I> of the byte.
<P>
If <I>d</I> is greater than <I>k</I>+1, the most significant <I>k</I>+1
bits of the integer will be in byte <I>n</I>, with bit <I>d</I>-1
at bit <I>k</I> of the byte. The remaining <I>d</I>-<I>k</I>-1 least
significant bits are then encoded into the bytes, starting at byte
<I>n</I>+1, bit 7, using the same algorithm (i.e. recursively).
<P>
<A NAME=S419>
<HR>
<H2>8.2. <A NAME=4>Fundamental encodings</A></H2>
This section describes the encoding of <CODE>TDFINT</CODE>,
<CODE>TDFBOOL</CODE>, <CODE>TDFSTRING</CODE>, <CODE>TDFIDENT</CODE>,
<CODE>BITSTREAM</CODE>, <CODE>BYTESTREAM</CODE>, <CODE>BYTE_ALIGN</CODE>
and extendable integers.
<P>
<A NAME=S420>
<H3>8.2.1. TDFINT</H3>
<A NAME=M5><CODE>TDFINT</CODE> encodes non-negative integers of unbounded
size. The encoding uses octal digits encoded in 4-bit basic integers.
The most significant octal digit is encoded first, the least significant
last. For all digits except the last the 4-bit integer is the value
of the octal digit. For the last digit the 4-bit integer is the value
of the octal digit plus 8.
<P>
<A NAME=S421>
<H3>8.2.2. TDFBOOL</H3>
<A NAME=M6><CODE>TDFBOOL</CODE> encodes a boolean, true or false.
The encoding uses a 1-bit basic integer, with 1 encoding true and
0 encoding false.
<P>
<A NAME=S422>
<H3>8.2.3. TDFSTRING</H3>
<CODE>TDFSTRING</CODE> encodes a sequence containing <I>n</I> non-negative
integers, each of <I>k</I> bits. The encoding consists of, first a
<CODE>TDFINT</CODE> giving the number of bits, second a <CODE>TDFINT</CODE>
giving the number of integers, which may be zero. Thirdly it contains
<I>n</I> <I>k</I>-bit basic integers, giving the sequence of integers
required, the first integer being first in this sequence.
<P>
<A NAME=S423>
<H3>8.2.4. TDFIDENT</H3>
<A NAME=M7><CODE>TDFIDENT</CODE> also encodes a sequence containing
<I>n</I> non-negative integers. These integers will all consist of
the same number of bits, which will be a multiple of 8. It is a property
of the encoding of the other constructions that TDFIDENTS will start
on either bit 7 or bit 3 of a byte and end on bit 7 or bit 3 of a
byte. It thus has some alignment properties which are useful to permit
fast copying of sections of TDF.
<P>
The encoding consists of, first a <CODE>TDFINT</CODE> giving the number
of bits, second a <CODE>TDFINT</CODE> giving the number of integers,
which may be zero. If the next free bit is not bit 7 of some byte,
it is moved on to bit 7 of the next byte.
<P>
Thirdly it contains <I>n</I> <I>k</I>-bit integers.
<P>
If the next free bit is not bit 7 of some byte, it is moved on to
bit 7 of the next byte.
<P>
<A NAME=S424>
<HR>
<H2>8.3. <A NAME=8>BITSTREAM</A></H2>
<A NAME=M9>It can be useful to be able to skip a TDF construction
without reading through it. <CODE>BITSTREAM</CODE> provides a means
of doing this.
<P>
A <CODE>BITSTREAM</CODE> encoding of <I>X</I> consists of a <CODE>TDFINT</CODE>
giving the number of bits of encoding which are occupied by the <I>X</I>.
Hence to skip over a <CODE>BITSTREAM</CODE> while decoding, one should
read the <CODE>TDFINT</CODE> and then advance the bit index by that
number of bits. To read the contents of a <CODE>BITSTREAM</CODE> encoding
of <I>X</I>, one should read and ignore a <CODE>TDFINT</CODE> and
then decode an <I>X</I>. There will be no spare bits at the end of
the <I>X</I>, so reading can continue directly.
<P>
<A NAME=S425>
<H3>8.3.1. BYTESTREAM</H3>
<A NAME=M10>It can be useful to be able to skip a TDF construction
without reading through it. <CODE>BYTESTREAM</CODE> provides a means
of doing this while remaining byte aligned, so facilitating copying
the TDF. A <CODE>BYTESTREAM</CODE> will always start when the bit
position is 3 or 7.
<P>
A <CODE>BYTESTREAM</CODE> encoding of <I>X</I> starts with a <CODE>TDFINT</CODE>
giving a number, <I>n</I>. After this, if the current bit position
is not bit 7 of some byte, it is moved to bit 7 of the next byte.
The next <I>n</I> bytes are an encoding of <I>X</I>. There may be
some spare bits left over at the end of <I>X</I>.
<P>
Hence to skip over a <CODE>BYTESTREAM</CODE> while decoding one should
read a <CODE>TDFINT</CODE>, <I>n</I>, move to the next byte alignment
(if the bit position is not 7) and advance the bit index over <I>n</I>
bytes. To read a <CODE>BYTESTREAM</CODE> encoding of <I>X</I> one
should read a <CODE>TDFINT</CODE>, <I>n</I>, and move to the next
byte, <I>b</I> (if the bit position is not 7), and then decode an
<I>X</I>. Finally the bit position should be moved to <I>n</I> bytes
after <I>b</I>.
<P>
<A NAME=S426>
<H3>8.3.2. BYTE_ALIGN</H3>
<A NAME=M11><A NAME=M12><CODE>BYTE_ALIGN</CODE> leaves the bit position
alone if it is 7, and otherwise moves to bit 7 of the next byte.
<P>
<A NAME=S427>
<H3>8.3.3. <A NAME=13>Extendable integer encoding</A></H3>
<A NAME=M14><A NAME=M15><A NAME=M16>A <I>d</I>-bit extendable integer
encoding enables an integer greater than zero to be encoded given
<I>d</I>, a number of bits.
<P>
If the integer is between 1 and 2<SUP><I>d</I></SUP> - 1 inclusive,
a
<I>d</I>-bit basic integer is encoded.
<P>
If the integer, <I>i</I>, is greater than or equal to 2<SUP><I>d</I></SUP>,
a <I>d</I>-bit basic integer encoding of zero is inserted and then
<I>i</I> - 2<SUP><I>d</I></SUP> + 1 is encoded as a <I>d</I>-bit extendable
encodin, and so on, recursively.
<P>
<A NAME=S428>
<HR>
<H2>8.4. <A NAME=17>The TDF encoding</A></H2>
The descriptions of SORTS and constructors contain encoding information
which is interpreted as follows to define the TDF encoding.
<P>
<UL>
<LI>A TDF CAPSULE is an encoding of the <CODE>SORT CAPSULE</CODE>.<P>
<LI>For each <CODE>SORT</CODE> a number of encoding bits, <I>b,</I>
is specified. If this is zero, there will only be one construction
for the class, and its encoding will consist of the encodings of its
components, in the given order.<P>
<LI><A NAME=M18>If the number of encoding bits, <I>b</I>, is not zero
the <CODE>SORT</CODE> is described as extendable or as not extendable.
For each construction there is an encoding number given. If the <CODE>SORT
</CODE> is extendable, this number is output as an extendable integer.
If the <CODE>SORT</CODE> is described as not extendable, the number
is output as a basic integer. This is followed by the encodings of
the components of the construction in the order given in the description
of the construct.<P>
<LI><A NAME=M21>For the classes which are named <CODE>SLIST</CODE>(<I>x</I>)
- e.g. <CODE>SLIST</CODE>(<CODE>UNIT</CODE>) - the encoding consists
of a <CODE>TDFINT</CODE>, <I>n</I>, followed by <I>n</I> encodings
of <I>x</I>.<P>
<LI><A NAME=M22>For the classes which are named <CODE>LIST</CODE>(<I>x</I>)
- e.g. <CODE>LIST</CODE>(<CODE>EXP</CODE>) - the encoding consists
of a 1-bit integer which will be 0, followed by an <CODE>SLIST</CODE>(<I>x</I>).
The 1-bit integer is to allow for extensions to other representations
of
<CODE>LIST</CODE>s.<P>
<LI><A NAME=M23><A NAME=M24>For the classes which are named <CODE>OPTION</CODE>(
<I>x</I>) the encoding consists of a 1-bit basic integer. If this
is zero, the option is absent and there is no more encoding. If the
integer is 1, the option is present and an encoding of <I>x</I> follows.<P>
<LI><CODE>BITSTREAMS</CODE> occur in only two kinds of place. One
is the constructions with the form <I>x_cond</I>, which are the install-time
conditionals. For each of these the class encoded in the <CODE>BITSTREAM</CODE>
is the same as the class which is the result of the <I>x_cond</I>
construction. The other kind of place is as the <I>token_args</I>
component of a construction with the form <I>x_apply_token</I>. This
component always gives the parameters of the <CODE>TOKEN</CODE>. It
can only be decoded if there is a token definition or a token declaration
for the particular token being applied, i.e. for the <I>token_value</I>
component of the construction. In this case the <CODE>SORTS</CODE>
and hence the classes of the actual token arguments are given by the
declaration or definition, and encodings of these classes are placed
in sequence after the number of bits. If the declaration or definition
are not available, the <CODE>BITSTREAM</CODE> can only be skipped.<P>
<LI><CODE>BYTESTREAM</CODE> <I>X</I> occurs in only one place, the
encoding of the <CODE>SORT UNIT</CODE>. The <CODE>SORT</CODE>
<I>X</I> is determined by the <CODE>UNIT</CODE> identification which
is given for each of the relevant <CODE>SORTS</CODE>.<P>
<LI><A NAME=M25>The <I>tld</I> <CODE>UNIT</CODE> is encoded specially.
It is always the first <CODE>UNIT</CODE> in a Capsule and is used
to pass information to the TDF linker. The first entry in a <I>tld</I>
<CODE>UNIT</CODE> is a <CODE>TDFINT</CODE> giving the format of the
remainder of the <CODE>UNIT</CODE>. Currently, the linker supports
formats 0 and 1, but others may be added to give greater functionality
while retaining compatibility. With format 0, the remainder of <CODE>UNIT</CODE>
is identical to a now obsolete
<I>tld2</I> <CODE>UNIT</CODE>. With format 1, the remainder of the
<CODE>UNIT</CODE> is as follows: If <I>n</I> is the number of <CODE>EXTERN_LINK
</CODE>s in the <I>external_linkage</I>
argument of <I>make_capsule</I>, the remainder consists of <I>n</I>
sequences. These sequences are in the order given by <I>external_linkage</I>.
Each element of a sequence consist of one <CODE>TDFINT</CODE> per
linkable entity in the corresponding <I>el</I> of the <I>make_extern_link</I>
in the same order. These integers will describe properties of the
correponding external links. (In format 0, there are only two sequences,
the first describing the token external links and the second describing
the tag external links.)
<P>
<UL>
<LI>Bit 0: 1 means "used in this capsule", 0 means "not
used in this capsule".
<LI>Bit 1: 1 means "declared in this capsule", 0 means "not
declared in this capsule".
<LI>Bit 2: 0 means "not defined in this capsule, 1 means "defined
in this capsule".
<LI>Bit 3: 0 means "is uniquely defined", 1 means "
may be multiply defined".
</UL>
</UL>
<P>
<A NAME=S429>
<HR>
<H2>8.5. File Formats</H2>
There may be various kinds of files which contain TDF bitstream information.
Each will start with a 4-byte "magic-number" identifying
the kind of file followed by two <CODE>TDFINT</CODE>s giving the major
and minor version numbers of the TDF involved.
<P>
A CAPSULE file will have a magic-number <I>"TDFC"</I>. The
encoding of the CAPSULE will be byte-aligned following the version
numbers.
<P>
A TDF library file will have a magic-number <I>"TDFL"</I>.
These files are constructed by the TDF linker.
<P>
A TDF archive file will have a magic-number <I>"TDFA"</I>.
<P>
Other file formats introduced should follow a similar pattern.
<P>
<I>The TDF linker will refuse to link TDF files with different major
version numbers. The resulting minor version number is the maximum
of component minor version numbers.</I>
<P>
<P>
<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright © 1998.</I></P>
</BODY>
</HTML>