
<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>tcc User's Guide: The Overall Design of tcc</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
<A NAME=S3>
<H1>tcc User's Guide</H1>
<H3>January 1998</H3>
<A HREF="tcc5.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="tcc3.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="tcc1.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>
<DL>
<DT><A HREF="#S4"><B>3.1</B> - Specifying the API</A><DD>
<DT><A HREF="#S5"><B>3.2</B> - The Main Compilation Path</A><DD>
<DT><A HREF="#S6"><B>3.3</B> - Input File Types</A><DD>
<DT><A HREF="#S7"><B>3.4</B> - Intermediate and Output Files</A><DD>
<DT><A HREF="#S8"><B>3.5</B> - Other Compilation Paths</A><DD>
<DL>
<DT><A HREF="#S9"><B>3.5.1</B> - Preprocessing</A><DD>
<DT><A HREF="#S10"><B>3.5.2</B> - TDF Archives</A><DD>
<DT><A HREF="#S11"><B>3.5.3</B> - TDF Notation</A><DD>
<DT><A HREF="#S12"><B>3.5.4</B> - Merging TDF Capsules</A><DD>
</DL>
<DT><A HREF="#S13"><B>3.6</B> - Finding out what tcc is doing</A><DD>
</DL>

<HR>
<H1>3.  The Overall Design of tcc</H1>
Having discussed the compilation strategy <CODE>tcc</CODE> is designed
to implement, let us move on to describe the details of this implementation.
The basic compilation path is shown in Fig. 3, which corresponds to
Fig. 2.<P>
FIGURE 3.  Basic tcc Compilation Path<BR>
<CENTER>
<IMG SRC="../images/tcc_scheme.gif">
</CENTER>
<P>
<A NAME=S4>
<HR><H2>3.1.  Specifying the API</H2>
As we have seen, the API plays a far more concrete role in the TDF
compilation strategy than in the traditional scheme. Therefore the
API needs to be explicitly specified to <CODE>tcc</CODE> before any
compilation takes place. As can be seen from Fig. 3, the API has three
components. Firstly, in the target independent (or production) half
of the compilation, there are the target independent headers which
describe the API. Secondly in the target dependent (or installation)
half, there is the API implementation for the particular target machine.
This is divided between the TDF libraries, derived from the system
headers, and the system libraries. Specifying the API to <CODE>tcc</CODE>
essentially consists of telling it what target independent headers,
TDF libraries and system libraries to use. The precise way in which
this is done is discussed below (in section 4.3</A>).<P>
<A NAME=S5>
<HR><H2>3.2.  The Main Compilation Path</H2>
Once the API has been specified, the actual compilation can begin.
The default action of <CODE>tcc</CODE> is to perform production and
installation consecutively on the same machine; any other action needs
to be explicitly specified. So let us describe the entire compilation
path from C source to executable shown in Fig. 3.<P>
<OL>
<LI>The first stage is production. The C --&gt; TDF producer transforms
each input C source file into a target independent TDF capsule, using
the target independent headers to describe the API in abstract terms.
These target independent capsules will contain tokens to represent
the uses of objects from the API, but these tokens will be left undefined.<P>
<LI>The second stage, which is also the first stage of the installation,
is TDF linking. Each target independent capsule is combined with the
TDF library describing the API implementation to form a target dependent
TDF capsule. Recall that the TDF libraries contain the local definitions
of the tokens left undefined by the producer, so the resultant target
dependent capsule will contain both the uses of these tokens and the
corresponding token definitions.<P>
<LI>The third stage of the compilation is for the TDF translator to
transform each target dependent TDF capsule into an assembly source
file for the appropriate target machine. Some TDF translators output
not an assembly source file, but a binary object file. In this case
the following assembler stage is redundant and the compilation skips
to the system linking.<P>
<LI>The next stage of the compilation is for each assembly source
file to be translated into a binary object file by the system assembler.<P>
<LI>The final compilation phase is for the system linker to combine
all the binary object files with the system libraries to form a single,
final executable. Recall that the system libraries are the final constituent
of the API implementation, so this stage completes the combination
of the program with the API implementation started in stage 2.<P>
</OL>
Let us, for convenience, tabulate these stages, giving the name of
each compilation tool (plus the corresponding executable name), a
code letter which <CODE>tcc</CODE> uses to refer to this stage, and
the input and output file types for the stage (also see 7.2</A>).<P>
<PRE>
        <B>   TOOL                         CODE    INPUT            OUTPUT</B>
        1. C producer (tdfc)            c       C source         target ind. TDF
        2. TDF linker (tld)             L       target ind. TDF  target dep. TDF
        3. TDF translator (trans)       t       target dep. TDF  assembly source
        4. assembler (as)               a       assembly source  binary object
        5. system linker (ld)           l       binary object    executable
</PRE>
The executable name of the TDF translator varies, depending on the
target machine. It will normally start, or end, however, in <CODE>trans</CODE>.
These stages are documented in more detail in sections 5.1</A> to
5.5.<P>
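As an illustration only, the table above can be written down as data and
walked to find which tools a file of a given type passes through. This
is a minimal sketch, not how <CODE>tcc</CODE> itself is implemented; the
names <CODE>STAGES</CODE> and <CODE>path_from</CODE> are hypothetical, and
<CODE>trans</CODE> stands in for whatever the translator is called on the
target machine.<P>

```python
# Illustrative model of the five main stages tabulated above.
# STAGES and path_from are hypothetical names, not part of tcc.
STAGES = [
    # (code letter, tool, input type, output type)
    ("c", "tdfc", "C source", "target ind. TDF"),
    ("L", "tld", "target ind. TDF", "target dep. TDF"),
    ("t", "trans", "target dep. TDF", "assembly source"),  # real name varies
    ("a", "as", "assembly source", "binary object"),
    ("l", "ld", "binary object", "executable"),
]


def path_from(input_type):
    """Return the tools applied, in order, to a file of the given type."""
    tools = []
    current = input_type
    for _letter, tool, src, dst in STAGES:
        if src == current:
            tools.append(tool)
            current = dst
    return tools


print(path_from("C source"))
print(path_from("assembly source"))
```

A C source file passes through all five tools, whereas an assembly
source file only meets the assembler and the system linker.<P>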
The code letters for the various compilation stages can be used in
the <B>-W</B><I>tool</I><B>,</B><I>opt</I><B>,</B>... command-line
option to <CODE>tcc</CODE>. This passes the option(s) <I>opt</I> directly
to the executable in the compilation stage identified by the letter
<I>tool</I>. For example, <B>-Wl,-x</B> will cause the system linker
to be invoked with the <B>-x</B> option. Similarly the
<B>-E</B><I>tool</I><B>:</B><I>file</I> option allows the executable
invoked at the compilation stage <I>tool</I> to be specified as
<I>file</I>. This gives the <CODE>tcc</CODE> user very direct access to
the compilation tools.<P>
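The shape of the <B>-W</B> option can be made concrete with a short
sketch. The function <CODE>parse_W</CODE> below is hypothetical and only
illustrates how the code letter and the option list are separated; it
makes no claim about how <CODE>tcc</CODE> parses its own command line.<P>

```python
def parse_W(option):
    """Split an option of the form -Wtool,opt,... into (tool, [opts])."""
    if not option.startswith("-W"):
        raise ValueError("not a -W option: " + option)
    # The first field is the code letter, the rest are passed to that tool.
    tool, *opts = option[2:].split(",")
    return tool, opts


print(parse_W("-Wl,-x"))
```

Here the letter <CODE>l</CODE> selects the system linker, which would
then receive <CODE>-x</CODE>.<P>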
<A NAME=S6>
<HR><H2>3.3.  Input File Types</H2>
This compilation path may be joined at any point, and terminated at
any point. The latter possibility is discussed below. For the former,
<CODE>tcc</CODE> determines, for each input file it is given, to which
of the file types it knows (C source, target independent TDF, etc.)
this file belongs. This determines where in the compilation path described
this file will start. The method used to determine the type of a file
is the normal filename suffix convention:<P>
<UL>
<LI>files ending in <CODE>.c</CODE> are understood to be C source
files,<P>
<LI>files ending in <CODE>.j</CODE> are understood to be target independent
TDF capsules,<P>
<LI>files ending in <CODE>.t</CODE> are understood to be target dependent
TDF capsules,<P>
<LI>files ending in <CODE>.s</CODE> are understood to be assembly
source files,<P>
<LI>files ending in <CODE>.o</CODE> are understood to be binary object
files,<P>
<LI>files whose type cannot otherwise be determined are assumed to
be binary object files,<P>
</UL>
(for a complete list see 7.1</A>). Thus, for example, we speak of
&quot;<CODE>.j</CODE> files&quot; as a shorthand for &quot;target
independent TDF capsules&quot;. Each file type recognised by <CODE>tcc</CODE>
is assigned an identifying letter. For convenience, this corresponds
to the suffix identifying the file type (<CODE>c</CODE> for C source
files, <CODE>j</CODE> for target independent TDF capsules etc.).<P>
There is an alternative method of specifying input files, by means
of the <B>-S</B><I>type</I><B>,</B><I>file</I><B>,</B>... command-line
option. This specifies that the file <I>file</I> should be treated
as an input file of the type corresponding to the letter <I>type</I>,
regardless of its actual suffix. Thus, for example, <B>-Sc,</B><I>file</I>
specifies that <I>file</I> should be regarded as a C source (or <CODE>.c</CODE>)
file.<P>
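The suffix convention and the <B>-S</B> override can be sketched as a
simple lookup. This is an illustration under stated assumptions: only
the five file types listed above are modelled, and the names
<CODE>TYPE_BY_LETTER</CODE> and <CODE>classify</CODE> are hypothetical.<P>

```python
import os

# Identifying letters for the five file types listed above.
# Other file types known to tcc are deliberately left out of this sketch.
TYPE_BY_LETTER = {
    "c": "C source",
    "j": "target independent TDF capsule",
    "t": "target dependent TDF capsule",
    "s": "assembly source",
    "o": "binary object",
}


def classify(filename, forced_letter=None):
    """Return the file type; forced_letter models the -Stype,file override."""
    if forced_letter is not None:
        return TYPE_BY_LETTER[forced_letter]
    suffix = os.path.splitext(filename)[1]  # for example ".c"
    # Files whose type cannot be determined are assumed to be binary objects.
    return TYPE_BY_LETTER.get(suffix[1:], "binary object")


print(classify("prog.c"))
print(classify("libx.a"))
print(classify("prog.txt", forced_letter="c"))
```

The second call shows the fallback to binary object files, and the third
shows the type letter taking precedence over the suffix.<P>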
<A NAME=S7>
<HR><H2>3.4.  Intermediate and Output Files</H2>
During the compilation, <CODE>tcc</CODE> makes up names for the output
files of each of the compilation phases. These names are based on
the input file name, but with the input file suffix replaced by the
output file suffix (unless the <B>-make_up_names</B> command-line
option is given, in which case the intermediate files are given names
of the form  
<CODE>_tccnnnn.x</CODE>, where <CODE>nnnn</CODE> is a number which
is incremented for each intermediate file produced, and <CODE>x</CODE>
is the suffix corresponding to the output file type). Thus if the
input file <CODE>file.c</CODE> is given, this will be transformed
into <CODE>file.j</CODE> by the producer, which in turn will be transformed
into <CODE>file.t</CODE> by the TDF linker, and so on. The system
linker output file name cannot be deduced in the same way, since it
is the result of linking a number of <CODE>.o</CODE>
files. By default, as with <CODE>cc</CODE>, this file is called <CODE>a.out</CODE>.<P>
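The default naming rule (that is, without <B>-make_up_names</B>) can be
sketched as repeated suffix replacement. The names
<CODE>NEXT_SUFFIX</CODE> and <CODE>intermediate_names</CODE> are
hypothetical, and the sketch does not model the directory in which each
file is placed.<P>

```python
import os

# Output suffix produced from each input suffix on the main path.
NEXT_SUFFIX = {".c": ".j", ".j": ".t", ".t": ".s", ".s": ".o"}


def intermediate_names(source):
    """List the file names derived from source, up to the binary object."""
    stem, suffix = os.path.splitext(os.path.basename(source))
    names = []
    while suffix in NEXT_SUFFIX:
        suffix = NEXT_SUFFIX[suffix]
        names.append(stem + suffix)
    return names


print(intermediate_names("file.c"))
```

The executable does not appear in the list, since its name is not derived
from any single input file.<P>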
For most purposes these intermediate files are not required to be
preserved; if we are compiling a single C source file to an executable,
then the only output file we are interested in is the executable,
not the intermediate files created during the compilation process.
For this reason <CODE>tcc</CODE> creates a temporary directory in
which to put these intermediate files, and removes this directory
when the compilation is complete. All intermediate files are put into
this temporary directory except:<P>
<UL>
<LI>those which are an end product of the compilation (such as the
executable),<P>
<LI>those which are explicitly ordered to be preserved by means of
command-line options,<P>
<LI>binary object files, when more than one such file is produced
(this is for compatibility with <CODE>cc</CODE>).<P>
</UL>
<CODE>tcc</CODE> can be made to preserve intermediate files of various
types by means of the <B>-P</B><I>type</I>... command-line option,
which specifies a list of letters corresponding to the file types
to be preserved. Thus for example <B>-Pjt</B> specifies that all TDF
capsules produced, whether target independent or target dependent,
(i.e. all <CODE>.j</CODE> and <CODE>.t</CODE> files) should be preserved.
The special form <B>-Pa</B> specifies that all intermediate files
should be preserved. It is also possible to specify that a certain
file type should not be preserved by preceding the corresponding letter
by <B>-</B> in the <B>-P</B> option. The only really useful application
of this is to use <B>-P-o</B> to cancel the <CODE>cc</CODE> convention
on preserving binary object files mentioned above.<P>
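The lettered form of the <B>-P</B> option can be read as two sets of
type letters: those to preserve and those explicitly not to preserve.
The following is a minimal sketch of that reading; <CODE>parse_P</CODE>
is a hypothetical name, and the bare <B>-P</B> option (which has a
different meaning, see section 3.5.1) is not handled.<P>

```python
def parse_P(option):
    """Split -Ptype... into (letters to preserve, letters not to preserve)."""
    if not option.startswith("-P") or len(option) == 2:
        raise ValueError("expected -P followed by type letters: " + option)
    keep, drop = set(), set()
    negate = False
    for ch in option[2:]:
        if ch == "-":
            negate = True  # the next letter is not to be preserved
            continue
        (drop if negate else keep).add(ch)
        negate = False
    return keep, drop


print(parse_P("-Pjt"))
print(parse_P("-P-o"))
```

In this model the special letter <CODE>a</CODE> of <B>-Pa</B> simply
appears in the first set; interpreting it as "all intermediate files" is
left to the caller.<P>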
By default, all preserved files are stored in the current working
directory. However the <B>-work</B> <I>dir</I> command-line option
specifies that they should be stored in the directory <I>dir</I>.<P>
The compilation can also be halted at any stage. The <B>-F</B><I>type</I>
option to <CODE>tcc</CODE> tells it to stop the compilation after
creating the files of the type corresponding to the letter <I>type</I>.
Because any files of this type which are produced will be an end product
of the compilation, they will automatically be preserved. For example,
<B>-Fo</B> halts the compilation after the creation of the binary
object, or <CODE>.o</CODE>, files (i.e. just before the system linking),
and preserves all such files produced. A number of other <CODE>tcc</CODE>
options are equivalent to options of the form <B>-F</B><I>type</I>:<P>
<UL>
<LI><B>-i</B> is equivalent to <B>-Fj</B> (i.e. just apply the producer),<P>
<LI><B>-S</B> is equivalent to <B>-Fs</B> (<CODE>cc</CODE> compatibility),<P>
<LI><B>-c</B> is equivalent to <B>-Fo</B> (<CODE>cc</CODE> compatibility).<P>
</UL>
If more than one <B>-F</B> option (including the equivalent options
just listed) is given, then <CODE>tcc</CODE> issues a warning. The
stage coming first in the compilation path takes priority.<P>
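The priority rule for several halting options can be sketched as
follows. This is an illustration, not the <CODE>tcc</CODE>
implementation: the names <CODE>PATH_ORDER</CODE>,
<CODE>EQUIVALENT</CODE> and <CODE>stop_type</CODE> are hypothetical, and
only the four file types on the main path are considered.<P>

```python
# File types in the order they are produced on the main compilation path.
PATH_ORDER = ["j", "t", "s", "o"]

# Options documented above as equivalent to a -Ftype option.
EQUIVALENT = {"-i": "j", "-S": "s", "-c": "o"}


def stop_type(args):
    """Return (type letter to stop after, whether a warning is due)."""
    requested = []
    for arg in args:
        if arg in EQUIVALENT:
            requested.append(EQUIVALENT[arg])
        elif arg.startswith("-F") and arg[2:] in PATH_ORDER:
            requested.append(arg[2:])
    if not requested:
        return None, False
    # The stage coming first in the compilation path takes priority.
    earliest = min(requested, key=PATH_ORDER.index)
    return earliest, len(requested) > 1


print(stop_type(["-c", "-Fj"]))
```

Given both <B>-c</B> and <B>-Fj</B>, the sketch selects the
<CODE>.j</CODE> files and flags that a warning would be issued.<P>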
If the compilation has precisely one end product output file, then
the name of this file can be specified to be <I>file</I> by means
of the <B>-o</B> <I>file</I> command-line option. If a <B>-o</B> <I>file</I>
option is given when there is more than one end product, then the
first such file produced will be called <I>file</I>, and all such
files produced subsequently will cause <CODE>tcc</CODE> to issue a
warning.<P>
FIGURE 4.  Full tcc Compilation Path<BR>
<CENTER>
<IMG SRC="../images/tcc_files.gif">
</CENTER>
<P>
<A NAME=S8>
<HR><H2>3.5.  Other Compilation Paths</H2>
So far we have been discussing the main <CODE>tcc</CODE> compilation
path from C source to executable. This is, however, only part of the
picture. Almost the full range of compilation
paths supported by <CODE>tcc</CODE> is shown in Fig. 4. This differs
from Fig. 3 in that it only shows the left hand, or program, half
of the main compilation diagram. The solid arrows show the default
compilation paths; the shaded arrows are only followed if <CODE>tcc</CODE>
is so instructed by means of command-line options. Let us consider
those paths in this diagram which have not so far been mentioned.<P>
<A NAME=S9>
<H3>3.5.1.  Preprocessing</H3>
The first paths to be considered involve preprocessed C source files.
These form a distinct file type which <CODE>tcc</CODE> recognises
by means of the <CODE>.i</CODE> file suffix. Input <CODE>.i</CODE>
files are treated in exactly the same way as <CODE>.c</CODE> files;
that is, they are fed into the producer.<P>
<CODE>tcc</CODE> can be made to preprocess the C source files it is
given by means of the <B>-P</B> and <B>-E</B> options. If the <B>-P</B>
option is given then each <CODE>.c</CODE> file is transformed into
a corresponding <CODE>.i</CODE> file by the TDF C preprocessor, <CODE>tdfcpp</CODE>.
If the <B>-E</B> option is given then the output of <CODE>tdfcpp</CODE>
is sent instead to the standard output. In both cases the compilation
halts after the preprocessor has been applied. Preprocessing is discussed
further in section 5.6</A>.<P>
<A NAME=S10>
<H3>3.5.2.  TDF Archives</H3>
The second new file type introduced in Fig. 4 is the TDF archive.
This is recognised by <CODE>tcc</CODE> by means of the <CODE>.ta</CODE>
file suffix. Basically a TDF archive is a set of target independent
TDF capsules (this is slightly simplified, see section 5.2.3</A> for
more details). Any input TDF archives are automatically split into
their constituent target independent capsules. These then join the
main compilation path in the normal way.<P>
In order to create a TDF archive, <CODE>tcc</CODE> must be given the
<B>-prod</B> command-line option. It will combine all the target independent
TDF capsules it has into an archive, and the compilation will then
halt. By default this archive is called <CODE>a.ta</CODE>, but another
name may be specified using the <B>-o</B> option.<P>
The routines for splitting and building TDF archives are built into
<CODE>tcc</CODE>, and are not implemented by a separate compilation
tool (in particular, TDF archives are not <CODE>ar</CODE> archives).
Really TDF archives are a <CODE>tcc</CODE>-specific construction;
they are not part of TDF proper.<P>
<A NAME=S11>
<H3>3.5.3.  TDF Notation</H3>
TDF has the form of an abstract syntax tree which is encoded as a
series of bits. In order to examine the contents of a TDF capsule
it is necessary to translate it into an equivalent human readable
form. Two tools are provided which do this. The TDF pretty printer,
<CODE>disp</CODE>, translates TDF into text, whereas the TDF notation
compiler, <CODE>tnc</CODE>, both translates TDF to text and text to
TDF. The two textual forms of TDF are incompatible: <CODE>disp</CODE>
output cannot be used as <CODE>tnc</CODE> input. <CODE>disp</CODE>
is in many ways the more sophisticated decoder (it understands the
TDF extensions used to handle diagnostics, for example), but it does
not handle the text to TDF translation which <CODE>tnc</CODE> does.
By default <CODE>tnc</CODE> is a text to TDF translator; it needs
to be passed the <B>-p</B> flag in order to translate TDF into text.
We refer to the textual form of TDF supported by <CODE>tnc</CODE>
as TDF notation.<P>
By default, <CODE>tcc</CODE> uses <CODE>disp</CODE>. If it is given
the <B>-disp</B> command-line option then all target independent TDF
capsules (<CODE>.j</CODE> files) are transformed into text using <CODE>disp</CODE>.
The <B>-disp_t</B> option causes all target dependent TDF
capsules (<CODE>.t</CODE> files) to be transformed into text. In both
cases the output files have a <CODE>.p</CODE> suffix, and the compilation
halts after they are produced.<P>
In order for <CODE>tnc</CODE> to be used, the <B>-Ytnc</B> flag should
be passed to <CODE>tcc</CODE>. In this case the <B>-disp</B> and
<B>-disp_t</B> options cause <CODE>tnc</CODE> <B>-p</B> to be invoked
instead of <CODE>disp</CODE>. This flag also causes <CODE>tcc</CODE>
to recognise files with a <CODE>.p</CODE> suffix as TDF notation source
files. These are translated by <CODE>tnc</CODE> into target independent
TDF capsules, which join the main compilation path in the normal way.<P>
Similarly if the <B>-Ypl_tdf</B> flag is passed to <CODE>tcc</CODE>
then it recognises files with a <CODE>.pl</CODE> suffix as PL_TDF
source files. These are translated by the PL_TDF compiler, <CODE>pl</CODE>,
into target independent TDF capsules.<P>
<CODE>disp</CODE> and <CODE>tnc</CODE> are further discussed in section
5.7</A>.<P>
<A NAME=S12>
<H3>3.5.4.  Merging TDF Capsules</H3>
The final unexplored path in Fig. 4 is the ability to combine all
the target independent TDF capsules into a single capsule. This is
specified by means of the <B>-M</B> command-line option to <CODE>tcc</CODE>.
The combination of these capsules is performed by the TDF linker,
<CODE>tld</CODE>. Whereas in the main compilation path <CODE>tld</CODE>
is used to combine a single target independent TDF capsule with the
TDF libraries to form a target dependent TDF capsule, in this case
it is used to combine several target independent capsules into a single
target independent capsule. By default the combined capsule is called
<CODE>a.j</CODE>. The compilation will continue after the combination
phase, with the resultant capsule rejoining the main compilation path.
This merging operation is further discussed in section 5.2.2</A>.<P>
The only unresolved issue in this case is, if the <B>-M</B> option
is given, to what <CODE>.j</CODE> files do the <B>-Fj</B> and the
<B>-Pj</B> options refer? In fact, <CODE>tcc</CODE> takes them to
refer to the merged TDF capsule rather than the capsules which are
merged to form it. The <B>-Pa</B> option, however, will cause both
sets of capsules to be preserved.<P>
To summarise, <CODE>tcc</CODE> has an extra three file types, and
an extra three compilation tools (not including the TDF archive creating
and splitting routines which are built into <CODE>tcc</CODE>). These
are:<P>
<UL>
<LI>files ending in <CODE>.i</CODE> are understood to be preprocessed
C source files,<P>
<LI>files ending in <CODE>.ta</CODE> are understood to be TDF archives,<P>
<LI>files ending in <CODE>.p</CODE> are understood to be TDF notation
source files,<P>
</UL>
and:<P>
<PRE>
        <B>   TOOL                         CODE    INPUT            OUTPUT</B>
        6. C preprocessor (tdfcpp)      c       C source         preproc. C source
        7a. pretty printer (disp)       d       TDF capsule      TDF notation
        7b. reverse notation (tnc -p)   d       TDF capsule      TDF notation
        8. notation compiler (tnc)      d       TDF notation     TDF capsule
</PRE>
(see 7.1</A> and 7.2</A> for complete lists).<P>
<A NAME=S13>
<HR><H2>3.6.  Finding out what tcc is doing</H2>
With so many different file types and alternative compilation paths,
it is often useful to be able to keep track of what <CODE>tcc</CODE>
is doing. There are several command-line options which do this. The
simplest is <B>-v</B> which specifies that <CODE>tcc</CODE> should
print each command in the compilation process on the standard output
before it is executed. The <B>-vb</B> option is similar, but only
causes the name of each input file to be printed as it is processed.
Finally the <B>-dry</B> option specifies that the commands should
be printed (as with <B>-v</B>) but not actually executed. This can
be used to experiment with <CODE>tcc</CODE> to find out what it would
do in various circumstances.<P>
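The difference between <B>-v</B> and <B>-dry</B> can be sketched with a
small command runner. This is a hedged illustration only: the function
<CODE>run_commands</CODE> is hypothetical, and the <CODE>echo</CODE>
commands are placeholders rather than the command lines
<CODE>tcc</CODE> really issues.<P>

```python
import subprocess


def run_commands(commands, verbose=False, dry=False):
    """Run each command; print it first if verbose or dry is set.

    With dry=True the commands are printed but never executed.
    Returns the list of lines that were printed.
    """
    printed = []
    for cmd in commands:
        if verbose or dry:
            line = " ".join(cmd)
            print(line)
            printed.append(line)
        if not dry:
            subprocess.run(cmd, check=True)
    return printed


# Placeholder commands standing in for the real compilation stages.
plan = [["echo", "stage", "one"], ["echo", "stage", "two"]]
run_commands(plan, dry=True)
```

With <CODE>dry=True</CODE> nothing is executed, which mirrors the use of
<B>-dry</B> to see what would be done without doing it.<P>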
Occasionally an unclear error message may be printed by one of the
compilation tools. In this case the <B>-show_errors</B> option to
<CODE>tcc</CODE> might be useful. It causes <CODE>tcc</CODE> to print
the command it was executing when the error occurred. By default,
if an error occurs during the construction of an output file, the
file is removed by <CODE>tcc</CODE>. It can however be preserved for
examination using the <B>-keep_errors</B> option. This applies not
only to normal errors, but also to exceptional errors such as the
user interrupting <CODE>tcc</CODE> by pressing <CODE>^C</CODE>, or
one of the compilation tools crashing. In the latter case, <CODE>tcc</CODE>
will also remove any core file produced, unless the <B>-keep_errors</B>
option is specified.<P>
For purposes of configuration control, the <B>-version</B> flag will
cause <CODE>tcc</CODE> to print its version number. This will typically
be of the form:<P>
<PRE>
        tcc: Version: 4.0, Revision: 1.5, Machine: hp
</PRE>
giving the version and revision number, plus the target machine identifier.
The <B>-V</B> flag will also cause each compilation tool to print
its version number (if appropriate) as it is invoked.<P>
<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright &copy; 1998.</I></P>
</BODY>
</HTML>