<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>
C++ Producer Guide: Parsing C++
</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
<H1>C++ Producer Guide</H1>
<H3>March 1998</H3>
<A HREF="tdf.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="error.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="index.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>
<H2>3.4. Parsing C++</H2>
<P>
The parser used in the C++ producer is generated using the
<A HREF="../utilities/sid.html"><CODE>sid</CODE> tool</A>. Because
of the large size of the generated code (1.3MB), the <CODE>sid</CODE>
output is run through a simple program, <CODE>sidsplit</CODE>, which
splits the output into a number of more manageable modules. It also
transforms the code to use the <A HREF="style.html#language"><CODE>PROTO</CODE>
macros</A> used in the rest of the program.
</P>
<P>
<CODE>sid</CODE> is designed as a parser for grammars which can be
transformed into LL(1) grammars. The distinguishing feature of these
grammars is that the parser can always decide what to do next based
on the current terminal. This is not the case in C++; in some circumstances
a potentially unlimited look-ahead is required to distinguish, for
example, declaration statements from expression statements. In
technical terms, the C++ grammar is not LL(k) for any fixed k. Fortunately there are relatively
few such situations, and <CODE>sid</CODE>
provides a mechanism, <A HREF="../utilities/sid.html#predicate">predicates</A>,
for bypassing the normal parsing mechanism in these cases. Thus it
is possible, although difficult, to express C++ as a <CODE>sid</CODE>
grammar.
</P>
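<P>
The ambiguity between declaration statements and expression statements
can be illustrated with a small sketch. The type <CODE>T</CODE> and the
two functions are hypothetical, introduced only for this example; they
are not part of the producer. Both statements of interest begin with the
tokens <CODE>T ( a )</CODE>, and only what follows the closing parenthesis
tells a parser which kind of statement it is reading. With a more
complicated declarator in place of <CODE>a</CODE>, the deciding token can
be arbitrarily far away.
</P>

```cpp
// Hypothetical type, present only so that both readings of "T ( a )" compile.
struct T {
    int *target ;                               // object updated by operator++
    T () : target ( 0 ) {}
    explicit T ( int &x ) : target ( &x ) {}
    void operator++ ( int ) { if ( target ) ++*target ; }
} ;

// "T ( a ) ;" is a declaration: it declares a default-constructed T
// named a, which hides the outer int a inside the inner block.
int after_declaration ()
{
    int a = 5 ;
    {
        T ( a ) ;
    }
    return a ;                                  // the outer a is untouched
}

// "T ( a ) ++ ;" starts with the same tokens but is an expression
// statement: a temporary T is built from a and then incremented.
int after_expression ()
{
    int a = 5 ;
    T ( a ) ++ ;
    return a ;                                  // incremented through target
}
```

<P>
Under these assumptions <CODE>after_declaration</CODE> returns 5 and
<CODE>after_expression</CODE> returns 6, although the two function bodies
differ only in the tokens after <CODE>T ( a )</CODE>.
</P>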
<P>
The <CODE>sid</CODE> grammar file, <CODE>syntax.sid</CODE>, is closely
based on the ISO C++ grammar. In particular, the same production
names have been used. The grammar has been extended slightly to allow
common syntactic errors to be detected elegantly. Other parsing errors
are handled by <CODE>sid</CODE>'s exception mechanism. At present
there is only limited recovery after such errors.
</P>
<P>
The lexical analysis routines in the C++ producer are hand-crafted,
based on an initial version generated by the simple lexical analyser
generator, <CODE>lexi</CODE>. <CODE>lexi</CODE> has been used more directly
to generate the lexical analysers for some of the other code generation
tools used in the producer, including <CODE>calculus</CODE>.
</P>
<P>
The <CODE>sid</CODE> grammar contains a number of entry points. The
most important is <CODE>parse_file</CODE>, which is used to parse
a complete C++ translation unit. The syntax for the
<A HREF="pragma.html"><CODE>#pragma TenDRA</CODE> directives</A> is
included within the same grammar with two entry points,
<CODE>parse_tendra</CODE> in normal use, and <CODE>parse_preproc</CODE>
for use in preprocessing mode. There are also entry points in the
grammar for each of the kinds of <A HREF="token.html#args">token argument</A>.
The parsing routines for token and template arguments are largely
hand-crafted, based on these primitives.
</P>
<P>
Certain parsing operations are performed before control passes to
the
<CODE>sid</CODE> grammar. As mentioned above, these include the processing
of token and template applications. The other important case concerns
nested name specifiers. For example, in:
<PRE>
class A {
    class B {
        static int c ;
    } ;
} ;

int A::B::c = 0 ;
</PRE>
the qualified identifier <CODE>A::B::c</CODE> is split into two terminals,
a nested name specifier, <CODE>A::B::</CODE>, and an identifier, <CODE>c</CODE>,
which is looked up in the corresponding namespace. Note that it is
at this stage that name look-up occurs. An identifier can be mapped
to one of a number of terminals, including keywords, type names,
namespace names and other identifiers, according to the result of
this look-up. If the look-up gives a macro then this is expanded
at this stage.
</P>
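<P>
The following sketch suggests why the result of this look-up has to be
known before the grammar proper is applied. The names <CODE>t</CODE>,
<CODE>v</CODE>, <CODE>type_reading</CODE> and <CODE>object_reading</CODE>
are hypothetical and serve only as an illustration. In both functions a
nested name specifier <CODE>A::B::</CODE> is followed by an identifier,
<CODE>*</CODE> and <CODE>p</CODE>; whether this is a declaration or a
multiplication depends entirely on whether the identifier is found to be
a type name.
</P>

```cpp
struct A {
    struct B {
        typedef int t ;         // A::B::t is found to be a type name
        static int v ;          // A::B::v is found to be an object
    } ;
} ;

int A::B::v = 6 ;

// The identifier after A::B:: is a type name, so "A::B::t * p" declares
// p as a pointer to int.
int type_reading ()
{
    A::B::t * p = 0 ;
    return ( p == 0 ? 1 : 0 ) ;
}

// The identifier after A::B:: is an object, so "A::B::v * p" is a
// multiplication expression.
int object_reading ()
{
    int p = 7 ;
    return A::B::v * p ;
}
```

<P>
Here <CODE>type_reading</CODE> returns 1 and <CODE>object_reading</CODE>
returns 42; the token sequences differ only in the identifier that follows
the nested name specifier.
</P>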
<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright © 1998.</I></P>
</BODY>
</HTML>