WebSVN – tendra.SVN – Blame – /trunk/doc/utilities/sid.html

Rev	Author	Line No.	Line
2	7u83	1	`<!-- Crown Copyright (c) 1998 -->`
		2	`<HTML>`
		3	`<HEAD>`
		4	`<TITLE>sid users' guide</TITLE>`
		5	`</HEAD>`
		6	`<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">`
		7	`<H1>sid Users' Guide</H1>`
		8	`<H3>January 1998</H3>`
		9	`<IMG SRC="../images/no_next.gif" ALT="next section">`
		10	`<IMG SRC="../images/no_prev.gif" ALT="previous section">`
		11	`<IMG SRC="../images/no_top.gif" ALT="current document">`
		12	`<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">`
		13	`</A>`
		14	`<IMG SRC="../images/no_index.gif" ALT="document index"><P>`
		15	`<HR>`
		16	`<DL>`
		17	`<DT><A HREF="#intro"><B>1</B> - Introduction</A><DD>`
		18	`<DT><A HREF="#grammar"><B>2</B> - Grammars</A><DD>`
		19	`<DL>`
		20	`<DT><A HREF="#parse"><B>2.1</B> - Parsing</A><DD>`
		21	`<DT><A HREF="#context-free"><B>2.2</B> - Context free grammars</A><DD>`
		22	`<DT><A HREF="#sid-grammar"><B>2.3</B> - <CODE>sid</CODE> grammars</A><DD>`
		23	`</DL>`
		24	`<DT><A HREF="#overview"><B>3</B> - Overview</A><DD>`
		25	`<DL>`
		26	`<DT><A HREF="#left-recursion"><B>3.1</B> - Left recursion elimination</A><DD>`
		27	`<DT><A HREF="#factoring"><B>3.2</B> - Factoring</A><DD>`
		28	`<DT><A HREF="#optimise"><B>3.3</B> - Optimisations</A><DD>`
		29	`</DL>`
		30	`<DT><A HREF="#invoke"><B>4</B> - Invocation</A><DD>`
		31	`<DT><A HREF="#grammar-file"><B>5</B> - The <CODE>sid</CODE> grammar`
		32	`file</A><DD>`
		33	`<DL>`
		34	`<DT><A HREF="#grammar-lex"><B>5.1</B> - Lexical conventions</A><DD>`
		35	`<DT><A HREF="#type-decl"><B>5.2</B> - The type declaration section</A><DD>`
		36	`<DT><A HREF="#term-decl"><B>5.3</B> - The terminal declaration section</A><DD>`
		37	`<DT><A HREF="#rule-defn"><B>5.4</B> - The rule definition section</A><DD>`
		38	`<DT><A HREF="#entry-point"><B>5.5</B> - The grammar entry points section</A><DD>`
		39	`</DL>`
		40	`<DT><A HREF="#info-file"><B>6</B> - The C information file</A><DD>`
		41	`<DL>`
		42	`<DT><A HREF="#info-lex"><B>6.1</B> - Lexical conventions</A><DD>`
		43	`<DT><A HREF="#prefixes"><B>6.2</B> - The prefixes section</A><DD>`
		44	`<DT><A HREF="#maps"><B>6.3</B> - The maps section</A><DD>`
		45	`<DT><A HREF="#header"><B>6.4</B> - The header section</A><DD>`
		46	`<DT><A HREF="#assign"><B>6.5</B> - The assignments section</A><DD>`
		47	`<DT><A HREF="#param-assign"><B>6.6</B> - The parameter assignments`
		48	`section</A><DD>`
		49	`<DT><A HREF="#result-assign"><B>6.7</B> - The result assignments section</A><DD>`
		50	`<DT><A HREF="#term-result"><B>6.8</B> - The terminal result extraction`
		51	`section</A><DD>`
		52	`<DT><A HREF="#action-defn"><B>6.9</B> - The action definition section</A><DD>`
		53	`<DT><A HREF="#trailer"><B>6.10</B> - The trailer section</A><DD>`
		54	`</DL>`
		55	`<DT><A HREF="#predicate"><B>7</B> - Predicates</A><DD>`
		56	`<DT><A HREF="#exception"><B>8</B> - Error handling</A><DD>`
		57	`<DT><A HREF="#call-by-ref"><B>9</B> - Call by reference</A><DD>`
		58	`<DT><A HREF="#call-entry"><B>10</B> - Calling entry points</A><DD>`
		59	`<DT><A HREF="#glossary"><B>11</B> - Glossary</A><DD>`
		60	`<DT><A HREF="#errors"><B>12</B> - Understanding error messages</A><DD>`
		61	`<DL>`
		62	`<DT><A HREF="#left-errors"><B>12.1</B> - Left recursion elimination`
		63	`errors</A><DD>`
		64	`<DT><A HREF="#first-errors"><B>12.2</B> - First set computation errors</A><DD>`
		65	`<DT><A HREF="#factor-errors"><B>12.3</B> - Factoring errors</A><DD>`
		66	`<DT><A HREF="#check-errors"><B>12.4</B> - Checking errors</A><DD>`
		67	`</DL>`
		68	`</DL>`
		69
		70	`<HR>`
		71	`<H2><A NAME="intro">1. Introduction</A></H2>`
		72	`<P>`
		73	`This document describes how to use the <CODE>sid</CODE> parser generator.`
		74	`It was written for <CODE>sid</CODE> version 1.9. The main features`
		75	`of each version of <CODE>sid</CODE> are listed below:`
		76	`<UL>`
		77
		78	`<LI><CODE>sid</CODE> 1.0 - this was the first version of <CODE>sid</CODE>`
		79	`to support attribute grammars. The output language was C.`
		80	`<LI><CODE>sid</CODE> 1.1 - this was a bug fix version of <CODE>sid</CODE>`
		81	`1.0.`
		82	`<LI><CODE>sid</CODE> 1.2 - this version of <CODE>sid</CODE> added`
		83	`predicates, exception handling, improved inlining options, and a`
		84	`grammar test pseudo-language.`
		85	`<LI><CODE>sid</CODE> 1.3 - this version of <CODE>sid</CODE> added`
		86	`anonymous rules, and a better syntax for the C specific information.`
		87
		88	`<LI><CODE>sid</CODE> 1.4 - this was a bug fix version of <CODE>sid</CODE>`
		89	`1.3.`
		90	`<LI><CODE>sid</CODE> 1.5 - this was a bug fix version of <CODE>sid</CODE>`
		91	`1.4. The command line option syntax changed in this release as well.`
		92
		93	`<LI><CODE>sid</CODE> 1.6 - this version of <CODE>sid</CODE> changed`
		94	`the input syntax, added scoped rules, and removed basics (replacing`
		95	`them with terminal symbols). It also added a stricter ISO C language.`
		96
		97	`<LI><CODE>sid</CODE> 1.7 - The syntax of the actions file changed`
		98	`slightly in this version. The error messages and recovery from`
		99	`syntax errors were also improved in this version. This version`
		100	`also added explicit call by reference support (rather than the`
		101	`inconsistent function call semantics of earlier versions). The`
		102	`command line options changed in this version, to support language`
		103	`specific options, and the <CODE>strict-ansi-c</CODE> language was`
		104	`dropped. Non-local variables were also added.`
		105	`<LI><CODE>sid</CODE> 1.8 - Initialisers were added for non-local`
		106	`variables. Assignment was added.`
		107	`<LI><CODE>sid</CODE> 1.9 - this was a bug fix version of <CODE>sid</CODE>`
		108	`1.8.`
		109	`</UL>`
		110	`</P>`
		111	`<P>`
		112	`<CODE>sid</CODE> turns specifications of languages into programs that`
		113	`recognise those languages. One of the aims of <CODE>sid</CODE> was`
		114	`to separate the specification of the language to be recognised from`
		115	`the language that the recogniser program is written in. For this`
		116	`reason, input to <CODE>sid</CODE> is split into two components: output`
		117	`language independent information, and output language dependent information.`
		118	`</P>`
		119	`<P>`
		120	`At present, <CODE>sid</CODE> will only output programs in C (either`
		121	`ISO or pre-ISO), but it is designed so that adding new output languages`
		122	`should be fairly simple. There is one other pseudo-language: the`
		123	`test language. This is used for testing grammars and the transforms,`
		124	`but will not output a parser.`
		125	`</P>`
		126
		127	`<HR>`
		128	`<H2><A NAME="grammar">2. Grammars</A></H2>`
		129
		130	`<H3><A NAME="parse">2.1. Parsing</A></H3>`
		131	`<P>`
		132	`There are two phases in traditional language recognition that are`
		133	`relevant to <CODE>sid</CODE>: the first is lexical analysis (breaking`
		134	`the input up into terminal symbols); the second is syntax analysis`
		135	`or parsing (ensuring that the terminal symbols in the input are in`
		136	`the correct order).`
		137	`</P>`
		138	`<P>`
		139	`<CODE>sid</CODE> currently does very little to help with lexical analysis.`
		140	`It is possible to use <CODE>sid</CODE> to produce the lexical analyser,`
		141	`but <CODE>sid</CODE> provides no real support for it at present.`
		142	`For now, the programmer is expected to write the lexical analyser,`
		143	`or use another tool to produce it.`
		144	`</P>`
		145	`<P>`
		146	`The lexical analyser should break the input up into a series of terminals.`
		147	`Each of these terminals is allocated a number. In`
		148	`<CODE>sid</CODE>, these numbers range from zero to the maximum terminal`
		149	`number (specified in the grammar description). The terminals may`
		150	`also have data associated with them (e.g. the value of a number),`
		151	`known as the attributes of the terminal, or the results of the terminal.`
		152	`</P>`
		153	`<P>`
		154	`<CODE>sid</CODE> generates the parser. The parser is a program that`
		155	`reads the sequence of terminals from the lexical analyser, and ensures`
		156	`that they form a valid input in the language being recognised. If`
		157	`the input is not valid, then the parser will fail (<CODE>sid</CODE>`
		158	`provides mechanisms to allow the parser to recover from errors).`
		159	`</P>`
		160
		161	`<H3><A NAME="context-free">2.2. Context free grammars</A></H3>`
		162	`<P>`
		163	`This section provides a brief introduction to grammars. It is not`
		164	`intended to be an in-depth guide to grammars, more an introduction`
		165	`which defines some terminology.`
		166	`</P>`
		167	`<P>`
		168	`<CODE>sid</CODE> works with a subset of what are known as context`
		169	`free grammars. A context free grammar consists of a set of input`
		170	`symbols (known as terminals), a set of rules (descriptions of what`
		171	`are legal forms in the language, also known as non-terminals), and`
		172	`an entry point (the rule from which the grammar starts).`
		173	`</P>`
		174	`<P>`
		175	`A rule is defined as a series of alternatives (throughout this document`
		176	`the definition of a rule is called a production - this may conflict`
		177	`with some other uses of the term). Each alternative consists of a`
		178	`sequence of items. An item can be a terminal or a rule. As an example`
		179	`(using the <CODE>sid</CODE> notation, which now looks unlike the conventional`
		180	`syntax for grammars), a comma separated list of integers could be`
		181	`specified as:`
		182	`<PRE>`
		183	`list-of-integers = {`
		184	`integer ;`
		185	`\|\|`
		186	`integer ;`
		187	`comma ;`
		188	`list-of-integers ;`
		189	`} ;`
		190	`</PRE>`
		191	`This production contains two alternatives: the first matches the terminal`
		192	`<CODE>integer</CODE>; the second matches the sequence of terminals`
		193	`<CODE>integer</CODE> followed by <CODE>comma</CODE>, followed by another`
		194	`list of integers.`
		195	`</P>`
		196	`<P>`
		197	`There is much documentation available on context free grammars (and`
		198	`other types of formal grammar). The reader is advised to find an`
		199	`alternative source for more information.`
		200	`</P>`
		201
		202	`<H3><A NAME="sid-grammar">2.3. <CODE>sid</CODE> grammars</A></H3>`
		203	`<P>`
		204	`<CODE>sid</CODE> grammars are based upon a subset of context free`
		205	`grammars, known as LL(1) grammars. The main property of such grammars`
		206	`is that the parser can always tell which alternative it is going to`
		207	`parse next by looking at the current terminal symbol. <CODE>sid</CODE>`
		208	`does a number of transforms to turn grammars that are not in this`
		209	`form into it (although it cannot turn all possible grammars into this`
		210	`form). It also provides facilities that allow the user to alter the`
		211	`control flow of the parser.`
		212	`</P>`
		213	`<P>`
		214	`<CODE>sid</CODE> makes the following changes to the context free grammars`
		215	`described above:`
		216	`<OL>`
		217	`<LI>There may be more than one entry point to the grammar.`
		218	`<LI>As well as being a rule or a terminal, an item may be an action,`
		219	`a predicate or an identity. An action is just a user supplied function.`
		220	`A predicate is a cross between a basic and an action (it is a user`
		221	`defined function but it may alter the control flow). An identity`
		222	`is like an assignment in a conventional programming language.`
		223	`<LI>Rules may take parameters and return results (as may actions;`
		224	`terminals may only return results). These may be passed between items`
		225	`using names.`
		226	`<LI>Each rule can have an exception handler associated with it. Exception`
		227	`handlers are used for error recovery when the input being parsed does`
		228	`not match the grammar.`
		229	`<LI>Rules may be anonymous.`
		230	`<LI>Rules may have non-local names associated with them. These names`
		231	`are in scope for that rule and any rules that are defined inside it.`
		232	`The value of each non-local name is saved on entry to the rule, and`
		233	`restored when the rule is exited.`
		234	`</OL>`
		235	`</P>`
		236
		237	`<HR>`
		238	`<H2><A NAME="overview">3. Overview</A></H2>`
		239	`<P>`
		240	`<CODE>sid</CODE> takes the input grammar and applies several transformations`
		241	`to it, before it produces the parser. These transformations allow`
		242	`the language descriptions to be written in a slightly more natural`
		243	`form than would otherwise be necessary.`
		244	`</P>`
		245
		246	`<H3><A NAME="left-recursion">3.1. Left recursion elimination</A></H3>`
		247	`<P>`
		248	`The first transformation is to eliminate any left recursive productions,`
		249	`replacing them with right recursive ones. This transforms a set of`
		250	`rules of the form:`
		251	`<PRE>`
		252	`Ai = Aj Bji, Ci`
		253	`</PRE>`
		254	`into:`
		255	`<PRE>`
		256	`Ai = Cj Xji`
		257	`Xji = Bjk Xki, Yji`
		258	`</PRE>`
		259	`where <CODE>Yji</CODE> is the identity function if <CODE>i</CODE>`
		260	`equals`
		261	`<CODE>j</CODE>, and an error otherwise. In order for this transformation`
		262	`to work, the following conditions must hold:`
		263	`<OL>`
		264
		265	`<LI>The parameter type for all <CODE>Ai</CODE> must be the same.`
		266	`<LI>The result type for all <CODE>Ai</CODE> must be the same.`
		267	`<LI>The exception handlers for all <CODE>Ai</CODE> must be the same.`
		268
		269	`<LI>The parameters to each left recursive call to some <CODE>Aj</CODE>`
		270	`must be exactly the formal parameters to the calling rule <CODE>Ai</CODE>.`
		271
		272	`<LI>Any non-local name referenced by any rule must be in scope for`
		273	`all rules.`
		274	`<LI>A rule may not define non-local variables unless it is the only`
		275	`entry point into the cycle (i.e. there is only one named rule in the`
		276	`cycle).`
		277	`</OL>`
		278	`<CODE>sid</CODE> will report an error if these conditions are not`
		279	`met.`
		280	`</P>`
		281
		282	`<H3><A NAME="factoring">3.2. Factoring</A></H3>`
		283	`<P>`
		284	`The second major transformation is to factor the grammar. It is possible`
		285	`that this phase could go on for ever, so there is a limit to the number`
		286	`of rules that can be generated. This limit may be changed (see the`
		287	`<A HREF="#invoke">invocation section</A>). In practice it is rare`
		288	`for this to happen.`
		289	`</P>`
		290	`<P>`
		291	`The factoring phase tries to increase the number of language specifications`
		292	`that <CODE>sid</CODE> can produce a parser for, by factoring out common`
		293	`initial items in alternatives, e.g.:`
		294	`<PRE>`
		295	`X = {`
		296	`a ; b ;`
		297	`\|\| a ; c ;`
		298	`} ;`
		299	`</PRE>`
		300	`would be transformed into something like:`
		301	`<PRE>`
		302	`X = {`
		303	`a ; X1 ;`
		304	`} ;`
		305
		306	`X1 = {`
		307	`b ;`
		308	`\|\| c ;`
		309	`} ;`
		310	`</PRE>`
		311	`It will also expand calls to rules at the beginning of alternatives`
		312	`if this is necessary to allow the parser to be produced (this is the`
		313	`phase that the limit is needed for). The expansions are done in the`
		314	`following cases:`
		315	`<OL>`
		316
		317	`<LI>If the rule is see-through (i.e. there is an expansion of the`
		318	`rule that does not contain any terminals or predicates) and its first`
		319	`set contains terminals in the first set of the rest of the alternative.`
		320
		321	`<LI>If the rule has a predicate in its first set.`
		322	`<LI>If the rule has terminals in its first set that are also in the`
		323	`first set of another alternative that does not begin with the same`
		324	`rule.`
		325	`</OL>`
		326	`If the rule being expanded contains an exception handler, and it is`
		327	`not identical to the exception handler in the enclosing rule, then`
		328	`an error will occur. Similarly if the rule being expanded defines`
		329	`any non-local variables then an error will occur.`
		330	`</P>`
		331
		332	`<H3><A NAME="optimise">3.3. Optimisations</A></H3>`
		333	`<P>`
		334	`After these two transformations, <CODE>sid</CODE> performs some optimisations`
		335	`on the grammar, such as reducing multiple copies of identical rules,`
		336	`removing tail recursion, inlining all basic rules, inlining single`
		337	`alternative rules, inling rules used only once, and inlining everything`
		338	`that can be inlined. All of the inlining is optional.`
		339	`</P>`
		340	`<P>`
		341	`After these optimisations, <CODE>sid</CODE> checks the language description`
		342	`to ensure that it is possible to produce a parser for it, and if so`
		343	`it outputs the parser.`
		344	`</P>`
		345
		346	`<HR>`
		347	`<H2><A NAME="invoke">4. Invocation</A></H2>`
		348	`<P>`
		349	`<CODE>sid</CODE> should be invoked in the following manner:`
		350	`<PRE>`
		351	`sid <I>options input-files output-files</I>`
		352	`</PRE>`
		353	`The options are described later. The input files should be a number`
		354	`of input file names dependent upon the output language. The first`
		355	`input file is the <CODE>sid</CODE> grammar file. In the case of either`
		356	`C dialects there should be one other input file that provides C specific`
		357	`information to <CODE>sid</CODE>. The number of output files is also`
		358	`language specific. At present, two output files should be specified`
		359	`for the C languages. The first should be a <CODE>.c</CODE> file into`
		360	`which the parser is written; the second should be a <CODE>.h</CODE>`
		361	`file into which the terminal definitions and external function declarations`
		362	`are written.`
		363	`</P>`
		364	`<P>`
		365	`The options list should consist of zero or more of the following options.`
		366	`There are short forms for each of these options as well; see the`
		367	`<CODE>sid</CODE> manual page for more information on invocation.`
		368	`<DL>`
		369
		370	`<DT><B>--dump-file <I>FILE</I></B>`
		371	`<DD>`
		372	`This option causes intermediate dumps of the grammar to be written`
		373	`to the named file. The format of the dump files is similar to the`
		374	`format of the grammar specification, with the following exceptions:`
		375	`<OL>`
		376
		377	`<LI>Predicates are written with the predicate result replaced by the`
		378	`predicate identifier (this will always be zero), and the result is`
		379	`followed by a <CODE>?</CODE> to indicate that it was a predicate.`
		380	`As an example, the predicate:`
		381	`<PRE>`
		382	`( b, ? ) = <pred> ( a )`
		383	`</PRE>`
		384	`would be printed out as:`
		385	`<PRE>`
		386	`( b : Type1T, 0 : Type2T ) ? = <pred> ( a : Type3T )`
		387	`</PRE>`
		388
		389	`<LI>Items that are considered to be inlinable are prefixed by a`
		390	`<CODE>+</CODE>. Items that are tail calls which will be eliminated`
		391	`are prefixed by a <CODE>*</CODE>.`
		392	`<LI>Nested rules are written at the outer level, with names of the`
		393	`form <CODE>outer-rule::....::inner-rule</CODE>.`
		394	`<LI>Types are provided on call parameter and result tuples.`
		395	`<LI>Inline rules are given a generated name, and are written out as`
		396	`a call to the generated rule (and a definition elsewhere).`
		397	`</OL>`
		398	`<P>`
		399
		400	`<DT><B>--factor-limit <I>LIMIT</I></B>`
		401	`<DD>`
		402	`This option limits the number of rules that can be created during`
		403	`the factorisation process. It is probably best not to change this.`
		404	`<P>`
		405
		406	`<DT><B>--help</B>`
		407	`<DD>`
		408	`This option writes a summary of the command line options to the standard`
		409	`error stream.`
		410	`<P>`
		411
		412	`<DT><B>--inline <I>INLINES</I></B>`
		413	`<DD>`
		414	`This option controls what inlining will be done in the output parser.`
		415	`The inlines argument should be a comma separated list of the following`
		416	`words:`
		417	`<P>`
		418	`<DL>`
		419
		420	`<DT><CODE>SINGLES</CODE>`
		421	`<DD>`
		422	`This causes single alternative rules to be inlined. This inlining`
		423	`is no longer performed as a modification to the grammar (it was in`
		424	`version 1.0).`
		425	`<DT><CODE>BASICS</CODE>`
		426	`<DD>`
		427	`This causes rules that contain only basics (and no exception handlers`
		428	`or empty alternatives) to be inlined. The restriction on exception`
		429	`handlers and empty alternatives is rather arbitrary, and may be changed`
		430	`later.`
		431	`<DT><CODE>TAIL</CODE>`
		432	`<DD>`
		433	`This causes tail recursive calls to be inlined. Without this, tail`
		434	`recursion elimination will not be performed.`
		435	`<DT><CODE>OTHER</CODE>`
		436	`<DD>`
		437	`This causes other calls to be inlined wherever possible. Unless the`
		438	`<CODE>MULTI</CODE> inlining is also specified, this will be done only`
		439	`for productions that are called once.`
		440	`<DT><CODE>MULTI</CODE>`
		441	`<DD>`
		442	`This causes calls to be inlined, even if the rule being called is`
		443	`called more than once. Turning this inlining on implies <CODE>OTHER</CODE>.`
		444	`Similarly turning off <CODE>OTHER</CODE> inlining will turn off`
		445	`<CODE>MULTI</CODE> inlining. For grammars of any size, this is probably`
		446	`best avoided; if used the generated parser may be huge (e.g. a C grammar`
		447	`has produced a file that was several hundred MB in size).`
		448	`<DT><CODE>ALL</CODE>`
		449	`<DD>`
		450	`This turns on all inlining.`
		451	`</DL>`
		452	`</P>`
		453	`<P>`
		454	`In addition, prefixing a word with <CODE>NO</CODE> turns off that`
		455	`inlining phase. The words may be given in any case. They are evaluated`
		456	`in the order given, so:`
		457	`<PRE>`
		458	`--inline noall,singles`
		459	`</PRE>`
		460	`would turn on single alternative rule inlining only, whilst:`
		461	`<PRE>`
		462	`--inline singles,noall`
		463	`</PRE>`
		464	`would turn off all inlining. The default is as if <CODE>sid</CODE>`
		465	`were invoked with the option:`
		466	`<PRE>`
		467	`--inline noall,basics,tail`
		468	`</PRE>`
		469	`</P>`
		470
		471	`<DT><A NAME="lang"><B>--language <I>LANGUAGE</I></B></A>`
		472	`<DD>`
		473	`This option specifies the output language. Currently this should`
		474	`be one of <CODE>ansi-c</CODE>, <CODE>pre-ansi-c</CODE> and <CODE>test</CODE>.`
		475	`The default is <CODE>ansi-c</CODE>.`
		476	`<P>`
		477	`The <CODE>ansi-c</CODE> and <CODE>pre-ansi-c</CODE> languages are`
		478	`basically the same. The only difference is that <CODE>ansi-c</CODE>`
		479	`initially uses function prototypes, and <CODE>pre-ansi-c</CODE> doesn't.`
		480	`The C language specific options are:`
		481	`</P>`
		482	`<P>`
		483	`<DL>`
		484
		485	`<DT><CODE>prototypes, proto, no-prototypes, no-proto</CODE>`
		486	`<DD>`
		487	`These enable or disable the use of function prototypes. By default`
		488	`this is enabled for <CODE>ansi-c</CODE> and disabled for <CODE>pre-ansi-c</CODE>.`
		489	`<P>`
		490
		491	`<DT><CODE>numeric-ids, numeric, no-numeric-ids, no-numeric</CODE>`
		492	`<DD>`
		493	`These enable or disable the use of numeric identifiers. Numeric identifiers`
		494	`replace the identifier name with a number, which is mainly of use`
		495	`in stopping identifier names getting too long. The disadvantage is`
		496	`that the code becomes less readable, and more difficult to debug.`
		497	`Numeric identifiers are not used by default.`
		498	`<P>`
		499
		500	`<DT><A NAME="casts"><CODE>casts, cast, no-casts, no-cast</CODE></A>`
		501	`<DD>`
		502	`These enable or disable casting of action and assignment operator`
		503	`immutable parameters. If enabled, a parameter is cast to its own`
		504	`type when it is substituted into the action. This will cause some`
		505	`compilers to complain about attempts to modify the parameter (which`
		506	`can help pick out attempts at mutating parameters that should not`
		507	`be mutated). The disadvantage is that not all compilers will reject`
		508	`attempts at mutation, and that ISO C doesn't allow casting to structure`
		509	`and union types, which means that some code may be illegal. Parameter`
		510	`casting is disabled by default.`
		511	`<P>`
		512
		513	`<DT><CODE>unreachable-macros, unreachable-macro, unreachable-comments,`
		514	`unreachable-comment</CODE>`
		515	`<DD>`
		516	`These choose whether unreachable code is marked by a macro or a comment.`
		517	`The default is to mark unreachable code with a comment`
		518	`<CODE>/UNREACHED/</CODE>, however a macro <CODE>UNREACHED ;</CODE>`
		519	`may be used instead, if desired.`
		520	`</DL>`
		521	`</P>`
		522	`<P>`
		523	`The <CODE>test</CODE> language only takes one input file, and produces`
		524	`no output file. It may be used to check that a grammar is valid.`
		525	`In conjunction with the dump file, it may be used to check the transformations`
		526	`that would be applied to the grammar. There are no language specific`
		527	`options for the <CODE>test</CODE> language.`
		528	`</P>`
		529
		530	`<DT><B>--show-errors</B>`
		531	`<DD>`
		532	`This option writes a copy of the current error messages to the standard`
		533	`output. See the manual entry for more details about changing the`
		534	`error message content.`
		535	`<P>`
		536
		537	`<DT><B>--switch <I>OPTION</I></B>`
		538	`<DD>`
		539	`This passes through <I>OPTION</I> as a language specific option.`
		540	`The valid options are described <A HREF="#lang">above</A>.`
		541	`<P>`
		542
		543	`<DT><B>--tab-width <I>NUMBER</I></B>`
		544	`<DD>`
		545	`This option specifies the number of spaces that a tab occupies. It`
		546	`defaults to 8. It is only used when indenting output.`
		547	`<P>`
		548
		549	`<DT><B>--version</B>`
		550	`<DD>`
		551	`This option causes the version number and supported languages to be`
		552	`written to the standard error stream.`
		553	`</DL>`
		554
		555	`<HR>`
		556	`<H2><A NAME="grammar-file">5. The <CODE>sid</CODE> grammar file</A></H2>`
		557	`<P>`
		558	`The <CODE>sid</CODE> grammar file should always be the first input`
		559	`file specified on the command line. It provides an output language`
		560	`independent description of the language to be recognised. The file`
		561	`is split up into a number of different sections, each of which begins`
		562	`with a section header. All sections must be included, although it`
		563	`is possible to leave most of them empty. The following sections exist`
		564	`at present: type declaration, terminal declaration, rule definition,`
		565	`and grammar entry points. They must appear in that order. The sections`
		566	`are detailed below, after the lexical conventions.`
		567	`</P>`
		568
		569	`<H3><A NAME="grammar-lex">5.1. Lexical conventions</A></H3>`
		570	`<P>`
		571	`Almost all non-printable characters (but definitely space, tab, newline`
		572	`and form-feed) are considered to be white space, and are ignored other`
		573	`than to separate other tokens. In addition, two comment forms are`
		574	`recognised: the C comment syntax, where comments begin with`
		575	`<CODE>/</CODE>, and end with <CODE>/</CODE>, although unlike C these`
		576	`comments nest properly; and the C++ line comment syntax, where comments`
		577	`begin with <CODE>//</CODE>, and are terminated at the end of the line.`
		578	`Comments are treated in the same way as white space characters.`
		579	`</P>`
		580	`<P>`
		581	`Section headers are enclosed in percent characters, and are case insensitive.`
		582	`The following section headers are currently recognised:`
		583	`<PRE>`
		584	`%types%`
		585	`%terminals%`
		586	`%productions%`
		587	`%entry%`
		588	`</PRE>`
		589	`</P>`
		590	`<P>`
		591	`Identifiers must begin with a letter, an underscore or a hyphen.`
		592	`This may be followed by zero or more letters, digits, underscores`
		593	`and hyphens. Identifiers are case sensitive. The following are all`
		594	`legal identifiers:`
		595	`<PRE>`
		596	`expression mult-expr plus_expr expr-1`
		597	`</PRE>`
		598	`Identifiers are split into two namespaces: local names have one space;`
		599	`types, actions, rules, non-local names and terminals share the other`
		600	`namespace, so it is not possible to have an identifier that is a type`
		601	`as well as being a rule for example.`
		602	`</P>`
		603	`<P>`
		604	`The following symbols are also used:`
		605	`<PRE>`
		606	`& ; = [ : ] ! ,`
		607	`\|\| $ ? { } ( ) <`
		608	`> -> :: ##`
		609	`</PRE>`
		610	`All other characters are illegal.`
		611	`</P>`
		612
		613	`<H3><A NAME="type-decl">5.2. The type declaration section</A></H3>`
		614	`<P>`
		615	`The first section is the type declaration section. This begins with`
		616	`the types section header, followed by the names of all of the types`
		617	`used by the grammar. Each type name should be terminated by a semicolon.`
		618	`An example of the type declaration section follows:`
		619	`<PRE>`
		620	`%types%`
		621	`NumberT ;`
		622	`StringT ;`
		623	`</PRE>`
		624	`This declares two types: <CODE>NumberT</CODE> and <CODE>StringT</CODE>.`
		625	`There is no requirement for the type names to resemble names in the`
		626	`target language (in fact this should be avoided, as it is possible`
		627	`to use many different target languages). All types used in the grammar`
		628	`must be declared here. Similarly, all types declared here must be`
		629	`used in the grammar.`
		630	`</P>`
		631
		632	`<H3><A NAME="term-decl">5.3. The terminal declaration section</A></H3>`
		633	`<P>`
		634	`After the type declaration section comes the terminal declaration`
		635	`section. This section declares the terminals that will be used by`
		636	`the grammar. A terminal is a recogniser for a symbol in the input`
		637	`alphabet of the language to be recognised. It is possible to declare`
		638	`terminals that are not used by the grammar.`
		639	`</P>`
		640	`<P>`
		641	`The section begins with its section header, followed by the declarations`
		642	`of the terminals. Each terminal declaration begins with the name`
		643	`of the terminal being defined, followed by its type, and terminated`
		644	`by a semicolon. If the terminal is not used in the grammar, the declaration`
		645	`should be preceded by a <CODE>!</CODE> symbol.`
		646	`</P>`
		647	`<P>`
		648	`A type (for terminals, rules and actions) is written as a colon, followed`
		649	`by the parameter tuple, followed by the <CODE>-></CODE>`
		650	`symbol, followed by the result tuple. If the type is zero-tuple to`
		651	`zero-tuple, then the type may be omitted. A tuple consists of a comma`
		652	`separated sequence of name-type pairs (with the name and type being`
		653	`separated by a colon), surrounded by parentheses. For parameter tuples,`
		654	`the type may be suffixed by a <CODE>&</CODE> symbol, indicating`
		655	`that call by reference should be used (the default is call by copy).`
		656	`For declarations, the names should be omitted. For terminals, the`
		657	`parameter type must be the zero-tuple.`
		658	`</P>`
		659	`<P>`
		660	`The simplest type of terminal declaration is as follows:`
		661	`<PRE>`
		662	`terminal1 ;`
		663	`</PRE>`
		664	`This means the same as:`
		665	`<PRE>`
		666	`terminal1 : () -> () ;`
		667	`</PRE>`
		668	`An example of a more complex terminal declaration is:`
		669	`<PRE>`
		670	`terminal2 : () -> ( :StringT ) ;`
		671	`</PRE>`
		672	`If these terminals were not to be used in the grammar, they should`
		673	`be declared as:`
		674	`<PRE>`
		675	`!terminal1 ;`
		676	`!terminal2 : () -> ( :StringT ) ;`
		677	`</PRE>`
		678	`</P>`
		679
		680	`<H3><A NAME="rule-defn">5.4. The rule definition section</A></H3>`
		681	`<P>`
		682	`The rule definition section follows the terminal declaration section.`
		683	`It begins with the section header, followed by the definitions and`
		684	`declarations of all of the rules used in the grammar, and the declarations`
		685	`of all of the actions used in the grammar.`
		686	`</P>`
		687	`<P>`
		688	`Rule declarations look identical to terminal declarations, e.g.:`
		689	`<PRE>`
		690	`rule1 : ( :NumberT ) -> ( :NumberListT ) ;`
		691	`rule2 ;`
		692	`</PRE>`
		693	`Action declarations are similar, although the names are surrounded`
		694	`by angle brackets, e.g.:`
		695	`<PRE>`
		696	`<action1> : ( :StringT & ) -> () ;`
		697	`<action2> ;`
		698	`</PRE>`
		699	`A declaration (or a definition) may be prefixed with the <CODE>::</CODE>`
		700	`symbol. This symbol forces the definition into the outermost scope.`
		701	`Scopes are described later on.`
		702	`</P>`
		703	`<P>`
		704	`A rule definition (called a production) looks something like the following:`
		705	`<PRE>`
		706	`add-expr : () -> ( v : NumberT ) = {`
		707	`v1 = mul-expr ;`
		708	`plus ;`
		709	`v2 = expr ;`
		710	`v = <add> ( v1, v2 ) ;`
		711	`\|\|`
		712	`v1 = mul-expr ;`
		713	`minus ;`
		714	`v2 = expr ;`
		715	`v = <subtract> ( v1, v2 ) ;`
		716	`##`
		717	`v = <ex> ;`
		718	`} ;`
		719	`</PRE>`
		720	`The production begins with the rule name, followed by the parameter`
		721	`and result names and types (in this case, the rule name is`
		722	`<CODE>add-expr</CODE>, there are no parameters, and there is one result`
		723	`name <CODE>v</CODE> of type <CODE>NumberT</CODE>). This may optionally`
		724	`be followed by local declarations (there are none here - they are`
		725	`described later).`
		726	`</P>`
		727	`<P>`
		728	`The left hand side of the rule is followed by the <CODE>=</CODE> symbol,`
		729	`a list of alternatives surrounded by curly braces, and is terminated`
		730	`by a semicolon. The alternatives are separated by the <CODE>\|\|</CODE>`
		731	`symbol, and the last alternative may be separated from its predecessor`
		732	`(there must be one) using the <CODE>##</CODE> symbol; if this is the`
		733	`case, then this alternative is the exception handler for the production`
		734	`(otherwise it has no exception handler).`
		735	`</P>`
		736	`<P>`
		737	`An alternative may match the empty string, using the symbol <CODE>$</CODE>`
		738	`and the terminator symbol <CODE>;</CODE>, i.e.:`
		739	`<PRE>`
		740	`$ ;`
		741	`</PRE>`
		742	`unless it is an exception handler alternative (in which case it must`
		743	`do something), or a sequence of items. The empty string is only valid`
		744	`if the production has no results. If you want to match an empty string`
		745	`in a production that has a result, it is necessary to use an action`
		746	`(or identity) to provide a result.`
		747	`</P>`
		748	`<P>`
		749	`An item is an identity, or a call to a (possibly anonymous) rule,`
		750	`a terminal, an action or a predicate. An identity looks like an assignment`
		751	`in conventional programming languages:`
		752	`<PRE>`
		753	`( a, b, c ) = ( d, e, f ) ;`
		754	`</PRE>`
		755	`Each tuple must contain the same number of names. In the case of`
		756	`a one-tuple, the parentheses may be dropped, e.g.:`
		757	`<PRE>`
		758	`a = d ;`
		759	`</PRE>`
		760	`Note that this is a binding operation, not an assignment. Each name`
		761	`on the left hand side must be a new name. It is illegal to redeclare`
		762	`a name that is already in scope. It is possible to assign to a name`
		763	`which is already in scope by prefixing that name with the <CODE>&</CODE>`
		764	`symbol, e.g.:`
		765	`<PRE>`
		766	`( a, &b, c ) = ( d, e, f ) ;`
		767	`</PRE>`
		768	`would assign to the name <CODE>b</CODE>, which must have been previously`
		769	`defined (it may be a parameter; if it is a call by reference parameter,`
		770	`then the change will propagate outside to the calling rule).`
		771	`</P>`
		772	`<P>`
		773	`It is also possible to use the <CODE>!</CODE> symbol in the result`
		774	`tuple to ignore results, e.g.:`
		775	`<PRE>`
		776	`( a, !, b, ! ) = ( c, d, e, f ) ;`
		777	`</PRE>`
		778	`This is not particularly useful in an identity, but may be more useful`
		779	`in a call to a rule, terminal or action. A call to a terminal or`
		780	`rule looks like a call to a function in a conventional programming`
		781	`language, e.g.:`
		782	`<PRE>`
		783	`( a, b ) = rule1 ( c, d ) ;`
		784	`( e, f ) = terminal1 () ;`
		785	`</PRE>`
		786	`Calls to actions have the same form, except that action names are`
		787	`surrounded by angle brackets, e.g.:`
		788	`<PRE>`
		789	`( g, h, i ) = <action1> ( a, e ) ;`
		790	`</PRE>`
		791	`In addition, one of the names in the result tuple of the call to the`
		792	`action may be the predicate result symbol <CODE>?</CODE>, in which`
		793	`case the action is used as a predicate (more details on predicates`
		794	`are given later).`
		795	`</P>`
		796	`<P>`
		797	`When calling a rule, terminal or action, it is necessary to have declared`
		798	`it (or in the case of a rule declared or defined it) before the call.`
		799	`</P>`
		800	`<P>`
		801	`If a rule or action is being invoked, and it takes one or more call`
		802	`by reference parameters, then the corresponding arguments should be`
		803	`prefixed by the <CODE>&</CODE> symbol, e.g.:`
		804	`<PRE>`
		805	`length = <string-length> ( &string ) ;`
		806	`</PRE>`
		807	`If the rule, terminal or action has the zero-tuple as a result, then`
		808	`only the right hand side of the definition is required, e.g.:`
		809	`<PRE>`
		810	`rule2 ( a, b ) ;`
		811	`terminal2 () ;`
		812	`<action2> ( c, d ) ;`
		813	`</PRE>`
		814	`If the rule, terminal or action has the zero-tuple as a parameter,`
		815	`then the parameter tuple may be omitted, e.g.:`
		816	`<PRE>`
		817	`( a, b ) = rule3 ;`
		818	`terminal3 ;`
		819	`c = <action3> ;`
		820	`</PRE>`
		821	`In older versions of <CODE>sid</CODE>, it used to be possible to have`
		822	`ambiguous items, e.g.:`
		823	`<PRE>`
		824	`a = b ;`
		825	`</PRE>`
		826	`where <CODE>b</CODE> was both a rule and a name. As local names may`
		827	`not shadow non-local and global names, then this is no longer a problem.`
		828	`</P>`
		829	`<P>`
		830	`In each case, the data flow through the rule is indicated using names.`
		831	`In the previous example of a production, both alternatives have the`
		832	`same data flow: the call to <CODE>mul-expr</CODE> returns one value,`
		833	`which is stored in the name <CODE>v1</CODE>, and the call to <CODE>expr</CODE>`
		834	`returns one value, which is stored in the name <CODE>v2</CODE>. Both`
		835	`of these names are passed to the action (<CODE>add</CODE> in the first`
		836	`alternative, <CODE>subtract</CODE> in the second), which returns a`
		837	`value into the name <CODE>v</CODE> (presumably the sum or difference`
		838	`of the previous two values). The name <CODE>v</CODE> is the name`
		839	`of the result, so this will be returned as the result of the rule.`
		840	`The exception handler (which is invoked if something fails) calls`
		841	`the action`
		842	`<CODE>ex</CODE> to produce the result <CODE>v</CODE>.`
		843	`</P>`
		844	`<P>`
		845	`It is necessary that the types of the data flow through the production`
		846	`are correct. It is also necessary to define all of the result names`
		847	`for the production in each of the alternatives in that production.`
		848	`</P>`
		849	`<P>`
		850	`An anonymous rule is written in the same way as the body of a normal`
		851	`rule, e.g.:`
		852	`<PRE>`
		853	`list : () -> ( l : ListT ) = {`
		854	`n = number ;`
		855	`/* START ANONYMOUS RULE */ {`
		856	`? = <at-eof> ;`
		857	`l = <make-list> ( n ) ;`
		858	`\|\|`
		859	`comma ;`
		860	`l1 = list ;`
		861	`l = <cons> ( n, l1 ) ;`
		862	`##`
		863	`l = <com-or-eof> ( n ) ;`
		864	`} ; /* END ANONYMOUS RULE */`
		865	`} ;`
		866	`</PRE>`
		867	`An anonymous rule is always inlined.`
		868	`</P>`
		869	`<P>`
		870	`The rule name may be followed by a sequence of definitions and declarations`
		871	`surrounded by the <CODE>[</CODE> and <CODE>]</CODE> symbols (which`
		872	`are followed by the rest of the rule). In this case, the definitions`
		873	`are local to the rule, e.g.:`
		874	`<PRE>`
		875	`x-list [`
		876	`x = {`
		877	`terminal1 ;`
		878	`terminal2 ;`
		879	`\|\|`
		880	`terminal3 ;`
		881	`} ;`
		882	`] = {`
		883	`x ;`
		884	`\|\|`
		885	`x ;`
		886	`x-list ;`
		887	`} ;`
		888	`</PRE>`
		889	`In this case, the rule <CODE>x</CODE> would be local to the rule`
		890	`<CODE>x-list</CODE> and no other rule would be able to use it. In`
		891	`error messages, the name of the rule <CODE>x</CODE> would be written`
		892	`as`
		893	`<CODE>x-list::x</CODE>. All declarations and definitions that occur`
		894	`inside of the <CODE>[</CODE> and <CODE>]</CODE> symbols have the scope`
		895	`of the enclosing rule, unless they are preceded by the <CODE>::</CODE>`
		896	`symbol, in which case they have global scope. This is particularly`
		897	`necessary for actions, as actions can only be defined with global`
		898	`scope.`
		899	`</P>`
		900	`<P>`
		901	`It is also possible to define non-local names. These are declared`
		902	`as an identifier (the name), followed by the <CODE>:</CODE> symbol,`
		903	`followed by another identifier (its type), in a similar manner to`
		904	`an entry in a type tuple. Non-local names are not allowed at the`
		905	`outermost level (so they may not be prefixed with the <CODE>::</CODE>`
		906	`symbol either). When a non-local name is defined, it is in scope`
		907	`for all of the rules in its scope that are defined after it is, plus`
		908	`its defining rule.`
		909	`</P>`
		910	`<P>`
		911	`Non-local names have their values saved on entry to their defining`
		912	`rule, and the value will be restored when the rule is exited. This`
		913	`includes exiting the rule tail recursively or because of an exception`
		914	`(if the rule has an exception handler, the non-local name will not`
		915	`be restored until the exception handler has exited). In almost all`
		916	`other respects non-local names are the same as local names. An example`
		917	`follows:`
		918	`<PRE>`
		919	`rule1 [`
		920	`name1 : Type1T ;`
		921	`rule1_1 = {`
		922	`<action1> ( name1 ) ;`
		923	`rule2 ( name1 ) ;`
		924	`} ;`
		925	`] = {`
		926	`<action2> ( &name1 ) ;`
		927	`rule1_1 ;`
		928	`} ;`
		929	`</PRE>`
		930	`It is possible to associate an initialiser with a non-local name,`
		931	`by following the type name with a <CODE>=</CODE> symbol and the action`
		932	`name in angle brackets, e.g.:`
		933	`<PRE>`
		934	`rule1 [`
		935	`name1 : Type1T = <action1> ;`
		936	`] = {`
		937	`// ....`
		938	`} ;`
		939	`</PRE>`
		940	`In this case the action is called at the start of the rule to initialise`
		941	`the non-local name. The action should return an object of the same`
		942	`type as the non-local name. Normally, the action takes no parameters,`
		943	`however it may take one parameter of the same type as the non-local`
		944	`name (or a reference to that type), in which case it will be given`
		945	`the saved value of the non-local name as an argument (this may be`
		946	`used to build a stack automatically for example).`
		947	`</P>`
		948
		949	`<H3><A NAME="entry-point">5.5. The grammar entry points section</A></H3>`
		950	`<P>`
		951	`The final section lists the entry points to the grammar. It begins`
		952	`with the section header, followed by a comma separated list of rule`
		953	`names, terminated by a semicolon, e.g.:`
		954	`<PRE>`
		955	`%entry% expr ;`
		956	`</PRE>`
		957	`If you are going to use a rule as an entry point into the grammar`
		958	`(i.e. you wish to call the associated function), you must list it`
		959	`in the entry points list. If not, the function may not exist.`
		960	`</P>`
		961
		962	`<HR>`
		963	`<H2><A NAME="info-file">6. The C information file</A></H2>`
		964	`<P>`
		965	`The grammar specification itself is not sufficient to produce a parser.`
		966	`There also needs to be output language specific information to allow`
		967	`the parser to interface with the program it is to be part of. In`
		968	`the case of the C output routines, <CODE>sid</CODE> needs to know`
		969	`the following information:`
		970	`<OL>`
		971
		972	`<LI>What code should precede and succeed the automatically generated`
		973	`code.`
		974	`<LI>How to map the <CODE>sid</CODE> identifiers into C identifiers.`
		975
		976	`<LI>How to do assignments for each type.`
		977	`<LI>How to get the current terminal number.`
		978	`<LI>How to get the result of the current terminal.`
		979	`<LI>How to advance the lexical analyser, to get the next terminal.`
		980
		981	`<LI>What the actions are defined as, and how to pass parameters to`
		982	`them.`
		983	`<LI>How to save and restore the current terminal when an error occurs.`
		984
		985	`</OL>`
		986	`Eventually almost all of this should be user suppliable. At the moment,`
		987	`some of the information is supplied by the user in the C information`
		988	`file, some through macros, and some is built in. <CODE>sid</CODE>`
		989	`currently gets the information as follows:`
		990	`</P>`
		991	`<P>`
		992	`1. The C information file has a header and a trailer section, which`
		993	`define code that precedes and succeeds the code that <CODE>sid</CODE>`
		994	`generates.`
		995	`</P>`
		996	`<P>`
		997	`2. The C information file has a section that allows the user to specify`
		998	`mappings from <CODE>sid</CODE> identifiers into C identifiers. These`
		999	`are only valid for the following types of identifiers: types, functions`
		1000	`(implementations of rules) and terminals. For other identifier types`
		1001	`(or when no mapping is supplied), <CODE>sid</CODE> uses some default`
		1002	`rules:`
		1003	`<P>`
		1004	`Firstly, <CODE>sid</CODE> applies a transform to the <CODE>sid</CODE>`
		1005	`identifier name, to make it a legal C identifier. At present this`
		1006	`maps`
		1007	`<CODE>_</CODE> to <CODE>__</CODE>, <CODE>-</CODE> to <CODE>_H</CODE>`
		1008	`and`
		1009	`<CODE>:</CODE> (this occurs in scoped names) to <CODE>_C</CODE>.`
		1010	`All other characters are left unmodified. This transform cannot be`
		1011	`changed.`
		1012	`</P>`
		1013	`<P>`
		1014	`<CODE>sid</CODE> also puts a prefix before all identifiers, to try`
		1015	`to prevent clashes (and also to make automatically generated - i.e.`
		1016	`numeric - identifiers legal). These prefixes can be redefined for`
		1017	`each class of identifier, in the C information file. They should`
		1018	`be chosen so as not to clash with any other identifiers (i.e. no other`
		1019	`identifiers should begin with that prefix).`
		1020	`</P>`
		1021	`<P>`
		1022	`By default, the following prefixes are used:`
		1023	`</P>`
		1024	`<P>`
		1025	`<DL>`
		1026
		1027	`<DT><CODE>ZT</CODE>`
		1028	`<DD>`
		1029	`This prefix is used before type identifiers, for the type name itself.`
		1030
		1031	`<DT><CODE>ZR</CODE>`
		1032	`<DD>`
		1033	`This prefix is used before rule identifiers, for the rule's implementation`
		1034	`function.`
		1035	`<DT><CODE>ZL</CODE>`
		1036	`<DD>`
		1037	`This prefix is used before rule identifiers, for the rule's label`
		1038	`when tail recursion is being eliminated. In this case, a number is`
		1039	`added to the suffix before the identifier name, to prevent clashes`
		1040	`when a rule is inlined twice in the same function. It is also used`
		1041	`before other labels that are automatically generated and are just`
		1042	`numbered.`
		1043	`<DT><CODE>ZI</CODE>`
		1044	`<DD>`
		1045	`This prefix is used before name identifiers used as parameters to`
		1046	`functions, or in normal usage. It is also used by non-local names`
		1047	`(which doesn't cause a problem as they always occur scoped, and local`
		1048	`names never do).`
		1049	`<DT><CODE>ZO</CODE>`
		1050	`<DD>`
		1051	`This prefix is used before name identifiers used as results of functions.`
		1052	`Results are passed as reference parameters, and this suffix is used`
		1053	`then. Another identifier with the <CODE>ZI</CODE> prefix is also`
		1054	`used within the function, and the type reference assignment operator`
		1055	`is used at the end of the function to assign the results to the reference`
		1056	`parameters.`
		1057	`<DT><CODE>ZB</CODE>`
		1058	`<DD>`
		1059	`This prefix is used before the terminal symbol names in the generated`
		1060	`header file.`
		1061	`</DL>`
		1062	`</P>`
		1063
		1064	`<P>`
		1065	`3. Normally, <CODE>sid</CODE> will do assignments using the C assignment`
		1066	`operator. Sometimes, this will not do the right thing, so the user`
		1067	`can define a set of assignment operations for any type in the C information`
		1068	`file.`
		1069	`</P>`
		1070	`<P>`
		1071	`4. <CODE>sid</CODE> expects the <CODE>CURRENT_TERMINAL</CODE> macro`
		1072	`to be defined, and its definition should return an integer that is`
		1073	`the current terminal. The macro should be an expression, not a statement.`
		1074	`</P>`
		1075	`<P>`
		1076	`5. It is necessary to define how to extract the results of all terminals`
		1077	`in the C information file (if a terminal doesn't return anything,`
		1078	`then it is not necessary to define how to get the result).`
		1079	`</P>`
		1080	`<P>`
		1081	`6. <CODE>sid</CODE> expects the <CODE>ADVANCE_LEXER</CODE> macro`
		1082	`to be defined, and its definition should cause the lexical analyser`
		1083	`to read a new token. The new terminal number should be accessible`
		1084	`through the`
		1085	`<CODE>CURRENT_TERMINAL</CODE> macro. On entry into the parser`
		1086	`<CODE>CURRENT_TERMINAL</CODE> should give the first terminal number.`
		1087	`</P>`
		1088	`<P>`
		1089	`7. All actions, and their parameter and result names are defined`
		1090	`in the C information file.`
		1091	`</P>`
		1092	`<P>`
		1093	`8. <CODE>sid</CODE> expects the <CODE>SAVE_LEXER</CODE> and`
		1094	`<CODE>RESTORE_LEXER</CODE> macros to be defined. The first is called`
		1095	`with an argument which is the error terminal value. The macro should`
		1096	`save the current terminal's value, and set the current terminal to`
		1097	`be the error terminal value. The second macro is called without arguments,`
		1098	`and should restore the saved value of the current terminal.`
		1099	`<CODE>SAVE_LEXER</CODE> will never be called more than once without`
		1100	`a call to <CODE>RESTORE_LEXER</CODE>, so the save stack only needs`
		1101	`one element.`
		1102	`</P>`
		1103	`<P>`
		1104	`The remainder of this section describes the layout of the C information`
		1105	`file. The lexical conventions are described first, followed by a`
		1106	`description of the sections in the order in which they should appear.`
		1107	`Unlike the <CODE>sid</CODE> grammar file, not all sections are mandatory.`
		1108	`</P>`
		1109
		1110	`<H3><A NAME="info-lex">6.1. Lexical conventions</A></H3>`
		1111	`<P>`
		1112	`The lexical conventions of the C information file are very similar`
		1113	`to those of the <CODE>sid</CODE> grammar file. There is a second`
		1114	`class of identifier: the C identifier, which is a subset of the valid`
		1115	`<CODE>sid</CODE> identifiers; there is also the C code block.`
		1116	`</P>`
		1117	`<P>`
		1118	`A C code block begins with <CODE>@{</CODE> and is terminated by`
		1119	`<CODE>@}</CODE>. The code block consists of all of the characters`
		1120	`between the start and end of the code block, subject to substitutions.`
		1121	`All substitutions begin with the <CODE>@</CODE> character. The following`
		1122	`substitutions are recognised:`
		1123	`</P>`
		1124	`<P>`
		1125	`<DL>`
		1126
		1127	`<DT><CODE>@@</CODE>`
		1128	`<DD>`
		1129	`This substitutes the <CODE>@</CODE> character itself.`
		1130	`<P>`
		1131
		1132	`<DT><CODE>@:</CODE><I>label</I>`
		1133	`<DD>`
		1134	`This form marks a label, which will be substituted for in the output`
		1135	`code. This is necessary, because an action may be inlined into the`
		1136	`same function more than once. If this happens, then without doing`
		1137	`label substitution there would be two identical labels in the same`
		1138	`scope. With label substitution, this problem is avoided. In general,`
		1139	`all references to a label within an action should be prefixed with`
		1140	`<CODE>@:</CODE>. This substitution may not be used in header and`
		1141	`trailer code blocks.`
		1142	`<P>`
		1143
		1144	`<DT><CODE>@</CODE><I>identifier</I>`
		1145	`<DD>`
		1146	`This form marks a parameter or result identifier substitution. If`
		1147	`parameter and result identifiers are not prefixed with an <CODE>@</CODE>`
		1148	`character, then they will not be substituted. It is an error if the`
		1149	`identifier is not a parameter or a result. Header and trailer code`
		1150	`blocks have no parameters or results, so it is always an error to`
		1151	`use identifier substitution in them. It is an error if any of the`
		1152	`result identifiers are not substituted at least once.`
		1153	`<P>`
		1154	`Result identifiers may be assigned to using this form of identifier`
		1155	`substitution, but parameter identifiers may not be (nor may there`
		1156	`address be taken - they are immutable). To try to prevent this, parameters`
		1157	`that are substituted may be cast to their own type, which makes them`
		1158	`unmodifiable in ISO C (see the notes on the`
		1159	`<A HREF="#casts"><CODE>casts</CODE> language specific option</A>).`
		1160	`</P>`
		1161
		1162	`<DT><CODE>@&</CODE><I>identifier</I>`
		1163	`<DD>`
		1164	`This form marks a parameter identifier whose address is to be substituted,`
		1165	`but whose contents will not be modified. The effects of modifying`
		1166	`the identifier are undefined. It is an error to use this in parameter`
		1167	`assignment operator definitions.`
		1168	`<P>`
		1169
		1170	`<DT><CODE>@=</CODE><I>identifier</I>`
		1171	`<DD>`
		1172	`This form marks a parameter identifier that will be modified. For`
		1173	`this to be useful, the parameter should be a call by reference parameter,`
		1174	`so that the effect of the modification will be propagated. This substitution`
		1175	`is only valid in actions (assignment operators are not allowed to`
		1176	`modify their parameters).`
		1177	`<P>`
		1178
		1179	`<DT><CODE>@!</CODE>`
		1180	`<DD>`
		1181	`This form marks an exception raise. In the generated code, a jump`
		1182	`to the current exception handler will be substituted. This substitution`
		1183	`is only valid in actions.`
		1184	`<P>`
		1185
		1186	`<DT><CODE>@.</CODE>`
		1187	`<DD>`
		1188	`This form marks an attempt to access the current terminal. This substitution`
		1189	`is only valid in actions.`
		1190	`<P>`
		1191
		1192	`<DT><CODE>@></CODE>`
		1193	`<DD>`
		1194	`This form marks an attempt to advance the lexical analyser. This`
		1195	`substitution is only valid in actions.`
		1196	`</DL>`
		1197	`</P>`
		1198	`<P>`
		1199	`All other forms are illegal. Note that in the case of labels and`
		1200	`identifiers, no white space is allowed between the <CODE>@:</CODE>,`
		1201	`<CODE>@</CODE>, <CODE>@&</CODE> or <CODE>@=</CODE> and the identifier`
		1202	`name. An example of a code block is:`
		1203	`<PRE>`
		1204	`@{`
		1205	`/* A code block */`
		1206	`{`
		1207	`int i ;`
		1208	`if ( @param ) {`
		1209	`@! ;`
		1210	`}`
		1211	`@result = 0 ;`
		1212	`for ( i = 0 ; i < 100 ; i++ ) {`
		1213	`printf ( "{%d}\n", i ) ;`
		1214	`@result += i ;`
		1215	`}`
		1216	`@=param += @result ;`
		1217	`if ( @. == TOKEN_SEMI ) {`
		1218	`@> ;`
		1219	`}`
		1220	`}`
		1221	`@}`
		1222	`</PRE>`
		1223	`</P>`
		1224
		1225	`<H3><A NAME="prefixes">6.2. The prefixes section</A></H3>`
		1226	`<P>`
		1227	`The first section in the C information file is the prefix definition`
		1228	`section. This section is optional. It begins with the section header,`
		1229	`followed by a list of prefix definitions. A prefix definition begins`
		1230	`with the prefix name, followed by a <CODE>=</CODE> symbol, followed`
		1231	`by a C identifier that is the new prefix, and terminated by a semicolon.`
		1232	`The following example shows all of the prefix names, and their default`
		1233	`values:`
		1234	`<PRE>`
		1235	`%prefixes%`
		1236	`type = ZT ;`
		1237	`function = ZR ;`
		1238	`label = ZL ;`
		1239	`input = ZI ;`
		1240	`output = ZO ;`
		1241	`terminal = ZB ;`
		1242	`</PRE>`
		1243	`</P>`
		1244
		1245	`<H3><A NAME="maps">6.3. The maps section</A></H3>`
		1246	`<P>`
		1247	`The section that follows the prefixes section is the maps section.`
		1248	`This section is also optional. It begins with its section header,`
		1249	`followed by a list of identifier mappings. An identifier mapping`
		1250	`begins with a`
		1251	`<CODE>sid</CODE> identifier (either a type, a rule or a terminal),`
		1252	`followed by the <CODE>-></CODE> symbol, followed by the C identifier`
		1253	`it is to be mapped to, and terminated by a semicolon. An example`
		1254	`follows:`
		1255	`<PRE>`
		1256	`%maps%`
		1257	`NumberT -> unsigned ;`
		1258	`calculator -> calculator ;`
		1259	`</PRE>`
		1260	`Note that it is not possible to map type identifiers to be arbitrary`
		1261	`C types. It will be necessary to <CODE>typedef</CODE> or macro define`
		1262	`the type name in the C file.`
		1263	`</P>`
		1264	`<P>`
		1265	`It is recommended that all types, terminals and entry point rules`
		1266	`have their names mapped in this section, although this is not necessary.`
		1267	`If the names are not mapped, they will have funny names in the rest`
		1268	`of the program.`
		1269	`</P>`
		1270
		1271	`<H3><A NAME="header">6.4. The header section</A></H3>`
		1272	`<P>`
		1273	`After the maps section comes the header section. This begins with`
		1274	`the section header, followed by a code block, followed by a comma,`
		1275	`followed by a second code block, and terminated with a semicolon.`
		1276	`The first code block will be inserted at the beginning of the generated`
		1277	`parser file; the second code block will be inserted at the start of`
		1278	`the generated header file. An example is:`
		1279	`<PRE>`
		1280	`%header% @{`
		1281	`#include "lexer.h"`
		1282
		1283	`LexerT token ;`
		1284
		1285	`#define CURRENT_TERMINAL token.t`
		1286	`#define ADVANCE_LEXER next_token ()`
		1287
		1288	`extern void terminal_error () ;`
		1289	`extern void syntax_error () ;`
		1290	`@}, @{`
		1291	`@} ;`
		1292	`</PRE>`
		1293	`</P>`
		1294
		1295	`<H3><A NAME="assign">6.5. The assignments section</A></H3>`
		1296	`<P>`
		1297	`The assignments section follows the header section. This section`
		1298	`is optional. Normally, assignment between two identifiers will be`
		1299	`done using the C assignment operator. In some cases this will not`
		1300	`do the correct thing, and it is necessary to do the assignment differently.`
		1301	`All types for which this applies should have an entry in the assignments`
		1302	`section. The section begins with its header, followed by definitions`
		1303	`for each type that needs its own assignment operator. Each definition`
		1304	`should have one parameter, and one result. The action's name should`
		1305	`be the name of the type. An example follows:`
		1306	`<PRE>`
		1307	`%assignments%`
		1308
		1309	`ListT : ( l1 ) -> ( l2 ) = @{`
		1310	`if ( @l2.head = @l1.head ) {`
		1311	`@l2.tail = @l1.tail ;`
		1312	`} else {`
		1313	`@l2.tail = &( @l2.head ) ;`
		1314	`}`
		1315	`@} ;`
		1316	`</PRE>`
		1317	`If a type has an assignment operator defined, it must also have a`
		1318	`parameter assignment operator type defined and a result assignment`
		1319	`operator defined (more precisely it must have either no assignment`
		1320	`operations defined, or all three assignment operations defined).`
		1321	`</P>`
		1322
		1323	`<H3><A NAME="param-assign">6.6. The parameter assignments section</A></H3>`
		1324	`<P>`
		1325	`The parameter assignments section is very similar to the assignments`
		1326	`section (which it follows), and is also optional. If a type has an`
		1327	`assignment section entry, it must have a parameter assignment entry`
		1328	`as well.`
		1329	`</P>`
		1330	`<P>`
		1331	`The parameter assignment operator is used in function calls to ensure`
		1332	`that the object is copied correctly: if no parameter assignment operator`
		1333	`is provided for a type, the standard C call by copy mechanism is used;`
		1334	`if a parameter assignment operator is provided for a type, then the`
		1335	`address of the object is passed by the calling function, and the called`
		1336	`function declares a local of the same type, and uses the parameter`
		1337	`assignment operator to copy the object (this should be remembered`
		1338	`when passing parameters to entry points that have arguments of a type`
		1339	`that has a parameter assignment operator defined).`
		1340	`</P>`
		1341	`<P>`
		1342	`The difference between the parameter assignment operator and the assignment`
		1343	`operator is that the parameter identifier to the parameter assignment`
		1344	`operator is a pointer to the object being manipulated, rather than`
		1345	`the object itself. An example reference assignment section is:`
		1346	`<PRE>`
		1347	`%parameter-assignments%`
		1348
		1349	`ListT : ( l1 ) -> ( l2 ) = @{`
		1350	`if ( @l2.head = @l1->head ) {`
		1351	`@l2.tail = @l1->tail ;`
		1352	`} else {`
		1353	`@l2.tail = &( @l2.head ) ;`
		1354	`}`
		1355	`@} ;`
		1356	`</PRE>`
		1357	`</P>`
		1358
		1359	`<H3><A NAME="result-assign">6.7. The result assignments section</A></H3>`
		1360	`<P>`
		1361	`The result assignments section is very similar to the assignments`
		1362	`section and the parameter assignments section (which it follows),`
		1363	`and is also optional. If a type has an assignment section entry,`
		1364	`it must also have a result assignment entry. The only difference`
		1365	`between the two is that the result identifier of the result assignment`
		1366	`operation is a pointer to the object being manipulated, rather than`
		1367	`the object itself. Result assignments are only used when the results`
		1368	`of a rule are assigned back through the reference parameters passed`
		1369	`into the function. An example result assignment section is:`
		1370	`<PRE>`
		1371	`%result-assignments%`
		1372
		1373	`ListT : ( l1 ) -> ( l2 ) = @{`
		1374	`if ( @l2->head = @l1.head ) {`
		1375	`@l2->tail = @l1.tail ;`
		1376	`} else {`
		1377	`@l2->tail = &( @l2->head ) ;`
		1378	`}`
		1379	`@} ;`
		1380	`</PRE>`
		1381	`</P>`
		1382
		1383	`<H3><A NAME="term-result">6.8. The terminal result extraction section</A></H3>`
		1384	`<P>`
		1385	`The terminal result extraction section follows the reference assignment`
		1386	`section. It defines how to extract the results from terminals. The`
		1387	`section begins with its section header, followed by the terminal extraction`
		1388	`definitions.`
		1389	`</P>`
		1390	`<P>`
		1391	`There must be a definition for every terminal in the grammar that`
		1392	`returns a result. It is an error to include a definition for a terminal`
		1393	`that doesn't return a result. The result of the definition should`
		1394	`be the same as the result of the terminal. An example of the terminal`
		1395	`result extraction section follows:`
		1396	`<PRE>`
		1397	`%terminals%`
		1398
		1399	`number : () -> ( n ) = @{`
		1400	`@n = token.u.number ;`
		1401	`@} ;`
		1402
		1403	`identifier : () -> ( i ) = @{`
		1404	`@i = token.u.identifier ;`
		1405	`@} ;`
		1406
		1407	`string : () -> ( s ) = @{`
		1408	`@s = token.u.string ;`
		1409	`@} ;`
		1410	`</PRE>`
		1411	`</P>`
		1412
		1413	`<H3><A NAME="action-defn">6.9. The action definition section</A></H3>`
		1414	`<P>`
		1415	`The action definition section follows the terminal result extractor`
		1416	`definition section. The format is similar to the previous sections:`
		1417	`the section header followed by definitions for all of the actions.`
		1418	`An action definition has the following form:`
		1419	`<PRE>`
		1420	`<action-name> : ( parameters ) -> ( results ) = code-block ;`
		1421	`</PRE>`
		1422	`This is similar to the form of all previous definitions, except that`
		1423	`the name is surrounded in angle brackets. What follows is also true`
		1424	`of the other definitions as well (unless they state otherwise).`
		1425	`</P>`
		1426	`<P>`
		1427	`The <CODE>action-name</CODE> is a <CODE>sid</CODE> identifier that`
		1428	`is the name of the action being defined; <CODE>parameters</CODE> is`
		1429	`a comma separated list of C identifiers that will be the names of`
		1430	`the parameters passed to the action, and <CODE>results</CODE> is a`
		1431	`comma separated list of C identifiers that will be the names of the`
		1432	`result parameters passed to the action. The <CODE>code-block</CODE>`
		1433	`is the C code that defines the action. It is expected that this will`
		1434	`assign a valid result to each of the result identifier names.`
		1435	`</P>`
		1436	`<P>`
		1437	`The parameter and result tuples have the same form as in the language`
		1438	`independent file, except that the types are optional. Like the language`
		1439	`independent file, if the type of an action is zero-tuple to zero-tuple,`
		1440	`then the type can be omitted, e.g.:`
		1441	`<PRE>`
		1442	`<action> = @{ /* .... */ @} ;`
		1443	`</PRE>`
		1444	`An example action definition section is:`
		1445	`<PRE>`
		1446	`%actions%`
		1447
		1448	`<add> : ( v1, v2 ) -> ( v3 ) = @{`
		1449	`@v3 = @v1 + @v2 ;`
		1450	`@} ;`
		1451
		1452	`<subtract> : ( v1 : NumberT, v2 : NumberT ) -> ( v3 : NumberT ) = @{`
		1453	`@v3 = @v1 - @v2 ;`
		1454	`@} ;`
		1455
		1456	`<multiply> : ( v1 : NumberT, v2 ) -> ( v3 ) = @{`
		1457	`@v3 = @v1 * @v2 ;`
		1458	`@} ;`
		1459
		1460	`<divide> : ( v1, v2 ) -> ( v3 : NumberT ) = @{`
		1461	`@v3 = @v1 / @v2 ;`
		1462	`@} ;`
		1463
		1464	`<print> : ( v ) -> () = @{`
		1465	`printf ( "%u\n", @v ) ;`
		1466	`@} ;`
		1467
		1468	`<error> = @{`
		1469	`fprintf ( stderr, "ERROR\n" ) ;`
		1470	`exit ( EXIT_FAILURE ) ;`
		1471	`@} ;`
		1472	`</PRE>`
		1473	`Do not define static variables in action definitions; if you do, you`
		1474	`will get unexpected results. If you wish to use static variables`
		1475	`in actions definitions, then define them in the header block.`
		1476	`</P>`
		1477
		1478	`<H3><A NAME="trailer">6.10. The trailer section</A></H3>`
		1479	`<P>`
		1480	`After the action definition section comes the trailer section. This`
		1481	`has the same form as the header section. An example is:`
		1482	`<PRE>`
		1483	`%trailer% @{`
		1484	`int main ()`
		1485	`{`
		1486	`next_token () ;`
		1487	`calculator ( NULL ) ;`
		1488	`return 0 ;`
		1489	`}`
		1490	`@}, @{`
		1491	`@} ;`
		1492	`</PRE>`
		1493	`The code blocks will be appended to the generated parser, and the`
		1494	`generated header file respectively.`
		1495	`</P>`
		1496
		1497	`<HR>`
		1498	`<H2><A NAME="predicate">7. Predicates</A></H2>`
		1499	`<P>`
		1500	`Predicates provide the user with a mechanism for altering the control`
		1501	`flow in a manner that terminals alone cannot do.`
		1502	`</P>`
		1503	`<P>`
		1504	`During the factorisation process, rules that begin with predicates`
		1505	`are expanded if necessary to ensure that predicates that may be used`
		1506	`to select which alternative to go down always begin the alternative,`
		1507	`e.g.:`
		1508	`<PRE>`
		1509	`rule1 = {`
		1510	`rule2 ;`
		1511	`/* .... */`
		1512	`\|\|`
		1513	`/* .... */`
		1514	`} ;`
		1515
		1516	`rule2 = {`
		1517	`? = <predicate> ;`
		1518	`/* .... */`
		1519	`\|\|`
		1520	`/* .... */`
		1521	`} ;`
		1522	`</PRE>`
		1523	`would be expanded into:`
		1524	`<PRE>`
		1525	`rule1 = {`
		1526	`? = predicate ;`
		1527	`/* .... */`
		1528	`/* .... */`
		1529	`\|\|`
		1530	`/* .... */`
		1531	`/* .... */`
		1532	`\|\|`
		1533	`/* .... */`
		1534	`} ;`
		1535	`</PRE>`
		1536	`Also, if a predicate is used to select which alternative to use, it`
		1537	`must be the first thing in the alternative, so the following would`
		1538	`not be allowed:`
		1539	`<PRE>`
		1540	`rule = {`
		1541	`<action> ;`
		1542	`? = <predicate> ;`
		1543	`/* .... */`
		1544	`\|\|`
		1545	`/* .... */`
		1546	`} ;`
		1547	`</PRE>`
		1548	`When predicates begin a rule, they are executed (in some arbitrary`
		1549	`order) until one of them returns true. The alternative that this`
		1550	`predicate begins is then selected. If no predicates return true,`
		1551	`then one of the remaining alternatives is selected based upon the`
		1552	`current terminal (or an error occurs).`
		1553	`</P>`
		1554	`<P>`
		1555	`It is important that predicates do not contain dependencies upon the`
		1556	`order of evaluation. In practice, predicates are likely to be simple,`
		1557	`so this shouldn't be a problem.`
		1558	`</P>`
		1559	`<P>`
		1560	`When predicates are used within an alternative, they behave like terminals.`
		1561	`If they evaluate to true, then parsing continues. If they evaluate`
		1562	`to false, then an exception is raised.`
		1563	`</P>`
		1564
		1565	`<HR>`
		1566	`<H2><A NAME="exception">8. Error handling</A></H2>`
		1567	`<P>`
		1568	`If the input given to the parser is valid, then the parser will not`
		1569	`need to produce any errors. Unfortunately this is not always the`
		1570	`case, so`
		1571	`<CODE>sid</CODE> provides a mechanism for handling errors.`
		1572	`</P>`
		1573	`<P>`
		1574	`When an error occurs, an exception is raised. This passes control`
		1575	`to the nearest enclosing exception handler. If there is no exception`
		1576	`handler at all, the entry point function will return with the current`
		1577	`terminal set to the error value.`
		1578	`</P>`
		1579	`<P>`
		1580	`An exception handler is just an alternative that is executed when`
		1581	`a terminal or predicate fails. This should obviate the need to rely`
		1582	`upon language specific mechanisms (such as <CODE>setjmp</CODE> and`
		1583	`<CODE>longjmp</CODE>) for error recovery.`
		1584	`</P>`
		1585
		1586	`<HR>`
		1587	`<H2><A NAME="call-by-ref">9. Call by reference</A></H2>`
		1588	`<P>`
		1589	`The default behaviour of <CODE>sid</CODE> is to do argument passing`
		1590	`using call by copy semantics, and to not allow mutation of parameters`
		1591	`of rules and actions (however inlined rules, and rules created during`
		1592	`factoring have call by reference parameters). However it is possible`
		1593	`to give rule and action parameters call by reference semantics, using`
		1594	`the`
		1595	`<CODE>&</CODE> symbol in the type specification (as described`
		1596	`earlier). It is also possible to mutate parameters of actions, using`
		1597	`the <CODE>@=</CODE> substitution in the action body (also described`
		1598	`earlier). It is important to do the correct substitutions in action`
		1599	`definitions, as <CODE>sid</CODE> uses this information to decide where`
		1600	`it can optimise the output code.`
		1601	`</P>`
		1602	`<P>`
		1603	`If a call by copy parameter is mutated, then <CODE>sid</CODE> will`
		1604	`introduce a new temporary variable and copy the parameter into it`
		1605	`- this temporary will then be mutated. Similar code will be output`
		1606	`for rules that have call by copy parameters that are mutated (e.g.`
		1607	`as a call by reference argument to an action that mutates its parameters).`
		1608	`</P>`
		1609
		1610	`<HR>`
		1611	`<H2><A NAME="call-entry">10. Calling entry points</A></H2>`
		1612	`<P>`
		1613	`When calling a function that implements an entry point rule, it should`
		1614	`be called with the rule's parameters as the first arguments, followed`
		1615	`by the addresses of the rule's results as the remaining arguments.`
		1616	`The parameters should have their addresses passed if they are of a`
		1617	`type that has a parameter assignment operator defined, or if the parameter`
		1618	`is a call by reference parameter.`
		1619	`</P>`
		1620	`<P>`
		1621	`For example, given the following rule:`
		1622	`<PRE>`
		1623	`rule1 : ( :Type1T, :Type2T, :Type3T & ) -> ( :Type4T ) ;`
		1624	`</PRE>`
		1625	`where <CODE>Type2T</CODE> has a parameter assignment operator defined,`
		1626	`and <CODE>rule1</CODE> is mapped to <CODE>rule1</CODE> (and the type`
		1627	`names are mapped to themselves), the call would be something like:`
		1628	`<PRE>`
		1629	`Type1T a = make_type1 () ;`
		1630	`Type2T b = make_type2 () ;`
		1631	`Type3T c = make_type3 () ;`
		1632	`Type4T d ;`
		1633
		1634	`rule1 ( a, b, &c, &d ) ;`
		1635	`</PRE>`
		1636	`</P>`
		1637
		1638	`<HR>`
		1639	`<H2><A NAME="glossary">11. Glossary</A></H2>`
		1640	`<P>`
		1641	`This section describes some of the terms used in the <CODE>sid</CODE>`
		1642	`documentation.`
		1643	`<DL>`
		1644
		1645	`<DT><B>Alternative</B>`
		1646	`<DD>`
		1647	`An alternative is a sequence of items.`
		1648	`<P>`
		1649
		1650	`<DT><B>Exception handler</B>`
		1651	`<DD>`
		1652	`An exception handler is a special type of alternative. Each rule`
		1653	`may have at most one exception handler. An exception handler is invoked`
		1654	`if the current terminal does not match any of the expected terminals,`
		1655	`if a predicate fails, or if an action raises an exception, within`
		1656	`the scope of the exception handler.`
		1657	`<P>`
		1658
		1659	`<DT><B>Expansion</B>`
		1660	`<DD>`
		1661	`This is the process of physically inlining a rule into another rule.`
		1662	`It is done during the factoring process to turn the grammar into a`
		1663	`form that a parser can be produced for. See the entry for factoring.`
		1664	`<P>`
		1665
		1666	`<DT><B>Factoring</B>`
		1667	`<DD>`
		1668	`This is one of the transforms that <CODE>sid</CODE> performs on the`
		1669	`grammar. See the <A HREF="#overview">overview section</A> for a description`
		1670	`of the factoring process.`
		1671	`<P>`
		1672
		1673	`<DT><B>First set</B>`
		1674	`<DD>`
		1675	`The first set of a rule (or alternative) is the set of terminals and`
		1676	`predicates that can start that rule (or alternative).`
		1677	`<P>`
		1678
		1679	`<DT><B>Follow set</B>`
		1680	`<DD>`
		1681	`The follow set of a rule is the set of terminals and predicates that`
		1682	`can follow the rule in any of its invocations.`
		1683	`<P>`
		1684
		1685	`<DT><B>Inlining</B>`
		1686	`<DD>`
		1687	`This is the process of outputting the code for parsing one rule within`
		1688	`the function that parses another rule. This is normally done as part`
		1689	`of the output process. Expansion is a form of inlining performed`
		1690	`during the factoring process, but the inlining is done by modifying`
		1691	`the grammar, rather than as part of the output phase.`
		1692	`<P>`
		1693
		1694	`<DT><B>Item</B>`
		1695	`<DD>`
		1696	`An item is the equivalent of a statement in a conventional programming`
		1697	`language. It can be an invocation of a rule, terminal, action or`
		1698	`predicate, or an identity operation (assignment).`
		1699	`<P>`
		1700
		1701	`<DT><B>Name</B>`
		1702	`<DD>`
		1703	`A name is an identifier that is used to pass information between rules`
		1704	`and actions. Local names are defined within a rule, and only exist`
		1705	`within the rule itself. Non-local names are defined in a rule's scoped`
		1706	`definitions section and exists in all of the rules in that scope.`
		1707	`Non-local rules are also saved and restored across calls to the rule`
		1708	`that defines them.`
		1709	`<P>`
		1710
		1711	`<DT><B>Recursion</B>`
		1712	`<DD>`
		1713	`Recursion is where a rule invokes itself. Direct recursion is where`
		1714	`the rule invokes itself from one of its own alternatives; indirect`
		1715	`recursion is where a rule invokes another rule (which invokes another`
		1716	`rule etc.) which eventually invokes the original rule.`
		1717	`</P>`
		1718	`<P>`
		1719	`Left recursion is a form of recursion where all of the recursive calls`
		1720	`occur as the first item in an alternative. It is not possible to`
		1721	`produce a parser for a grammar that contains left recursions, so`
		1722	`<CODE>sid</CODE> turns left recursions into right recursions. This`
		1723	`process is known as left recursion elimination.`
		1724	`</P>`
		1725	`<P>`
		1726	`Right recursion is a form of recursion where all of the recursive`
		1727	`calls occur as the last item in an alternative.`
		1728	`</P>`
		1729
		1730	`<DT><B>Production</B>`
		1731	`<DD>`
		1732	`See rule.`
		1733	`<P>`
		1734
		1735	`<DT><B>Rule</B>`
		1736	`<DD>`
		1737	`A rule is a sequence of alternatives. A rule may contain a special`
		1738	`alternative that is used as an exception handler. A rule is also`
		1739	`referred to as a production; this term is normally used when talking`
		1740	`about the definition of a rule.`
		1741	`<P>`
		1742
		1743	`<DT><B>See-through</B>`
		1744	`<DD>`
		1745	`A rule is said to be see-through if there is an expansion of the rule`
		1746	`that does not contain any terminals or predicates.`
		1747	`</DL>`
		1748	`</P>`
		1749
		1750	`<HR>`
		1751	`<H2><A NAME="errors">12. Understanding error messages</A></H2>`
		1752	`<P>`
		1753	`This section tries to explain what some of the error messages that`
		1754	`are reported by the <CODE>sid</CODE> transforms mean. It does not`
		1755	`contain descriptions of messages like "type 'some type' is unknown",`
		1756	`as these should be self-explanatory.`
		1757	`</P>`
		1758
		1759	`<H3><A NAME="left-errors">12.1. Left recursion elimination errors</A></H3>`
		1760
		1761	`<P>`
		1762	`<B>The parameter or result types of the left recursive calls in the`
		1763	`following productions do not match: <I>PRODUCTIONS</I></B>: This means`
		1764	`that there is a set of rules which call each other left recursively`
		1765	`(i.e. the first item in some of the alternatives in each rule is a`
		1766	`call to another rule in the set), and they do not all have the same`
		1767	`parameter and result types, e.g.:`
		1768	`<PRE>`
		1769	`rule1 : ( a : Type1T, b : Type1T, c : Type2T, d : Type2T ) -> () = {`
		1770	`rule2 ( a, b ) ;`
		1771	`\|\|`
		1772	`terminal1 ;`
		1773	`} ;`
		1774
		1775	`rule2 : ( a : Type1T, b : Type2T ) -> () = {`
		1776	`rule1 ( a, a, b, b ) ;`
		1777	`\|\|`
		1778	`terminal2 ;`
		1779	`} ;`
		1780	`</PRE>`
		1781	`</P>`
		1782
		1783	`<P>`
		1784	`<B>The exception handlers in the left recursion involving the following`
		1785	`productions do not match: <I>PRODUCTIONS</I></B>: This means that`
		1786	`there is a set of productions which call each other left recursively`
		1787	`(i.e. the first item in an alternative is a call to another production`
		1788	`in the set), and they do not all have the same exception handler,`
		1789	`e.g.:`
		1790	`<PRE>`
		1791	`rule1 = {`
		1792	`rule2 ;`
		1793	`\|\|`
		1794	`terminal1 ;`
		1795	`##`
		1796	`<action1&gt ;`
		1797	`} ;`
		1798
		1799	`rule2 = {`
		1800	`rule1 ;`
		1801	`\|\|`
		1802	`terminal2 ;`
		1803	`##`
		1804	`<action2&gt ;`
		1805	`} ;`
		1806	`</PRE>`
		1807	`It is quite likely that when using exception handlers, it may be necessary`
		1808	`to do the left recursion elimination manually to ensure that the exception`
		1809	`handlers occur at the correct place.`
		1810	`</P>`
		1811
		1812	`<P>`
		1813	`<B>The argument names of the left recursive calls in the following`
		1814	`productions do not match: <I>PRODUCTIONS</I></B>: This means that`
		1815	`there is a set of productions which call each other left recursively`
		1816	`(i.e. the first item in an alternative is a call to another production`
		1817	`in the set), and the arguments to one of the left recursive calls`
		1818	`are not the same as the parameters of the calling rule, e.g.:`
		1819	`<PRE>`
		1820	`rule1 : ( a : Type1T, b : Type1T ) -> () = {`
		1821	`rule1 ( b, a ) ;`
		1822	`\|\|`
		1823	`terminal1 ;`
		1824	`} ;`
		1825	`</PRE>`
		1826	`</P>`
		1827
		1828	`<P>`
		1829	`<B>A non-local name in the rule '<I>RULE</I>' is not in scope in the`
		1830	`rule '<I>RULE</I>' in the left recursive cycle involving the following`
		1831	`productions: <I>PRODUCTIONS</I></B>: This means that there is a set`
		1832	`of productions which call each other left recursively (i.e. the first`
		1833	`item in an alternative is a call to another production in the set),`
		1834	`and the first named rule uses a non-local name that is not in scope`
		1835	`in the second named rule, e.g.:`
		1836	`<PRE>`
		1837	`rule1 [`
		1838	`name1 : Type1T ;`
		1839	`rule1_1 [`
		1840	`name1_1 : Type1T ;`
		1841	`] = {`
		1842	`rule1 ;`
		1843	`<action1_1> ( name1_1 ) ;`
		1844	`\|\|`
		1845	`terminal1 ;`
		1846	`} ;`
		1847	`] = {`
		1848	`terminal2 ;`
		1849	`\|\|`
		1850	`rule1_1 ;`
		1851	`<action1> ( name1 ) ;`
		1852	`} ;`
		1853	`</PRE>`
		1854	`</P>`
		1855
		1856	`<P>`
		1857	`<B>The rule '<I>RULE</I>' declares non-local names in the left recursive`
		1858	`cycle with more than one entry point involving the following productions:`
		1859	`<I>PRODUCTIONS</I></B>: This means that there is a set of productions`
		1860	`which call each other left recursively (i.e. the first item in an`
		1861	`alternative is a call to another production in the set), and the named`
		1862	`rule defines non-local variables even though it is not the only entry`
		1863	`point to the cycle, e.g.:`
		1864	`<PRE>`
		1865	`rule1 [`
		1866	`name1 : Type1T ;`
		1867	`rule1_1 = {`
		1868	`<action1_1> ( name1 ) ;`
		1869	`} ;`
		1870	`] = {`
		1871	`terminal1 ;`
		1872	`rule1_1 ;`
		1873	`\|\|`
		1874	`rule2 ;`
		1875	`<action1> ( name1 ) ;`
		1876	`} ;`
		1877
		1878	`rule2 = {`
		1879	`rule1 ;`
		1880	`<action2&gt ;`
		1881	`\|\|`
		1882	`terminal2 ;`
		1883	`} ;`
		1884	`</PRE>`
		1885	`</P>`
		1886
		1887	`<P>`
		1888	`<B>No cycle termination for the left recursive set involving the following`
		1889	`rules: <I>RULES</I></B>: This means that there is a set of productions`
		1890	`which call each other left recursively (i.e. the first item in an`
		1891	`alternative is a call to another production in the set), and they`
		1892	`do not contain an alternative that begins with a non-left recursive`
		1893	`call, e.g.:`
		1894	`<PRE>`
		1895	`rule1 = {`
		1896	`rule2 ;`
		1897	`\|\|`
		1898	`rule3 ;`
		1899	`} ;`
		1900
		1901	`rule2 = {`
		1902	`rule1 ;`
		1903	`\|\|`
		1904	`rule3 ;`
		1905	`} ;`
		1906
		1907	`rule3 = {`
		1908	`rule1 ;`
		1909	`\|\|`
		1910	`rule2 ;`
		1911	`} ;`
		1912	`</PRE>`
		1913	`</P>`
		1914
		1915	`<H3><A NAME="first-errors">12.2. First set computation errors</A></H3>`
		1916
		1917	`<P>`
		1918	`<B>Cannot compute first set for <I>PRODUCTION</I></B>: This means`
		1919	`that <CODE>sid</CODE> cannot compute the set of terminals and predicates`
		1920	`that may start the production. This is normally because there is`
		1921	`a recursive call (or cycle) that contains no terminals, e.g.:`
		1922	`<PRE>`
		1923	`rule1 = {`
		1924	`<action1> ;`
		1925	`rule1 ;`
		1926	`\|\|`
		1927	`terminal1 ;`
		1928	`} ;`
		1929	`</PRE>`
		1930	`This is not removed by the left recursion elimination phase, as the`
		1931	`call is not the leftmost item in the alternative.`
		1932	`</P>`
		1933
		1934	`<P>`
		1935	`<B>Can see through to predicate '<I>PREDICATE</I>' in production`
		1936	`<I>PRODUCTION</I></B>: This means that there is a predicate that isn't`
		1937	`the first item in its alternative, but is preceded only by see-through`
		1938	`items, e.g.:`
		1939	`<PRE>`
		1940	`rule1 = {`
		1941	`<action1> ;`
		1942	`? = <predicate> ;`
		1943	`\|\|`
		1944	`terminal1 ;`
		1945	`} ;`
		1946	`</PRE>`
		1947	`</P>`
		1948
		1949	`<P>`
		1950	`<B>Can see through to predicates in rule '<I>RULE</I>' in production`
		1951	`<I>PRODUCTION</I></B>: This means that the first rule has at least`
		1952	`one predicate in its first set, and the second rule calls it in a`
		1953	`position where it is not the first item in the alternative and is`
		1954	`preceded only by see-through items, e.g.:`
		1955	`<PRE>`
		1956	`rule1 = {`
		1957	`? = <predicate> ;`
		1958	`\|\|`
		1959	`terminal1 ;`
		1960	`} ;`
		1961
		1962	`rule2 = {`
		1963	`<action> ;`
		1964	`rule1 ;`
		1965	`\|\|`
		1966	`terminal2 ;`
		1967	`} ;`
		1968	`</PRE>`
		1969	`</P>`
		1970
		1971	`<P>`
		1972	`<B>The rule '<I>RULE</I>' has all terminals in its first set and has`
		1973	`a redundant see-through alternative</B>: This means that the rule's`
		1974	`first set (the set of all terminals that can start the rule) includes`
		1975	`all possible input terminals, and the rule also has a see-through`
		1976	`alternative. The see-through alternative will never be used, as one`
		1977	`of the other alternatives will always be chosen.`
		1978	`</P>`
		1979
		1980	`<H3><A NAME="factor-errors">12.3. Factoring errors</A></H3>`
		1981
		1982	`<P>`
		1983	`<B>Too many productions (<I>NUMBER</I>) created during factorisation</B>:`
		1984	`This normally means that <CODE>sid</CODE> cannot factor the grammar.`
		1985	`You will need to rewrite the offending part. Unfortunately there`
		1986	`is no easy way to do this. Start by looking at the dump file for`
		1987	`a set of rules that seem to have been expanded a lot.`
		1988	`</P>`
		1989
		1990	`<P>`
		1991	`<B>The rule '<I>RULE</I>' cannot be expanded into '<I>RULE</I>' as`
		1992	`the exception handlers don't match</B>: When <CODE>sid</CODE> performs`
		1993	`factoring, it needs to expand calls to certain rules into the rules`
		1994	`that calls them (this is described in the`
		1995	`<A HREF="#overview">overview section</A>). If the called rule has`
		1996	`an exception handler and it is not the same as the exception handler`
		1997	`of the calling rule, then the expansion will fail.`
		1998	`</P>`
		1999
		2000	`<P>`
		2001	`<B>The rule '<I>RULE</I>' cannot be expanded into '<I>RULE</I>' as`
		2002	`it contains non-local name definitions</B>: When <CODE>sid</CODE>`
		2003	`performs factoring, it needs to expand calls to certain rules into`
		2004	`the rules that calls them (this is described in the`
		2005	`<A HREF="#overview">overview section</A>). If the called rule defines`
		2006	`any non-local names, then the expansion will fail.`
		2007	`</P>`
		2008
		2009	`<H3><A NAME="check-errors">12.4. Checking errors</A></H3>`
		2010
		2011	`<P>`
		2012	`<B>Collision of terminal(s) <I>TERMINALS</I> in rule '<I>RULE</I>'</B>:`
		2013	`This error means that more than one alternative in the named rule`
		2014	`begins with the named terminals, e.g.:`
		2015	`<PRE>`
		2016	`rule1 = {`
		2017	`<action1> ;`
		2018	`terminal1 ;`
		2019	`\|\|`
		2020	`terminal1 ;`
		2021	`} ;`
		2022	`</PRE>`
		2023	`Normally, the factoring process will remove the problem, but when`
		2024	`something like the above happens to stop the factoring occurring,`
		2025	`this error will be produced.`
		2026	`</P>`
		2027
		2028	`<P>`
		2029	`<B>Collision of predicate '<I>PREDICATE</I>' in rule '<I>RULE</I>'</B>:`
		2030	`This error occurs when more than one alternative in the named rule`
		2031	`begins with the named predicate, e.g.:`
		2032	`<PRE>`
		2033	`rule1 = {`
		2034	`( a, ? ) = <predicate> ;`
		2035	`<action1> ( a ) ;`
		2036	`\|\|`
		2037	`( ?, b ) = <predicate> ;`
		2038	`<action2> ( b ) ;`
		2039	`} ;`
		2040	`</PRE>`
		2041	`Again, it is normally the case that the factoring process will remove`
		2042	`this problem, but if the same predicate uses different predicate results`
		2043	`in different alternatives, this error will be produced.`
		2044	`</P>`
		2045
		2046	`<P>`
		2047	`<B>The terminal(s) <I>TERMINALS</I> can start rule '<I>RULE</I>' which`
		2048	`is see-through, and the same terminal(s) may appear in the following`
		2049	`situations: <I>ALTERNATIVES</I></B>: This means that there are one`
		2050	`or more terminals that can start the named rule (which is see-through),`
		2051	`and may also follow it, e.g.:`
		2052	`<PRE>`
		2053	`rule1 = {`
		2054	`terminal1 ;`
		2055	`\|\|`
		2056	`$ ;`
		2057	`} ;`
		2058
		2059	`rule2 = {`
		2060	`rule1 ;`
		2061	`terminal1 ;`
		2062	`\|\|`
		2063	`terminal2 ;`
		2064	`} ;`
		2065	`</PRE>`
		2066	`The alternatives listed are the alternatives which call the rule,`
		2067	`and contain (some of) the named terminals after the call. The call`
		2068	`is highlighted.`
		2069	`</P>`
		2070
		2071	`<P>`
		2072	`<B>The predicate(s) <I>PREDICATES</I> can start rule '<I>RULE</I>'`
		2073	`which is see-through and the same predicate(s) may appear in the following`
		2074	`situations: <I>ALTERNATIVES</I></B>: This means that there are one`
		2075	`or more predicates that can start the named rule (which is see-through),`
		2076	`and may also follow it, e.g.:`
		2077	`<PRE>`
		2078	`rule1 = {`
		2079	`? = <predicate> ;`
		2080	`\|\|`
		2081	`$ ;`
		2082	`} ;`
		2083
		2084	`rule2 = {`
		2085	`terminal1 ;`
		2086	`rule1 ;`
		2087	`? = <predicate> ;`
		2088	`\|\|`
		2089	`terminal2 ;`
		2090	`} ;`
		2091	`</PRE>`
		2092	`The alternatives listed are the alternatives which call the rule,`
		2093	`and contain (some of) the named predicates after the call. The call`
		2094	`is highlighted.`
		2095	`</P>`
		2096
		2097	`<P>`
		2098	`<B>The rule '<I>RULE</I>' contains more than one see-through alternative</B>:`
		2099	`This error occurs if the rule has more than one alternative that doesn't`
		2100	`need to read a terminal or a predicate, e.g.:`
		2101	`<PRE>`
		2102	`rule1 = {`
		2103	`<action1> ;`
		2104	`\|\|`
		2105	`<action2> ;`
		2106	`} ;`
		2107	`</PRE>`
		2108	`</P>`
		2109
		2110	`<HR>`
		2111	`<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown`
		2112	`Copyright © 1998.</I></P>`
		2113	`</BODY>`
		2114	`</HTML>`

Subversion Repositories tendra.SVN

(root)/trunk/doc/utilities/sid.html @ 2 – Rev