WebSVN – tendra.SVN – Blame – /trunk/doc/user/guides/sid/sid.xml

Rev	Author	Line No.	Line
6	7u83	1	`<?xml version="1.0" standalone="no"?>`
		2	`<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"`
		3	`"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">`
		4
		5	`<!--`
		6	$Id$
		7	`-->`
		8
		9	`<book>`
		10	`<bookinfo>`
		11	`<title>The sid users' guide</title>`
		12
		13	`<corpauthor>The TenDRA Project</corpauthor>`
		14
		15	`<author>`
		16	`<firstname>Jeroen</firstname>`
		17
		18	`<surname>Ruigrok van der Werven</surname>`
		19	`</author>`
		20
		21	`<authorinitials>JRvdW</authorinitials>`
		22
		23	`<pubdate>2006</pubdate>`
		24
		25	`<copyright>`
		26	`<year>2006</year>`
		27
		28	`<holder>The TenDRA Project</holder>`
		29	`</copyright>`
		30
		31	`<copyright>`
		32	`<year>1998</year>`
		33
		34	`<holder>DERA</holder>`
		35	`</copyright>`
		36	`</bookinfo>`
		37
		38	`<chapter id="introduction">`
		39	`<title>Introduction</title>`
		40
		41	`<para>This document describes how to use the <code>sid</code> parser`
		42	`generator. It was written for <code>sid</code> version 1.9. The`
		43	`main features of each version of <code>sid</code> are listed`
		44	`below:</para>`
		45
		46	`<itemizedlist>`
		47	`<listitem>`
		48	`<para><code>sid</code> 1.0 - this was the first version of`
		49	`<code>sid</code> to support attribute grammars. The output`
		50	`language was C.</para>`
		51	`</listitem>`
		52
		53	`<listitem>`
		54	`<para><code>sid</code> 1.1 - this was a bug fix version of`
		55	`<code>sid</code> 1.0.</para>`
		56	`</listitem>`
		57
		58	`<listitem>`
		59	`<para><code>sid</code> 1.2 - this version of <code>sid</code> added`
		60	`predicates, exception handling, improved inlining options, and a`
		61	`grammar test pseudo-language.</para>`
		62	`</listitem>`
		63
		64	`<listitem>`
		65	`<para><code>sid</code> 1.3 - this version of <code>sid</code> added`
		66	`anonymous rules, and a better syntax for the C specific`
		67	`information.</para>`
		68	`</listitem>`
		69
		70	`<listitem>`
		71	`<para><code>sid</code> 1.4 - this was a bug fix version of`
		72	`<code>sid</code> 1.3.</para>`
		73	`</listitem>`
		74
		75	`<listitem>`
		76	`<para><code>sid</code> 1.5 - this was a bug fix version of`
		77	`<code>sid</code> 1.4. The command line option syntax changed in`
		78	`this release as well.</para>`
		79	`</listitem>`
		80
		81	`<listitem>`
		82	`<para><code>sid</code> 1.6 - this version of <code>sid</code>`
		83	`changed the input syntax, added scoped rules, and removed basics`
		84	`(replacing them with terminal symbols). It also added a stricter`
		85	`ISO C language.</para>`
		86	`</listitem>`
		87
		88	`<listitem>`
		89	`<para><code>sid</code> 1.7 - The syntax of the actions file changed`
		90	`slightly in this version. The error messages and recovery from`
		91	`syntax errors were also improved in this version. This version`
		92	`also added explicit call by reference support (rather than the`
		93	`inconsistent function call semantics of earlier versions). The`
		94	`command line options changed in this version, to support language`
		95	`specific options, and the <code>strict-ansi-c</code> language was`
		96	`dropped. Non-local variables were also added.</para>`
		97	`</listitem>`
		98
		99	`<listitem>`
		100	`<para><code>sid</code> 1.8 - Initialisers were added for non-local`
		101	`variables. Assignment was added.</para>`
		102	`</listitem>`
		103
		104	`<listitem>`
		105	`<para><code>sid</code> 1.9 - this was a bug fix version of`
		106	`<code>sid</code> 1.8.</para>`
		107	`</listitem>`
		108	`</itemizedlist>`
		109
		110	`<para><code>sid</code> turns specifications of languages into`
		111	`programs that recognise those languages. One of the aims of`
		112	`<code>sid</code> was to separate the specification of the`
		113	`language to be recognised from the language that the recogniser`
		114	`program is written in. For this reason, input to <code>sid</code>`
		115	`is split into two components: output language independent`
		116	`information, and output language dependent information.</para>`
		117
		118	`<para>At present, <code>sid</code> will only output programs in C`
		119	`(either ISO or pre-ISO), but it is designed so that adding new`
		120	`output languages should be fairly simple. There is one other`
		121	`pseudo-language: the test language. This is used for testing`
		122	`grammars and the transforms, but will not output a parser.</para>`
		123	`</chapter>`
		124
		125	`<chapter id="grammar">`
		126	`<title>Grammars</title>`
		127
		128	`<sect1 id="parsing">`
		129	`<title>Parsing</title>`
		130
		131	`<para>There are two phases in traditional language recognition that`
		132	`are relevant to <code>sid</code>: the first is lexical analysis`
		133	`(breaking the input up into terminal symbols); the second is`
		134	`syntax analysis or parsing (ensuring that the terminal symbols in`
		135	`the input are in the correct order).</para>`
		136
		137	`<para><code>sid</code> currently does very little to help with`
		138	`lexical analysis. It is possible to use <code>sid</code> to`
		139	`produce the lexical analyser, but <code>sid</code> provides no`
		140	`real support for it at present. For now, the programmer is`
		141	`expected to write the lexical analyser, or use another tool to`
		142	`produce it.</para>`
		143
		144	`<para>The lexical analyser should break the input up into a series`
		145	`of terminals. Each of these terminals is allocated a number. In`
		146	`<code>sid</code>, these numbers range from zero to the maximum`
		147	`terminal number (specified in the grammar description). The`
		148	`terminals may also have data associated with them (e.g. the value`
		149	`of a number), known as the attributes of the terminal, or the`
		150	`results of the terminal.</para>`
		151
		152	`<para><code>sid</code> generates the parser. The parser is a program`
		153	`that reads the sequence of terminals from the lexical analyser,`
		154	`and ensures that they form a valid input in the language being`
		155	`recognised. If the input is not valid, then the parser will fail`
		156	`(<code>sid</code> provides mechanisms to allow the parser to`
		157	`recover from errors).</para>`
		158	`</sect1>`
		159
		160	`<sect1 id="context-free">`
		161	`<title>Context free grammars</title>`
		162
		163	`<para>This section provides a brief introduction to grammars. It is`
		164	`not intended to be an in-depth guide to grammars, more an`
		165	`introduction which defines some terminology.</para>`
		166
		167	`<para><code>sid</code> works with a subset of what are known as`
		168	`context free grammars. A context free grammar consists of a set`
		169	`of input symbols (known as terminals), a set of rules`
		170	`(descriptions of what are legal forms in the language, also known`
		171	`as non-terminals), and an entry point (the rule from which the`
		172	`grammar starts).</para>`
		173
		174	`<para>A rule is defined as a series of alternatives (throughout this`
		175	`document the definition of a rule is called a production - this`
		176	`may conflict with some other uses of the term). Each alternative`
		177	`consists of a sequence of items. An item can be a terminal or a`
		178	`rule. As an example (using the <code>sid</code> notation, which`
		179	`now looks unlike the conventional syntax for grammars), a comma`
		180	`separated list of integers could be specified as:`
		181
		182	`<programlisting>`
		183	`list-of-integers = {`
		184	`integer ;`
		185	`\|\|`
		186	`integer ;`
		187	`comma ;`
		188	`list-of-integers ;`
		189	`} ;`
		190	`</programlisting></para>`
		191
		192	`<para>This production contains two alternatives: the first matches the`
		193	`terminal <code>integer</code>; the second matches the sequence of`
		194	`terminals <code>integer</code> followed by <code>comma</code>,`
		195	`followed by another list of integers.</para>`
		196
		197
		198	`<para>There is much documentation available on context free grammars`
		199	`(and other types of formal grammar). The reader is advised to`
		200	`find an alternative source for more information.</para>`
		201	`</sect1>`
		202
		203	`<sect1 id="sid-grammar">`
		204	`<title>sid grammars</title>`
		205
		206	`<para><code>sid</code> grammars are based upon a subset of context`
		207	`free grammars, known as LL(1) grammars. The main property of such`
		208	`grammars is that the parser can always tell which alternative it`
		209	`is going to parse next by looking at the current terminal symbol.`
		210	`<code>sid</code> does a number of transforms to turn grammars`
		211	`that are not in this form into it (although it cannot turn all`
		212	`possible grammars into this form). It also provides facilities`
		213	`that allow the user to alter the control flow of the parser.</para>`
		214
		215	`<para><code>sid</code> makes the following changes to the context`
		216	`free grammars described above:</para>`
		217
		218	`<orderedlist>`
		219	`<listitem><para>There may be more than one entry point to the`
		220	`grammar.</para></listitem>`
		221
		222	`<listitem><para>As well as being a rule or a terminal, an item may be`
		223	`an action, a predicate or an identity. An action is just a user`
		224	`supplied function. A predicate is a cross between a basic and an`
		225	`action (it is a user defined function but it may alter the control`
		226	`flow). An identity is like an assignment in a conventional`
		227	`programming language.</para></listitem>`
		228
		229	`<listitem><para>Rules may take parameters and return results (as may`
		230	`actions; terminals may only return results). These may be`
		231	`passed between items using names.</para></listitem>`
		232
		233	`<listitem><para>Each rule can have an exception handler associated`
		234	`with it. Exception handlers are used for error recovery when the`
		235	`input being parsed does not match the grammar.</para></listitem>`
		236
		237	`<listitem><para>Rules may be anonymous.</para></listitem>`
		238
		239	`<listitem><para>Rules may have non-local names associated with them.`
		240	`These names are in scope for that rule and any rules that are`
		241	`defined inside it. The value of each non-local name is saved on`
		242	`entry to the rule, and restored when the rule is`
		243	`exited.</para></listitem>`
		244	`</orderedlist>`
		245	`</sect1>`
		246	`</chapter>`
		247
		248	`<chapter id="overview">`
		249	`<title>Overview</title>`
		250
		251	`<para><code>sid</code> takes the input grammar and applies several`
		252	`transformations to it, before it produces the parser. These`
		253	`transformations allow the language descriptions to be written in`
		254	`a slightly more natural form than would otherwise be`
		255	`necessary.</para>`
		256
		257	`<sect1 id="left-recursion">`
		258	`<title>Left recursion elimination</title>`
		259
		260	`<para>The first transformation is to eliminate any left recursive`
		261	`productions, replacing them with right recursive ones. This`
		262	`transforms a set of rules of the form:`
		263
		264	`<programlisting>`
		265	`Ai = Aj Bji, Ci`
		266	`</programlisting>`
		267
		268	`into:`
		269
		270	`<programlisting>`
		271	`Ai = Cj Xji`
		272	`Xji = Bjk Xki, Yji`
		273	`</programlisting>`
		274
		275	`where <code>Yji</code> is the identity function if <code>i</code>`
		276	`equals <code>j</code>, and an error otherwise. In order for this`
		277	`transformation to work, the following conditions must hold:`
		278
		279	`<orderedlist>`
		280	`<listitem>`
		281	`<para>The parameter type for all <code>Ai</code> must be the`
		282	`same.</para>`
		283	`</listitem>`
		284
		285	`<listitem>`
		286	`<para>The result type for all <code>Ai</code> must be the`
		287	`same.</para>`
		288	`</listitem>`
		289
		290	`<listitem>`
		291	`<para>The exception handlers for all <code>Ai</code> must be the`
		292	`same.</para>`
		293	`</listitem>`
		294
		295	`<listitem>`
		296	`<para>The parameters to each left recursive call to some`
		297	`<code>Aj</code> must be exactly the formal parameters to the`
		298	`calling rule <code>Ai</code>.</para>`
		299	`</listitem>`
		300
		301	`<listitem>`
		302	`<para>Any non-local name referenced by any rule must be in scope`
		303	`for all rules.</para>`
		304	`</listitem>`
		305
		306	`<listitem>`
		307	`<para>A rule may not define non-local variables unless it is the`
		308	`only entry point into the cycle (i.e. there is only one named`
		309	`rule in the cycle).</para>`
		310	`</listitem>`
		311	`</orderedlist>`
		312
		313	`<code>sid</code> will report an error if these conditions are not`
		314	`met.</para>`
		315	`</sect1>`
		316
		317	`<sect1 id="factoring">`
		318	`<title>Factoring</title>`
		319
		320	`<para>The second major transformation is to factor the grammar. It is`
		321	`possible that this phase could go on for ever, so there is a limit to`
		322	`the number of rules that can be generated. This limit may be changed`
		323	`(see the <link linkend="invoke">invocation section</link>). In`
		324	`practice it is rare for this to happen.</para>`
		325
		326	`<para>The factoring phase tries to increase the number of language`
		327	`specifications that <code>sid</code> can produce a parser for, by`
		328	`factoring out common initial items in alternatives, e.g.:`
		329
		330	`<programlisting>`
		331	`X = {`
		332	`a ; b ;`
		333	`\|\| a ; c ;`
		334	`} ;`
		335	`</programlisting>`
		336
		337	`would be transformed into something like:`
		338
		339	`<programlisting>`
		340	`X = {`
		341	`a ; X1 ;`
		342	`} ;`
		343
		344	`X1 = {`
		345	`b ;`
		346	`\|\| c ;`
		347	`} ;`
		348	`</programlisting></para>`
		349
		350	`<para>It will also expand calls to rules at the beginning of`
		351	`alternatives if this is necessary to allow the parser to be produced`
		352	`(this is the phase that the limit is needed for). The expansions are`
		353	`done in the following cases:`
		354
		355	`<orderedlist>`
		356	`<listitem>`
		357	`<para>If the rule is see-through (i.e. there is an expansion of`
		358	`the rule that does not contain any terminals or predicates) and`
		359	`its first set contains terminals in the first set of the rest of`
		360	`the alternative.</para>`
		361	`</listitem>`
		362
		363	`<listitem>`
		364	`<para>If the rule has a predicate in its first set.</para>`
		365	`</listitem>`
		366
		367	`<listitem>`
		368	`<para>If the rule has terminals in its first set that are also in`
		369	`the first set of another alternative that does not begin with`
		370	`the same rule.</para>`
		371	`</listitem>`
		372	`</orderedlist></para>`
		373
		374	`<para>If the rule being expanded contains an exception handler, and it`
		375	`is not identical to the exception handler in the enclosing rule, then`
		376	`an error will occur. Similarly if the rule being expanded defines any`
		377	`non-local variables then an error will occur.</para>`
		378	`</sect1>`
		379
		380	`<sect1 id="optimise">`
		381	`<title>Optimisations</title>`
		382
		383	`<para>After these two transformations, <code>sid</code> performs some`
		384	`optimisations on the grammar, such as reducing multiple copies of`
		385	`identical rules, removing tail recursion, inlining all basic rules,`
		386	`inlining single alternative rules, inling rules used only once, and`
		387	`inlining everything that can be inlined. All of the inlining is`
		388	`optional.</para>`
		389
		390	`<para>After these optimisations, <code>sid</code> checks the language`
		391	`description to ensure that it is possible to produce a parser for it,`
		392	`and if so it outputs the parser.</para>`
		393	`</sect1>`
		394	`</chapter>`
		395
		396	`<chapter id="invoke">`
		397	`<title>Invocation</title>`
		398
		399	`<para><code>sid</code> should be invoked in the following manner:`
		400
		401	`<programlisting>`
		402	`sid options input-files output-files`
		403	`</programlisting></para>`
		404
		405	`<para>The options are described later. The input files should be a`
		406	`number of input file names dependent upon the output language. The`
		407	`first input file is the <code>sid</code> grammar file. In the case of`
		408	`either C dialects there should be one other input file that provides C`
		409	`specific information to <code>sid</code>. The number of output files`
		410	`is also language specific. At present, two output files should be`
		411	`specified for the C languages. The first should be a <code>.c</code>`
		412	`file into which the parser is written; the second should be a`
		413	`<code>.h</code> file into which the terminal definitions and external`
		414	`function declarations are written.</para>`
		415
		416	`<para>The options list should consist of zero or more of the following`
		417	`options. There are short forms for each of these options as well; see`
		418	`the <code>sid</code> manual page for more information on`
		419	`invocation.</para>`
		420
		421	`<itemizedlist>`
		422	`<listitem>`
		423	`<para>--dump-file <filename>FILE</filename></para>`
		424
		425	`<para>This option causes intermediate dumps of the grammar to be`
		426	`written to the named file. The format of the dump files is similar`
		427	`to the format of the grammar specification, with the following`
		428	`exceptions:</para>`
		429
		430	`<orderedlist>`
		431	`<listitem>`
		432	`<para>Predicates are written with the predicate result replaced`
		433	`by the predicate identifier (this will always be zero), and`
		434	`the result is followed by a <code>?</code> to indicate that it`
		435	`was a predicate. As an example, the predicate:`
		436
		437	`<programlisting>`
		438	`( b, ? ) = <pred> ( a )`
		439	`</programlisting>`
		440
		441	`would be printed out as:`
		442
		443	`<programlisting>`
		444	`( b : Type1T, 0 : Type2T ) ? = <pred> ( a : Type3T )`
		445	`</programlisting></para>`
		446	`</listitem>`
		447
		448	`<listitem>`
		449	`<para>Items that are considered to be inlinable are prefixed by`
		450	`a <code>+</code>. Items that are tail calls which will be`
		451	`eliminated are prefixed by a <code>*</code>.</para>`
		452	`</listitem>`
		453
		454	`<listitem>`
		455	`<para>Nested rules are written at the outer level, with names of`
		456	`the form`
		457	`<code>outer-rule::....::inner-rule</code>.</para>`
		458	`</listitem>`
		459
		460	`<listitem>`
		461	`<para>Types are provided on call parameter and result`
		462	`tuples.</para>`
		463	`</listitem>`
		464
		465	`<listitem>`
		466	`<para>Inline rules are given a generated name, and are written`
		467	`out as a call to the generated rule (and a definition`
		468	`elsewhere).</para>`
		469	`</listitem>`
		470	`</orderedlist>`
		471	`</listitem>`
		472
		473	`<listitem>`
		474	`<para>--factor-limit <token>LIMIT</token></para>`
		475
		476	`<para>This option limits the number of rules that can be created`
		477	`during the factorisation process. It is probably best not to change`
		478	`this.</para>`
		479	`</listitem>`
		480
		481	`<listitem>`
		482	`<para>--help</para>`
		483
		484	`<para>This option writes a summary of the command line options to the`
		485	`standard error stream.</para>`
		486	`</listitem>`
		487
		488	`<listitem>`
		489	`<para>--inline <token>INLINES</token></para>`
		490
		491	`<para>This option controls what inlining will be done in the output`
		492	`parser. The inlines argument should be a comma separated list of the`
		493	`following words:`
		494
		495	`<itemizedlist>`
		496	`<listitem>`
		497	`<para><code>SINGLES</code></para>`
		498
		499	`<para>This causes single alternative rules to be inlined. This`
		500	`inlining is no longer performed as a modification to the`
		501	`grammar (it was in version 1.0).</para>`
		502	`</listitem>`
		503
		504	`<listitem>`
		505	`<para><code>BASICS</code></para>`
		506
		507	`<para>This causes rules that contain only basics (and no`
		508	`exception handlers or empty alternatives) to be inlined. The`
		509	`restriction on exception handlers and empty alternatives is`
		510	`rather arbitrary, and may be changed later.</para>`
		511	`</listitem>`
		512
		513	`<listitem>`
		514	`<para><code>TAIL</code></para>`
		515
		516	`<para>This causes tail recursive calls to be inlined. Without`
		517	`this, tail recursion elimination will not be`
		518	`performed.</para>`
		519	`</listitem>`
		520
		521	`<listitem>`
		522	`<para><code>OTHER</code></para>`
		523
		524	`<para>This causes other calls to be inlined wherever possible.`
		525	`Unless the <code>MULTI</code> inlining is also specified, this`
		526	`will be done only for productions that are called`
		527	`once.</para>`
		528	`</listitem>`
		529
		530	`<listitem>`
		531	`<para><code>MULTI</code></para>`
		532
		533	`<para>This causes calls to be inlined, even if the rule being`
		534	`called is called more than once. Turning this inlining on`
		535	`implies <code>OTHER</code>. Similarly turning off`
		536	`<code>OTHER</code> inlining will turn off <code>MULTI</code>`
		537	`inlining. For grammars of any size, this is probably best`
		538	`avoided; if used the generated parser may be huge (e.g. a C`
		539	`grammar has produced a file that was several hundred MB in`
		540	`size).</para>`
		541	`</listitem>`
		542
		543	`<listitem>`
		544	`<para><code>ALL</code></para>`
		545
		546	`<para>This turns on all inlining.</para>`
		547	`</listitem>`
		548	`</itemizedlist>`
		549
		550	`In addition, prefixing a word with <code>NO</code> turns off that`
		551	`inlining phase. The words may be given in any case. They are`
		552	`evaluated in the order given, so:`
		553
		554	`<programlisting>`
		555	`--inline noall,singles`
		556	`</programlisting>`
		557
		558	`would turn on single alternative rule inlining only, whilst:`
		559
		560	`<programlisting>`
		561	`--inline singles,noall`
		562	`</programlisting>`
		563
		564	`would turn off all inlining. The default is as if <code>sid</code>`
		565	`were invoked with the option:`
		566
		567	`<programlisting>`
		568	`--inline noall,basics,tail`
		569	`</programlisting></para>`
		570	`</listitem>`
		571
		572	`<listitem>`
		573	`<para id="lang">--language <token>LANGUAGE</token></para>`
		574
		575	`<para>This option specifies the output language. Currently this should`
		576	`be one of <code>ansi-c</code>, <code>pre-ansi-c</code> and`
		577	`<code>test</code>. The default is <code>ansi-c</code>.</para>`
		578
		579	`<para>The <code>ansi-c</code> and <code>pre-ansi-c</code> languages`
		580	`are basically the same. The only difference is that`
		581	`<code>ansi-c</code> initially uses function prototypes, and`
		582	`<code>pre-ansi-c</code> doesn't. The C language specific options`
		583	`are:`
		584
		585	`<itemizedlist>`
		586	`<listitem>`
		587	`<para><code>prototypes, proto, no-prototypes,`
		588	`no-proto</code></para>`
		589
		590	`<para>These enable or disable the use of function prototypes.`
		591	`By default this is enabled for <code>ansi-c</code> and`
		592	`disabled for <code>pre-ansi-c</code>.</para>`
		593	`</listitem>`
		594
		595	`<listitem>`
		596	`<para><code>numeric-ids, numeric, no-numeric-ids,`
		597	`no-numeric</code></para>`
		598
		599	`<para>These enable or disable the use of numeric identifiers.`
		600	`Numeric identifiers replace the identifier name with a number,`
		601	`which is mainly of use in stopping identifier names getting`
		602	`too long. The disadvantage is that the code becomes less`
		603	`readable, and more difficult to debug. Numeric identifiers are`
		604	`not used by default.</para>`
		605	`</listitem>`
		606
		607	`<listitem>`
		608	`<para id="casts"><code>casts, cast, no-casts,`
		609	`no-cast</code></para>`
		610
		611	`<para>These enable or disable casting of action and assignment`
		612	`operator immutable parameters. If enabled, a parameter is cast`
		613	`to its own type when it is substituted into the action. This`
		614	`will cause some compilers to complain about attempts to modify`
		615	`the parameter (which can help pick out attempts at mutating`
		616	`parameters that should not be mutated). The disadvantage is`
		617	`that not all compilers will reject attempts at mutation, and`
		618	`that ISO C doesn't allow casting to structure and union types,`
		619	`which means that some code may be illegal. Parameter casting`
		620	`is disabled by default.</para>`
		621	`</listitem>`
		622
		623	`<listitem>`
		624	`<para><code>unreachable-macros, unreachable-macro,`
		625	`unreachable-comments, unreachable-comment</code></para>`
		626
		627	`<para>These choose whether unreachable code is marked by a macro`
		628	`or a comment. The default is to mark unreachable code with a`
		629	`comment <code>/UNREACHED/</code>, however a macro`
		630	`<code>UNREACHED ;</code> may be used instead, if`
		631	`desired.</para>`
		632	`</listitem>`
		633	`</itemizedlist>`
		634
		635	`The <code>test</code> language only takes one input file, and`
		636	`produces no output file. It may be used to check that a grammar is`
		637	`valid. In conjunction with the dump file, it may be used to check`
		638	`the transformations that would be applied to the grammar. There are`
		639	`no language specific options for the <code>test</code>`
		640	`language.</para>`
		641	`</listitem>`
		642
		643	`<listitem>`
		644	`<para>--show-errors</para>`
		645
		646	`<para>This option writes a copy of the current error messages to the`
		647	`standard output. See the manual entry for more details about`
		648	`changing the error message content.</para>`
		649	`</listitem>`
		650
		651	`<listitem>`
		652	`<para>--switch <token>OPTION</token></para>`
		653
		654	`<para>This passes through <token>OPTION</token> as a language specific`
		655	`option. The valid options are described`
		656	`<link linkend="lang">above</link>.</para>`
		657	`</listitem>`
		658
		659	`<listitem>`
		660	`<para>--tab-width <token>NUMBER</token></para>`
		661
		662	`<para>This option specifies the number of spaces that a tab occupies.`
		663	`It defaults to 8. It is only used when indenting output.</para>`
		664	`</listitem>`
		665
		666	`<listitem>`
		667	`<para>--version</para>`
		668
		669	`<para>This option causes the version number and supported languages to`
		670	`be written to the standard error stream.</para>`
		671	`</listitem>`
		672	`</itemizedlist>`
		673	`</chapter>`
		674
		675	`<chapter id="grammar-file">`
		676	`<title>The sid grammar file</title>`
		677
		678	`<para>The <code>sid</code> grammar file should always be the first input`
		679	`file specified on the command line. It provides an output language`
		680	`independent description of the language to be recognised. The file is`
		681	`split up into a number of different sections, each of which begins with`
		682	`a section header. All sections must be included, although it is possible`
		683	`to leave most of them empty. The following sections exist at present:`
		684	`type declaration, terminal declaration, rule definition, and grammar`
		685	`entry points. They must appear in that order. The sections are detailed`
		686	`below, after the lexical conventions.</para>`
		687
		688	`<sect1 id="grammar-lex">`
		689	`<title>Lexical conventions</title>`
		690
		691	`<para>Almost all non-printable characters (but definitely space, tab,`
		692	`newline and form-feed) are considered to be white space, and are`
		693	`ignored other than to separate other tokens. In addition, two comment`
		694	`forms are recognised: the C comment syntax, where comments begin with`
		695	`<code>/</code>, and end with <code>/</code>, although unlike C these`
		696	`comments nest properly; and the C++ line comment syntax, where`
		697	`comments begin with <code>//</code>, and are terminated at the end of`
		698	`the line. Comments are treated in the same way as white space`
		699	`characters.</para>`
		700
		701	`<para>Section headers are enclosed in percent characters, and are case`
		702	`insensitive. The following section headers are currently`
		703	`recognised:`
		704
		705	`<programlisting>`
		706	`%types%`
		707	`%terminals%`
		708	`%productions%`
		709	`%entry%`
		710	`</programlisting></para>`
		711
		712	`<para>Identifiers must begin with a letter, an underscore or a hyphen.`
		713	`This may be followed by zero or more letters, digits, underscores and`
		714	`hyphens. Identifiers are case sensitive. The following are all legal`
		715	`identifiers:`
		716
		717	`<programlisting>`
		718	`expression mult-expr plus_expr expr-1`
		719	`</programlisting></para>`
		720
		721	`<para>Identifiers are split into two namespaces: local names have one`
		722	`space; types, actions, rules, non-local names and terminals share the`
		723	`other namespace, so it is not possible to have an identifier that is a`
		724	`type as well as being a rule for example.</para>`
		725
		726	`<para>The following symbols are also used:`
		727
		728	`<programlisting>`
		729	`& ; = [ : ] ! ,`
		730	`\|\| $ ? { } ( ) <`
		731	`> -> :: ##`
		732	`</programlisting></para>`
		733
		734	`<para>All other characters are illegal.</para>`
		735	`</sect1>`
		736
		737	`<sect1 id="type-decl">`
		738	`<title>The type declaration section</title>`
		739
		740	`<para>The first section is the type declaration section. This begins`
		741	`with the types section header, followed by the names of all of the`
		742	`types used by the grammar. Each type name should be terminated by a`
		743	`semicolon. An example of the type declaration section follows:`
		744
		745	`<programlisting>`
		746	`%types%`
		747	`NumberT ;`
		748	`StringT ;`
		749	`</programlisting>`
		750
		751	`This declares two types: <code>NumberT</code> and`
		752	`<code>StringT</code>. There is no requirement for the type names to`
		753	`resemble names in the target language (in fact this should be avoided,`
		754	`as it is possible to use many different target languages). All types`
		755	`used in the grammar must be declared here. Similarly, all types`
		756	`declared here must be used in the grammar.</para>`
		757	`</sect1>`
		758
		759	`<sect1 id="term-decl">`
		760	`<title>The terminal declaration section</title>`
		761
		762	`<para>After the type declaration section comes the terminal declaration`
		763	`section. This section declares the terminals that will be used by the`
		764	`grammar. A terminal is a recogniser for a symbol in the input alphabet`
		765	`of the language to be recognised. It is possible to declare terminals`
		766	`that are not used by the grammar.</para>`
		767
		768	`<para>The section begins with its section header, followed by the`
		769	`declarations of the terminals. Each terminal declaration begins with`
		770	`the name of the terminal being defined, followed by its type, and`
		771	`terminated by a semicolon. If the terminal is not used in the grammar,`
		772	`the declaration should be preceded by a <code>!</code> symbol.</para>`
		773
		774	`<para>A type (for terminals, rules and actions) is written as a colon,`
		775	`followed by the parameter tuple, followed by the <code>-></code>`
		776	`symbol, followed by the result tuple. If the type is zero-tuple to`
		777	`zero-tuple, then the type may be omitted. A tuple consists of a comma`
		778	`separated sequence of name-type pairs (with the name and type being`
		779	`separated by a colon), surrounded by parentheses. For parameter`
		780	`tuples, the type may be suffixed by a <code>&</code> symbol,`
		781	`indicating that call by reference should be used (the default is call`
		782	`by copy). For declarations, the names should be omitted. For`
		783	`terminals, the parameter type must be the zero-tuple.</para>`
		784
		785	`<para>The simplest type of terminal declaration is as follows:`
		786
		787	`<programlisting>`
		788	`terminal1 ;`
		789	`</programlisting></para>`
		790
		791	`<para>This means the same as:`
		792
		793	`<programlisting>`
		794	`terminal1 : () -> () ;`
		795	`</programlisting></para>`
		796
		797	`<para>An example of a more complex terminal declaration is:`
		798
		799	`<programlisting>`
		800	`terminal2 : () -> ( :StringT ) ;`
		801	`</programlisting></para>`
		802
		803	`<para>If these terminals were not to be used in the grammar, they should`
		804	`be declared as:`
		805
		806	`<programlisting>`
		807	`!terminal1 ;`
		808	`!terminal2 : () -> ( :StringT ) ;`
		809	`</programlisting></para>`
		810	`</sect1>`
		811
		812	`<sect1 id="rule-defn">`
		813	`<title>The rule definition section</title>`
		814
		815	`<para>The rule definition section follows the terminal declaration`
		816	`section. It begins with the section header, followed by the`
		817	`definitions and declarations of all of the rules used in the grammar,`
		818	`and the declarations of all of the actions used in the grammar.</para>`
		819
		820	`<para>Rule declarations look identical to terminal declarations, e.g.:`
		821
		822	`<programlisting>`
		823	`rule1 : ( :NumberT ) -> ( :NumberListT ) ;`
		824	`rule2 ;`
		825	`</programlisting></para>`
		826
		827	`<para>Action declarations are similar, although the names are surrounded`
		828	`by angle brackets, e.g.:`
		829
		830	`<programlisting>`
		831	`<action1> : ( :StringT & ) -> () ;`
		832	`<action2> ;`
		833	`</programlisting></para>`
		834
		835	`<para>A declaration (or a definition) may be prefixed with the`
		836	`<code>::</code> symbol. This symbol forces the definition into the`
		837	`outermost scope. Scopes are described later on.</para>`
		838
		839	`<para>A rule definition (called a production) looks something like the`
		840	`following:`
		841
		842	`<programlisting>`
		843	`add-expr : () -> ( v : NumberT ) = {`
		844	`v1 = mul-expr ;`
		845	`plus ;`
		846	`v2 = expr ;`
		847	`v = <add> ( v1, v2 ) ;`
		848	`\|\|`
		849	`v1 = mul-expr ;`
		850	`minus ;`
		851	`v2 = expr ;`
		852	`v = <subtract> ( v1, v2 ) ;`
		853	`##`
		854	`v = <ex> ;`
		855	`} ;`
		856	`</programlisting></para>`
		857
		858	`<para>The production begins with the rule name, followed by the`
		859	`parameter and result names and types (in this case, the rule name is`
		860	`<code>add-expr</code>, there are no parameters, and there is one`
		861	`result name <code>v</code> of type <code>NumberT</code>). This may`
		862	`optionally be followed by local declarations (there are none here -`
		863	`they are described later).</para>`
		864
		865	`<para>The left hand side of the rule is followed by the <code>=</code>`
		866	`symbol, a list of alternatives surrounded by curly braces, and is`
		867	`terminated by a semicolon. The alternatives are separated by the`
		868	`<code>\|\|</code> symbol, and the last alternative may be separated from`
		869	`its predecessor (there must be one) using the <code>##</code> symbol;`
		870	`if this is the case, then this alternative is the exception handler`
		871	`for the production (otherwise it has no exception handler).</para>`
		872
		873	`<para>An alternative may match the empty string, using the symbol`
		874	`<code>$</code> and the terminator symbol <code>;</code>, i.e.:`
		875
		876	`<programlisting>`
		877	`$ ;`
		878	`</programlisting>`
		879
		880	`unless it is an exception handler alternative (in which case it must`
		881	`do something), or a sequence of items. The empty string is only valid`
		882	`if the production has no results. If you want to match an empty string`
		883	`in a production that has a result, it is necessary to use an action`
		884	`(or identity) to provide a result.</para>`
		885
		886	`<para>An item is an identity, or a call to a (possibly anonymous) rule,`
		887	`a terminal, an action or a predicate. An identity looks like an`
		888	`assignment in conventional programming languages:`
		889
		890	`<programlisting>`
		891	`( a, b, c ) = ( d, e, f ) ;`
		892	`</programlisting></para>`
		893
		894	`<para>Each tuple must contain the same number of names. In the case of a`
		895	`one-tuple, the parentheses may be dropped, e.g.:`
		896
		897	`<programlisting>`
		898	`a = d ;`
		899	`</programlisting></para>`
		900
		901	`<para>Note that this is a binding operation, not an assignment. Each`
		902	`name on the left hand side must be a new name. It is illegal to`
		903	`redeclare a name that is already in scope. It is possible to assign to`
		904	`a name which is already in scope by prefixing that name with the`
		905	`<code>&</code> symbol, e.g.:`
		906
		907	`<programlisting>`
		908	`( a, &b, c ) = ( d, e, f ) ;`
		909	`</programlisting>`
		910
		911	`would assign to the name <code>b</code>, which must have been`
		912	`previously defined (it may be a parameter; if it is a call by`
		913	`reference parameter, then the change will propagate outside to the`
		914	`calling rule).</para>`
		915
		916	`<para>It is also possible to use the <code>!</code> symbol in the result`
		917	`tuple to ignore results, e.g.:`
		918
		919	`<programlisting>`
		920	`( a, !, b, ! ) = ( c, d, e, f ) ;`
		921	`</programlisting></para>`
		922
		923	`<para>This is not particularly useful in an identity, but may be more`
		924	`useful in a call to a rule, terminal or action. A call to a terminal`
		925	`or rule looks like a call to a function in a conventional programming`
		926	`language, e.g.:`
		927
		928	`<programlisting>`
		929	`( a, b ) = rule1 ( c, d ) ;`
		930	`( e, f ) = terminal1 () ;`
		931	`</programlisting></para>`
		932
		933	`<para>Calls to actions have the same form, except that action names are`
		934	`surrounded by angle brackets, e.g.:`
		935
		936	`<programlisting>`
		937	`( g, h, i ) = <action1> ( a, e ) ;`
		938	`</programlisting></para>`
		939
		940	`<para>In addition, one of the names in the result tuple of the call to`
		941	`the action may be the predicate result symbol <code>?</code>, in which`
		942	`case the action is used as a predicate (more details on predicates are`
		943	`given later).</para>`
		944
		945	`<para>When calling a rule, terminal or action, it is necessary to have`
		946	`declared it (or in the case of a rule declared or defined it) before`
		947	`the call.</para>`
		948
		949	`<para>If a rule or action is being invoked, and it takes one or more`
		950	`call by reference parameters, then the corresponding arguments should`
		951	`be prefixed by the <code>&</code> symbol, e.g.:`
		952
		953	`<programlisting>`
		954	`length = <string-length> ( &string ) ;`
		955	`</programlisting></para>`
		956
		957	`<para>If the rule, terminal or action has the zero-tuple as a result,`
		958	`then only the right hand side of the definition is required, e.g.:`
		959
		960	`<programlisting>`
		961	`rule2 ( a, b ) ;`
		962	`terminal2 () ;`
		963	`<action2> ( c, d ) ;`
		964	`</programlisting></para>`
		965
		966	`<para>If the rule, terminal or action has the zero-tuple as a parameter,`
		967	`then the parameter tuple may be omitted, e.g.:`
		968
		969	`<programlisting>`
		970	`( a, b ) = rule3 ;`
		971	`terminal3 ;`
		972	`c = <action3> ;`
		973	`</programlisting></para>`
		974
		975	`<para>In older versions of <code>sid</code>, it used to be possible to`
		976	`have ambiguous items, e.g.:`
		977
		978	`<programlisting>`
		979	`a = b ;`
		980	`</programlisting>`
		981
		982	`where <code>b</code> was both a rule and a name. As local names may`
		983	`not shadow non-local and global names, then this is no longer a`
		984	`problem.</para>`
		985
		986	`<para>In each case, the data flow through the rule is indicated using`
		987	`names. In the previous example of a production, both alternatives have`
		988	`the same data flow: the call to <code>mul-expr</code> returns one`
		989	`value, which is stored in the name <code>v1</code>, and the call to`
		990	`<code>expr</code> returns one value, which is stored in the name`
		991	`<code>v2</code>. Both of these names are passed to the action`
		992	`(<code>add</code> in the first alternative, <code>subtract</code> in`
		993	`the second), which returns a value into the name <code>v</code>`
		994	`(presumably the sum or difference of the previous two values). The`
		995	`name <code>v</code> is the name of the result, so this will be`
		996	`returned as the result of the rule. The exception handler (which is`
		997	`invoked if something fails) calls the action <code>ex</code> to`
		998	`produce the result <code>v</code>.</para>`
		999
		1000	`<para>It is necessary that the types of the data flow through the`
		1001	`production are correct. It is also necessary to define all of the`
		1002	`result names for the production in each of the alternatives in that`
		1003	`production.</para>`
		1004
		1005	`<para>An anonymous rule is written in the same way as the body of a`
		1006	`normal rule, e.g.:`
		1007
		1008	`<programlisting>`
		1009	`list : () -> ( l : ListT ) = {`
		1010	`n = number ;`
		1011	`/* START ANONYMOUS RULE */ {`
		1012	`? = <at-eof> ;`
		1013	`l = <make-list> ( n ) ;`
		1014	`\|\|`
		1015	`comma ;`
		1016	`l1 = list ;`
		1017	`l = <cons> ( n, l1 ) ;`
		1018	`##`
		1019	`l = <com-or-eof> ( n ) ;`
		1020	`} ; /* END ANONYMOUS RULE */`
		1021	`} ;`
		1022	`</programlisting></para>`
		1023
		1024	`<para>An anonymous rule is always inlined.</para>`
		1025
		1026	`<para>The rule name may be followed by a sequence of definitions and`
		1027	`declarations surrounded by the <code>[</code> and <code>]</code>`
		1028	`symbols (which are followed by the rest of the rule). In this case,`
		1029	`the definitions are local to the rule, e.g.:`
		1030
		1031	`<programlisting>`
		1032	`x-list [`
		1033	`x = {`
		1034	`terminal1 ;`
		1035	`terminal2 ;`
		1036	`\|\|`
		1037	`terminal3 ;`
		1038	`} ;`
		1039	`] = {`
		1040	`x ;`
		1041	`\|\|`
		1042	`x ;`
		1043	`x-list ;`
		1044	`} ;`
		1045	`</programlisting></para>`
		1046
		1047	`<para>In this case, the rule <code>x</code> would be local to the rule`
		1048	`<code>x-list</code> and no other rule would be able to use it. In`
		1049	`error messages, the name of the rule <code>x</code> would be written`
		1050	`as <code>x-list::x</code>. All declarations and definitions that occur`
		1051	`inside of the <code>[</code> and <code>]</code> symbols have the scope`
		1052	`of the enclosing rule, unless they are preceded by the <code>::</code>`
		1053	`symbol, in which case they have global scope. This is particularly`
		1054	`necessary for actions, as actions can only be defined with global`
		1055	`scope.</para>`
		1056
		1057	`<para>It is also possible to define non-local names. These are declared`
		1058	`as an identifier (the name), followed by the <code>:</code> symbol,`
		1059	`followed by another identifier (its type), in a similar manner to an`
		1060	`entry in a type tuple. Non-local names are not allowed at the`
		1061	`outermost level (so they may not be prefixed with the <code>::</code>`
		1062	`symbol either). When a non-local name is defined, it is in scope for`
		1063	`all of the rules in its scope that are defined after it is, plus its`
		1064	`defining rule.</para>`
		1065
		1066	`<para>Non-local names have their values saved on entry to their defining`
		1067	`rule, and the value will be restored when the rule is exited. This`
		1068	`includes exiting the rule tail recursively or because of an exception`
		1069	`(if the rule has an exception handler, the non-local name will not be`
		1070	`restored until the exception handler has exited). In almost all other`
		1071	`respects non-local names are the same as local names. An example`
		1072	`follows:`
		1073
		1074	`<programlisting>`
		1075	`rule1 [`
		1076	`name1 : Type1T ;`
		1077	`rule1_1 = {`
		1078	`<action1> ( name1 ) ;`
		1079	`rule2 ( name1 ) ;`
		1080	`} ;`
		1081	`] = {`
		1082	`<action2> ( &name1 ) ;`
		1083	`rule1_1 ;`
		1084	`} ;`
		1085	`</programlisting></para>`
		1086
		1087	`<para>It is possible to associate an initialiser with a non-local name,`
		1088	`by following the type name with a <code>=</code> symbol and the action`
		1089	`name in angle brackets, e.g.:`
		1090
		1091	`<programlisting>`
		1092	`rule1 [`
		1093	`name1 : Type1T = <action1> ;`
		1094	`] = {`
		1095	`// ....`
		1096	`} ;`
		1097	`</programlisting></para>`
		1098
		1099	`<para>In this case the action is called at the start of the rule to`
		1100	`initialise the non-local name. The action should return an object of`
		1101	`the same type as the non-local name. Normally, the action takes no`
		1102	`parameters, however it may take one parameter of the same type as the`
		1103	`non-local name (or a reference to that type), in which case it will be`
		1104	`given the saved value of the non-local name as an argument (this may`
		1105	`be used to build a stack automatically for example).</para>`
		1106	`</sect1>`
		1107
		1108	`<sect1 id="entry-point">`
		1109	`<title>The grammar entry points section</title>`
		1110
		1111	`<para>The final section lists the entry points to the grammar. It begins`
		1112	`with the section header, followed by a comma separated list of rule`
		1113	`names, terminated by a semicolon, e.g.:`
		1114
		1115	`<programlisting>`
		1116	`%entry% expr ;`
		1117	`</programlisting></para>`
		1118
		1119	`<para>If you are going to use a rule as an entry point into the grammar`
		1120	`(i.e. you wish to call the associated function), you must list it in`
		1121	`the entry points list. If not, the function may not exist.</para>`
		1122	`</sect1>`
		1123	`</chapter>`
		1124
		1125	`<chapter id="info-file">`
		1126	`<title>The C information file</title>`
		1127
		1128	`<para>The grammar specification itself is not sufficient to produce a`
		1129	`parser. There also needs to be output language specific information to`
		1130	`allow the parser to interface with the program it is to be part of. In`
		1131	`the case of the C output routines, <code>sid</code> needs to know the`
		1132	`following information:`
		1133
		1134	`<orderedlist>`
		1135	`<listitem>`
		1136	`<para>What code should precede and succeed the automatically`
		1137	`generated code.</para>`
		1138	`</listitem>`
		1139
		1140	`<listitem>`
		1141	`<para>How to map the <code>sid</code> identifiers into C`
		1142	`identifiers.</para>`
		1143	`</listitem>`
		1144
		1145	`<listitem>`
		1146	`<para>How to do assignments for each type.</para>`
		1147	`</listitem>`
		1148
		1149	`<listitem>`
		1150	`<para>How to get the current terminal number.</para>`
		1151	`</listitem>`
		1152
		1153	`<listitem>`
		1154	`<para>How to get the result of the current terminal.</para>`
		1155	`</listitem>`
		1156
		1157	`<listitem>`
		1158	`<para>How to advance the lexical analyser, to get the next`
		1159	`terminal.</para>`
		1160	`</listitem>`
		1161
		1162	`<listitem>`
		1163	`<para>What the actions are defined as, and how to pass parameters to`
		1164	`them.</para>`
		1165	`</listitem>`
		1166
		1167	`<listitem>`
		1168	`<para>How to save and restore the current terminal when an error`
		1169	`occurs.</para>`
		1170	`</listitem>`
		1171	`</orderedlist></para>`
		1172
		1173	`<para>Eventually almost all of this should be user suppliable. At the`
		1174	`moment, some of the information is supplied by the user in the C`
		1175	`information file, some through macros, and some is built in.`
		1176	`<code>sid</code> currently gets the information as follows:`
		1177
		1178	`<orderedlist>`
		1179	`<listitem>`
		1180	`<para>The C information file has a header and a trailer section,`
		1181	`which define code that precedes and succeeds the code that`
		1182	`<code>sid</code> generates.</para>`
		1183	`</listitem>`
		1184
		1185	`<listitem>`
		1186	`<para>The C information file has a section that allows the user to`
		1187	`specify mappings from <code>sid</code> identifiers into C`
		1188	`identifiers. These are only valid for the following types of`
		1189	`identifiers: types, functions (implementations of rules) and`
		1190	`terminals. For other identifier types (or when no mapping is`
		1191	`supplied), <code>sid</code> uses some default rules:</para>`
		1192
		1193	`<para>Firstly, <code>sid</code> applies a transform to the`
		1194	`<code>sid</code> identifier name, to make it a legal C identifier.`
		1195	`At present this maps <code>_</code> to <code>__</code>,`
		1196	`<code>-</code> to <code>_H</code> and <code>:</code> (this occurs`
		1197	`in scoped names) to <code>_C</code>. All other characters are`
		1198	`left unmodified. This transform cannot be changed.</para>`
		1199
		1200	`<para><code>sid</code> also puts a prefix before all identifiers, to`
		1201	`try to prevent clashes (and also to make automatically generated`
		1202	`- i.e. numeric - identifiers legal). These prefixes can be`
		1203	`redefined for each class of identifier, in the C information file.`
		1204	`They should be chosen so as not to clash with any other`
		1205	`identifiers (i.e. no other identifiers should begin with that`
		1206	`prefix).</para>`
		1207
		1208	`<para>By default, the following prefixes are used:`
		1209
		1210	`<table id="identifier-prefixes">`
		1211	`<title>Identifier prefixes</title>`
		1212
		1213	`<tgroup cols="2">`
		1214	`<thead>`
		1215	`<row>`
		1216	`<entry>Prefix</entry>`
		1217
		1218	`<entry>Meaning</entry>`
		1219	`</row>`
		1220	`</thead>`
		1221
		1222	`<tbody>`
		1223	`<row>`
		1224	`<entry><code>ZT</code></entry>`
		1225
		1226	`<entry>This prefix is used before type identifiers, for`
		1227	`the type name itself.</entry>`
		1228	`</row>`
		1229
		1230	`<row>`
		1231	`<entry><code>ZR</code></entry>`
		1232
		1233	`<entry>This prefix is used before rule identifiers, for`
		1234	`the rule's implementation function.</entry>`
		1235	`</row>`
		1236
		1237	`<row>`
		1238	`<entry><code>ZL</code></entry>`
		1239
		1240	`<entry>This prefix is used before rule identifiers, for`
		1241	`the rule's label when tail recursion is being`
		1242	`eliminated. In this case, a number is added to the`
		1243	`suffix before the identifier name, to prevent clashes`
		1244	`when a rule is inlined twice in the same function. It`
		1245	`is also used before other labels that are automatically`
		1246	`generated and are just numbered.</entry>`
		1247	`</row>`
		1248
		1249	`<row>`
		1250	`<entry><code>ZI</code></entry>`
		1251
		1252	`<entry>This prefix is used before name identifiers used as`
		1253	`parameters to functions, or in normal usage. It is also`
		1254	`used by non-local names (which doesn't cause a problem`
		1255	`as they always occur scoped, and local names never`
		1256	`do).</entry>`
		1257	`</row>`
		1258
		1259	`<row>`
		1260	`<entry><code>ZO</code></entry>`
		1261
		1262	`<entry>This prefix is used before name identifiers used as`
		1263	`results of functions. Results are passed as reference`
		1264	`parameters, and this suffix is used then. Another`
		1265	`identifier with the <code>ZI</code> prefix is also used`
		1266	`within the function, and the type reference assignment`
		1267	`operator is used at the end of the function to assign`
		1268	`the results to the reference parameters.</entry>`
		1269	`</row>`
		1270
		1271	`<row>`
		1272	`<entry><code>ZB</code></entry>`
		1273
		1274	`<entry>This prefix is used before the terminal symbol`
		1275	`names in the generated header file.</entry>`
		1276	`</row>`
		1277	`</tbody>`
		1278	`</tgroup>`
		1279	`</table>`
		1280	`</para>`
		1281	`</listitem>`
		1282
		1283	`<listitem>`
		1284	`<para>Normally, <code>sid</code> will do assignments using the C`
		1285	`assignment operator. Sometimes, this will not do the right thing,`
		1286	`so the user can define a set of assignment operations for any type`
		1287	`in the C information file.</para>`
		1288	`</listitem>`
		1289
		1290	`<listitem>`
		1291	`<para><code>sid</code> expects the <code>CURRENT_TERMINAL</code>`
		1292	`macro to be defined, and its definition should return an integer`
		1293	`that is the current terminal. The macro should be an expression,`
		1294	`not a statement.</para>`
		1295	`</listitem>`
		1296
		1297	`<listitem>`
		1298	`<para>It is necessary to define how to extract the results of all`
		1299	`terminals in the C information file (if a terminal doesn't return`
		1300	`anything, then it is not necessary to define how to get the`
		1301	`result).</para>`
		1302	`</listitem>`
		1303
		1304	`<listitem>`
		1305	`<para><code>sid</code> expects the <code>ADVANCE_LEXER</code> macro`
		1306	`to be defined, and its definition should cause the lexical`
		1307	`analyser to read a new token. The new terminal number should be`
		1308	`accessible through the <code>CURRENT_TERMINAL</code> macro. On`
		1309	`entry into the parser <code>CURRENT_TERMINAL</code> should give`
		1310	`the first terminal number.</para>`
		1311	`</listitem>`
		1312
		1313	`<listitem>`
		1314	`<para>All actions, and their parameter and result names are defined`
		1315	`in the C information file.</para>`
		1316	`</listitem>`
		1317
		1318	`<listitem>`
		1319	`<para><code>sid</code> expects the <code>SAVE_LEXER</code> and`
		1320	`<code>RESTORE_LEXER</code> macros to be defined. The first is`
		1321	`called with an argument which is the error terminal value. The`
		1322	`macro should save the current terminal's value, and set the`
		1323	`current terminal to be the error terminal value. The second macro`
		1324	`is called without arguments, and should restore the saved value of`
		1325	`the current terminal. <code>SAVE_LEXER</code> will never be`
		1326	`called more than once without a call to`
		1327	`<code>RESTORE_LEXER</code>, so the save stack only needs one`
		1328	`element.</para>`
		1329	`</listitem>`
		1330	`</orderedlist></para>`
		1331
		1332	`<para>The remainder of this section describes the layout of the C`
		1333	`information file. The lexical conventions are described first, followed`
		1334	`by a description of the sections in the order in which they should`
		1335	`appear. Unlike the <code>sid</code> grammar file, not all sections are`
		1336	`mandatory.</para>`
		1337
		1338	`<sect1 id="info-lex">`
		1339	`<title>Lexical conventions</title>`
		1340
		1341	`<para>The lexical conventions of the C information file are very similar`
		1342	`to those of the <code>sid</code> grammar file. There is a second class`
		1343	`of identifier: the C identifier, which is a subset of the valid`
		1344	`<code>sid</code> identifiers; there is also the C code block.</para>`
		1345
		1346	`<para>A C code block begins with <code>@{</code> and is terminated by`
		1347	`<code>@}</code>. The code block consists of all of the characters`
		1348	`between the start and end of the code block, subject to substitutions.`
		1349	`All substitutions begin with the <code>@</code> character. The`
		1350	`following substitutions are recognised:`
		1351
		1352	`<itemizedlist>`
		1353	`<listitem>`
		1354	`<para><code>@@</code></para>`
		1355
		1356	`<para>This substitutes the <code>@</code> character itself.</para>`
		1357	`</listitem>`
		1358
		1359	`<listitem><para><code>@:</code><token>label</token></para>`
		1360
		1361	`<para>This form marks a label, which will be substituted for in`
		1362	`the output code. This is necessary, because an action may be`
		1363	`inlined into the same function more than once. If this happens,`
		1364	`then without doing label substitution there would be two`
		1365	`identical labels in the same scope. With label substitution,`
		1366	`this problem is avoided. In general, all references to a label`
		1367	`within an action should be prefixed with <code>@:</code>. This`
		1368	`substitution may not be used in header and trailer code`
		1369	`blocks.</para>`
		1370	`</listitem>`
		1371
		1372	`<listitem>`
		1373	`<para><code>@</code><token>identifier</token></para>`
		1374
		1375	`<para>This form marks a parameter or result identifier`
		1376	`substitution. If parameter and result identifiers are not`
		1377	`prefixed with an <code>@</code> character, then they will not be`
		1378	`substituted. It is an error if the identifier is not a parameter`
		1379	`or a result. Header and trailer code blocks have no parameters`
		1380	`or results, so it is always an error to use identifier`
		1381	`substitution in them. It is an error if any of the result`
		1382	`identifiers are not substituted at least once.</para>`
		1383
		1384	`<para>Result identifiers may be assigned to using this form of`
		1385	`identifier substitution, but parameter identifiers may not be`
		1386	`(nor may there address be taken - they are immutable). To try`
		1387	`to prevent this, parameters that are substituted may be cast`
		1388	`to their own type, which makes them unmodifiable in ISO C (see`
		1389	`the notes on the <link linkend="casts"><code>casts</code>`
		1390	`language specific option</link>).</para>`
		1391	`</listitem>`
		1392
		1393	`<listitem>`
		1394	`<para><code>@&</code><token>identifier</token></para>`
		1395
		1396	`<para>This form marks a parameter identifier whose address is to`
		1397	`be substituted, but whose contents will not be modified. The`
		1398	`effects of modifying the identifier are undefined. It is an`
		1399	`error to use this in parameter assignment operator`
		1400	`definitions.</para>`
		1401	`</listitem>`
		1402
		1403	`<listitem>`
		1404	`<para><code>@=</code><token>identifier</token></para>`
		1405
		1406	`<para>This form marks a parameter identifier that will be`
		1407	`modified. For this to be useful, the parameter should be a call`
		1408	`by reference parameter, so that the effect of the modification`
		1409	`will be propagated. This substitution is only valid in actions`
		1410	`(assignment operators are not allowed to modify their`
		1411	`parameters).</para>`
		1412	`</listitem>`
		1413
		1414	`<listitem>`
		1415	`<para><code>@!</code></para>`
		1416
		1417	`<para>This form marks an exception raise. In the generated code, a`
		1418	`jump to the current exception handler will be substituted. This`
		1419	`substitution is only valid in actions.</para>`
		1420	`</listitem>`
		1421
		1422	`<listitem>`
		1423	`<para><code>@.</code></para>`
		1424
		1425	`<para>This form marks an attempt to access the current terminal.`
		1426	`This substitution is only valid in actions.</para>`
		1427	`</listitem>`
		1428
		1429	`<listitem>`
		1430	`<para><code>@></code></para>`
		1431
		1432	`<para>This form marks an attempt to advance the lexical analyser.`
		1433	`This substitution is only valid in actions.</para>`
		1434	`</listitem>`
		1435	`</itemizedlist></para>`
		1436
		1437	`<para>All other forms are illegal. Note that in the case of labels and`
		1438	`identifiers, no white space is allowed between the <code>@:</code>,`
		1439	`<code>@</code>, <code>@&</code> or <code>@=</code> and the`
		1440	`identifier name. An example of a code block is:`
		1441
		1442	`<programlisting>`
		1443	`@{`
		1444	`/* A code block */`
		1445	`{`
		1446	`int i ;`
		1447	`if ( @param ) {`
		1448	`@! ;`
		1449	`}`
		1450	`@result = 0 ;`
		1451	`for ( i = 0 ; i < 100 ; i++ ) {`
		1452	`printf ( "{%d}\n", i ) ;`
		1453	`@result += i ;`
		1454	`}`
		1455	`@=param += @result ;`
		1456	`if ( @. == TOKEN_SEMI ) {`
		1457	`@> ;`
		1458	`}`
		1459	`}`
		1460	`@}`
		1461	`</programlisting></para>`
		1462	`</sect1>`
		1463
		1464	`<sect1 id="prefixes">`
		1465	`<title>The prefixes section</title>`
		1466
		1467	`<para>The first section in the C information file is the prefix`
		1468	`definition section. This section is optional. It begins with the`
		1469	`section header, followed by a list of prefix definitions. A prefix`
		1470	`definition begins with the prefix name, followed by a <code>=</code>`
		1471	`symbol, followed by a C identifier that is the new prefix, and`
		1472	`terminated by a semicolon. The following example shows all of the`
		1473	`prefix names, and their default values:`
		1474
		1475	`<programlisting>`
		1476	`%prefixes%`
		1477	`type = ZT ;`
		1478	`function = ZR ;`
		1479	`label = ZL ;`
		1480	`input = ZI ;`
		1481	`output = ZO ;`
		1482	`terminal = ZB ;`
		1483	`</programlisting></para>`
		1484	`</sect1>`
		1485
		1486	`<sect1 id="maps">`
		1487	`<title>The maps section</title>`
		1488
		1489	`<para>The section that follows the prefixes section is the maps section.`
		1490	`This section is also optional. It begins with its section header,`
		1491	`followed by a list of identifier mappings. An identifier mapping`
		1492	`begins with a <code>sid</code> identifier (either a type, a rule or a`
		1493	`terminal), followed by the <code>-></code> symbol, followed by the`
		1494	`C identifier it is to be mapped to, and terminated by a semicolon. An`
		1495	`example follows:`
		1496
		1497	`<programlisting>`
		1498	`%maps%`
		1499	`NumberT -> unsigned ;`
		1500	`calculator -> calculator ;`
		1501	`</programlisting></para>`
		1502
		1503	`<para>Note that it is not possible to map type identifiers to be`
		1504	`arbitrary C types. It will be necessary to <code>typedef</code> or`
		1505	`macro define the type name in the C file.</para>`
		1506
		1507	`<para>It is recommended that all types, terminals and entry point rules`
		1508	`have their names mapped in this section, although this is not`
		1509	`necessary. If the names are not mapped, they will have funny names in`
		1510	`the rest of the program.</para>`
		1511	`</sect1>`
		1512
		1513	`<sect1 id="header">`
		1514	`<title>The header section</title>`
		1515
		1516	`<para>After the maps section comes the header section. This begins with`
		1517	`the section header, followed by a code block, followed by a comma,`
		1518	`followed by a second code block, and terminated with a semicolon. The`
		1519	`first code block will be inserted at the beginning of the generated`
		1520	`parser file; the second code block will be inserted at the start of`
		1521	`the generated header file. An example is:`
		1522
		1523	`<programlisting>`
		1524	`%header% @{`
		1525	`#include "lexer.h"`
		1526
		1527	`LexerT token ;`
		1528
		1529	`#define CURRENT_TERMINAL token.t`
		1530	`#define ADVANCE_LEXER next_token ()`
		1531
		1532	`extern void terminal_error () ;`
		1533	`extern void syntax_error () ;`
		1534	`@}, @{`
		1535	`@} ;`
		1536	`</programlisting></para>`
		1537	`</sect1>`
		1538
		1539	`<sect1 id="assign">`
		1540	`<title>The assignments section</title>`
		1541
		1542	`<para>The assignments section follows the header section. This section`
		1543	`is optional. Normally, assignment between two identifiers will be done`
		1544	`using the C assignment operator. In some cases this will not do the`
		1545	`correct thing, and it is necessary to do the assignment differently.`
		1546	`All types for which this applies should have an entry in the`
		1547	`assignments section. The section begins with its header, followed by`
		1548	`definitions for each type that needs its own assignment operator. Each`
		1549	`definition should have one parameter, and one result. The action's`
		1550	`name should be the name of the type. An example follows:`
		1551
		1552	`<programlisting>`
		1553	`%assignments%`
		1554
		1555	`ListT : ( l1 ) -> ( l2 ) = @{`
		1556	`if ( @l2.head = @l1.head ) {`
		1557	`@l2.tail = @l1.tail ;`
		1558	`} else {`
		1559	`@l2.tail = &( @l2.head ) ;`
		1560	`}`
		1561	`@} ;`
		1562	`</programlisting></para>`
		1563
		1564	`<para>If a type has an assignment operator defined, it must also have a`
		1565	`parameter assignment operator type defined and a result assignment`
		1566	`operator defined (more precisely it must have either no assignment`
		1567	`operations defined, or all three assignment operations`
		1568	`defined).</para>`
		1569	`</sect1>`
		1570
		1571	`<sect1 id="param-assign">`
		1572	`<title>The parameter assignments section</title>`
		1573
		1574	`<para>The parameter assignments section is very similar to the`
		1575	`assignments section (which it follows), and is also optional. If a`
		1576	`type has an assignment section entry, it must have a parameter`
		1577	`assignment entry as well.</para>`
		1578
		1579	`<para>The parameter assignment operator is used in function calls to`
		1580	`ensure that the object is copied correctly: if no parameter assignment`
		1581	`operator is provided for a type, the standard C call by copy mechanism`
		1582	`is used; if a parameter assignment operator is provided for a type,`
		1583	`then the address of the object is passed by the calling function, and`
		1584	`the called function declares a local of the same type, and uses the`
		1585	`parameter assignment operator to copy the object (this should be`
		1586	`remembered when passing parameters to entry points that have arguments`
		1587	`of a type that has a parameter assignment operator defined).</para>`
		1588
		1589	`<para>The difference between the parameter assignment operator and the`
		1590	`assignment operator is that the parameter identifier to the parameter`
		1591	`assignment operator is a pointer to the object being manipulated,`
		1592	`rather than the object itself. An example reference assignment section`
		1593	`is:`
		1594
		1595	`<programlisting>`
		1596	`%parameter-assignments%`
		1597
		1598	`ListT : ( l1 ) -> ( l2 ) = @{`
		1599	`if ( @l2.head = @l1->head ) {`
		1600	`@l2.tail = @l1->tail ;`
		1601	`} else {`
		1602	`@l2.tail = &( @l2.head ) ;`
		1603	`}`
		1604	`@} ;`
		1605	`</programlisting></para>`
		1606	`</sect1>`
		1607
		1608	`<sect1 id="result-assign">`
		1609	`<title>The result assignments section</title>`
		1610
		1611	`<para>The result assignments section is very similar to the assignments`
		1612	`section and the parameter assignments section (which it follows), and`
		1613	`is also optional. If a type has an assignment section entry, it must`
		1614	`also have a result assignment entry. The only difference between the`
		1615	`two is that the result identifier of the result assignment operation`
		1616	`is a pointer to the object being manipulated, rather than the object`
		1617	`itself. Result assignments are only used when the results of a rule`
		1618	`are assigned back through the reference parameters passed into the`
		1619	`function. An example result assignment section is:`
		1620
		1621	`<programlisting>`
		1622	`%result-assignments%`
		1623
		1624	`ListT : ( l1 ) -> ( l2 ) = @{`
		1625	`if ( @l2->head = @l1.head ) {`
		1626	`@l2->tail = @l1.tail ;`
		1627	`} else {`
		1628	`@l2->tail = &( @l2->head ) ;`
		1629	`}`
		1630	`@} ;`
		1631	`</programlisting></para>`
		1632	`</sect1>`
		1633
		1634	`<sect1 id="term-result">`
		1635	`<title>The terminal result extraction section</title>`
		1636
		1637	`<para>The terminal result extraction section follows the reference`
		1638	`assignment section. It defines how to extract the results from`
		1639	`terminals. The section begins with its section header, followed by the`
		1640	`terminal extraction definitions.</para>`
		1641
		1642	`<para>There must be a definition for every terminal in the grammar that`
		1643	`returns a result. It is an error to include a definition for a`
		1644	`terminal that doesn't return a result. The result of the definition`
		1645	`should be the same as the result of the terminal. An example of the`
		1646	`terminal result extraction section follows:`
		1647
		1648	`<programlisting>`
		1649	`%terminals%`
		1650
		1651	`number : () -> ( n ) = @{`
		1652	`@n = token.u.number ;`
		1653	`@} ;`
		1654
		1655	`identifier : () -> ( i ) = @{`
		1656	`@i = token.u.identifier ;`
		1657	`@} ;`
		1658
		1659	`string : () -> ( s ) = @{`
		1660	`@s = token.u.string ;`
		1661	`@} ;`
		1662	`</programlisting></para>`
		1663	`</sect1>`
		1664
		1665	`<sect1 id="action-defn">`
		1666	`<title>The action definition section</title>`
		1667
		1668	`<para>The action definition section follows the terminal result`
		1669	`extractor definition section. The format is similar to the previous`
		1670	`sections: the section header followed by definitions for all of the`
		1671	`actions. An action definition has the following form:`
		1672
		1673	`<programlisting>`
		1674	`<action-name> : ( parameters ) -> ( results ) = code-block ;`
		1675	`</programlisting></para>`
		1676
		1677	`<para>This is similar to the form of all previous definitions, except`
		1678	`that the name is surrounded in angle brackets. What follows is also`
		1679	`true of the other definitions as well (unless they state`
		1680	`otherwise).</para>`
		1681
		1682	`<para>The <code>action-name</code> is a <code>sid</code> identifier that`
		1683	`is the name of the action being defined; <code>parameters</code> is a`
		1684	`comma separated list of C identifiers that will be the names of the`
		1685	`parameters passed to the action, and <code>results</code> is a comma`
		1686	`separated list of C identifiers that will be the names of the result`
		1687	`parameters passed to the action. The <code>code-block</code> is the C`
		1688	`code that defines the action. It is expected that this will assign a`
		1689	`valid result to each of the result identifier names.</para>`
		1690
		1691	`<para>The parameter and result tuples have the same form as in the`
		1692	`language independent file, except that the types are optional. Like`
		1693	`the language independent file, if the type of an action is zero-tuple`
		1694	`to zero-tuple, then the type can be omitted, e.g.:`
		1695
		1696	`<programlisting>`
		1697	`<action> = @{ /* .... */ @} ;`
		1698	`</programlisting></para>`
		1699
		1700	`<para>An example action definition section is:`
		1701
		1702	`<programlisting>`
		1703	`%actions%`
		1704
		1705	`<add> : ( v1, v2 ) -> ( v3 ) = @{`
		1706	`@v3 = @v1 + @v2 ;`
		1707	`@} ;`
		1708
		1709	`<subtract> : ( v1 : NumberT, v2 : NumberT ) -> ( v3 : NumberT ) = @{`
		1710	`@v3 = @v1 - @v2 ;`
		1711	`@} ;`
		1712
		1713	`<multiply> : ( v1 : NumberT, v2 ) -> ( v3 ) = @{`
		1714	`@v3 = @v1 * @v2 ;`
		1715	`@} ;`
		1716
		1717	`<divide> : ( v1, v2 ) -> ( v3 : NumberT ) = @{`
		1718	`@v3 = @v1 / @v2 ;`
		1719	`@} ;`
		1720
		1721	`<print> : ( v ) -> () = @{`
		1722	`printf ( "%u\n", @v ) ;`
		1723	`@} ;`
		1724
		1725	`<error> = @{`
		1726	`fprintf ( stderr, "ERROR\n" ) ;`
		1727	`exit ( EXIT_FAILURE ) ;`
		1728	`@} ;`
		1729	`</programlisting></para>`
		1730
		1731	`<para>Do not define static variables in action definitions; if you do,`
		1732	`you will get unexpected results. If you wish to use static variables`
		1733	`in actions definitions, then define them in the header block.</para>`
		1734	`</sect1>`
		1735
		1736	`<sect1 id="trailer">`
		1737	`<title>The trailer section</title>`
		1738
		1739	`<para>After the action definition section comes the trailer section.`
		1740	`This has the same form as the header section. An example is:`
		1741
		1742	`<programlisting>`
		1743	`%trailer% @{`
		1744	`int main ()`
		1745	`{`
		1746	`next_token () ;`
		1747	`calculator ( NULL ) ;`
		1748	`return 0 ;`
		1749	`}`
		1750	`@}, @{`
		1751	`@} ;`
		1752	`</programlisting></para>`
		1753
		1754	`<para>The code blocks will be appended to the generated parser, and the`
		1755	`generated header file respectively.</para>`
		1756	`</sect1>`
		1757	`</chapter>`
		1758
		1759	`<chapter id="predicate">`
		1760	`<title>Predicates</title>`
		1761
		1762	`<para>Predicates provide the user with a mechanism for altering the`
		1763	`control flow in a manner that terminals alone cannot do.</para>`
		1764
		1765	`<para>During the factorisation process, rules that begin with predicates`
		1766	`are expanded if necessary to ensure that predicates that may be used to`
		1767	`select which alternative to go down always begin the alternative, e.g.:`
		1768
		1769	`<programlisting>`
		1770	`rule1 = {`
		1771	`rule2 ;`
		1772	`/* .... */`
		1773	`\|\|`
		1774	`/* .... */`
		1775	`} ;`
		1776
		1777	`rule2 = {`
		1778	`? = <predicate> ;`
		1779	`/* .... */`
		1780	`\|\|`
		1781	`/* .... */`
		1782	`} ;`
		1783	`</programlisting>`
		1784
		1785	`would be expanded into:`
		1786
		1787	`<programlisting>`
		1788	`rule1 = {`
		1789	`? = predicate ;`
		1790	`/* .... */`
		1791	`/* .... */`
		1792	`\|\|`
		1793	`/* .... */`
		1794	`/* .... */`
		1795	`\|\|`
		1796	`/* .... */`
		1797	`} ;`
		1798	`</programlisting></para>`
		1799
		1800	`<para>Also, if a predicate is used to select which alternative to use, it`
		1801	`must be the first thing in the alternative, so the following would not`
		1802	`be allowed:`
		1803
		1804	`<programlisting>`
		1805	`rule = {`
		1806	`<action> ;`
		1807	`? = <predicate> ;`
		1808	`/* .... */`
		1809	`\|\|`
		1810	`/* .... */`
		1811	`} ;`
		1812	`</programlisting></para>`
		1813
		1814	`<para>When predicates begin a rule, they are executed (in some arbitrary`
		1815	`order) until one of them returns true. The alternative that this`
		1816	`predicate begins is then selected. If no predicates return true, then`
		1817	`one of the remaining alternatives is selected based upon the current`
		1818	`terminal (or an error occurs).</para>`
		1819
		1820	`<para>It is important that predicates do not contain dependencies upon the`
		1821	`order of evaluation. In practice, predicates are likely to be simple, so`
		1822	`this shouldn't be a problem.</para>`
		1823
		1824	`<para>When predicates are used within an alternative, they behave like`
		1825	`terminals. If they evaluate to true, then parsing continues. If they`
		1826	`evaluate to false, then an exception is raised.</para>`
		1827	`</chapter>`
		1828
		1829	`<chapter id="exception">`
		1830	`<title>Error handling</title>`
		1831
		1832	`<para>If the input given to the parser is valid, then the parser will not`
		1833	`need to produce any errors. Unfortunately this is not always the case,`
		1834	`so <code>sid</code> provides a mechanism for handling errors.</para>`
		1835
		1836	`<para>When an error occurs, an exception is raised. This passes control to`
		1837	`the nearest enclosing exception handler. If there is no exception`
		1838	`handler at all, the entry point function will return with the current`
		1839	`terminal set to the error value.</para>`
		1840
		1841	`<para>An exception handler is just an alternative that is executed when a`
		1842	`terminal or predicate fails. This should obviate the need to rely upon`
		1843	`language specific mechanisms (such as <code>setjmp</code> and`
		1844	`<code>longjmp</code>) for error recovery.</para>`
		1845	`</chapter>`
		1846
		1847	`<chapter id="call-by-ref">`
		1848	`<title>Call by reference</title>`
		1849
		1850	`<para>The default behaviour of <code>sid</code> is to do argument passing`
		1851	`using call by copy semantics, and to not allow mutation of parameters of`
		1852	`rules and actions (however inlined rules, and rules created during`
		1853	`factoring have call by reference parameters). However it is possible to`
		1854	`give rule and action parameters call by reference semantics, using the`
		1855	`<code>&</code> symbol in the type specification (as described`
		1856	`earlier). It is also possible to mutate parameters of actions, using the`
		1857	`<code>@=</code> substitution in the action body (also described`
		1858	`earlier). It is important to do the correct substitutions in action`
		1859	`definitions, as <code>sid</code> uses this information to decide where`
		1860	`it can optimise the output code.</para>`
		1861
		1862	`<para>If a call by copy parameter is mutated, then <code>sid</code> will`
		1863	`introduce a new temporary variable and copy the parameter into it - this`
		1864	`temporary will then be mutated. Similar code will be output for rules`
		1865	`that have call by copy parameters that are mutated (e.g. as a call by`
		1866	`reference argument to an action that mutates its parameters).</para>`
		1867	`</chapter>`
		1868
		1869	`<chapter id="call-entry">`
		1870	`<title>Calling entry points</title>`
		1871
		1872	`<para>When calling a function that implements an entry point rule, it`
		1873	`should be called with the rule's parameters as the first arguments,`
		1874	`followed by the addresses of the rule's results as the remaining`
		1875	`arguments. The parameters should have their addresses passed if they are`
		1876	`of a type that has a parameter assignment operator defined, or if the`
		1877	`parameter is a call by reference parameter.</para>`
		1878
		1879	`<para>For example, given the following rule:`
		1880
		1881	`<programlisting>`
		1882	`rule1 : ( :Type1T, :Type2T, :Type3T & ) -> ( :Type4T ) ;`
		1883	`</programlisting>`
		1884
		1885	`where <code>Type2T</code> has a parameter assignment operator defined,`
		1886	`and <code>rule1</code> is mapped to <code>rule1</code> (and the type`
		1887	`names are mapped to themselves), the call would be something like:`
		1888
		1889	`<programlisting>`
		1890	`Type1T a = make_type1 () ;`
		1891	`Type2T b = make_type2 () ;`
		1892	`Type3T c = make_type3 () ;`
		1893	`Type4T d ;`
		1894
		1895	`rule1 ( a, b, &c, &d ) ;`
		1896	`</programlisting></para>`
		1897	`</chapter>`
		1898
		1899	`<chapter id="glossary">`
		1900	`<title>Glossary</title>`
		1901
		1902	`<para>This section describes some of the terms used in the`
		1903	`<code>sid</code> documentation.</para>`
		1904
		1905	`<glossary id="sid-glossary">`
		1906	`<glossentry>`
		1907	`<glossterm>Alternative</glossterm>`
		1908
		1909	`<glossdef>`
		1910	`<para>An alternative is a sequence of items.</para>`
		1911	`</glossdef>`
		1912	`</glossentry>`
		1913
		1914	`<glossentry>`
		1915	`<glossterm>Exception handler</glossterm>`
		1916
		1917	`<glossdef>`
		1918	`<para>An exception handler is a special type of alternative. Each`
		1919	`rule may have at most one exception handler. An exception handler`
		1920	`is invoked if the current terminal does not match any of the`
		1921	`expected terminals, if a predicate fails, or if an action raises`
		1922	`an exception, within the scope of the exception handler.</para>`
		1923	`</glossdef>`
		1924	`</glossentry>`
		1925
		1926	`<glossentry>`
		1927	`<glossterm>Expansion</glossterm>`
		1928
		1929	`<glossdef>`
		1930	`<para>This is the process of physically inlining a rule into another`
		1931	`rule. It is done during the factoring process to turn the grammar`
		1932	`into a form that a parser can be produced for. See the entry for`
		1933	`factoring.</para>`
		1934	`</glossdef>`
		1935	`</glossentry>`
		1936
		1937	`<glossentry>`
		1938	`<glossterm>Factoring</glossterm>`
		1939
		1940	`<glossdef>`
		1941	`<para>This is one of the transforms that <code>sid</code> performs`
		1942	`on the grammar. See the <link linkend="overview">overview`
		1943	`section</link> for a description of the factoring process.</para>`
		1944	`</glossdef>`
		1945	`</glossentry>`
		1946
		1947	`<glossentry>`
		1948	`<glossterm>First set</glossterm>`
		1949
		1950	`<glossdef>`
		1951	`<para>The first set of a rule (or alternative) is the set of`
		1952	`terminals and predicates that can start that rule (or`
		1953	`alternative).</para>`
		1954	`</glossdef>`
		1955	`</glossentry>`
		1956
		1957	`<glossentry>`
		1958	`<glossterm>Follow set</glossterm>`
		1959
		1960	`<glossdef>`
		1961	`<para>The follow set of a rule is the set of terminals and`
		1962	`predicates that can follow the rule in any of its`
		1963	`invocations.</para>`
		1964	`</glossdef>`
		1965	`</glossentry>`
		1966
		1967	`<glossentry>`
		1968	`<glossterm>Inlining</glossterm>`
		1969
		1970	`<glossdef>`
		1971	`<para>This is the process of outputting the code for parsing one`
		1972	`rule within the function that parses another rule. This is`
		1973	`normally done as part of the output process. Expansion is a form`
		1974	`of inlining performed during the factoring process, but the`
		1975	`inlining is done by modifying the grammar, rather than as part of`
		1976	`the output phase.</para>`
		1977	`</glossdef>`
		1978	`</glossentry>`
		1979
		1980	`<glossentry>`
		1981	`<glossterm>Item</glossterm>`
		1982
		1983	`<glossdef>`
		1984	`<para>An item is the equivalent of a statement in a conventional`
		1985	`programming language. It can be an invocation of a rule, terminal,`
		1986	`action or predicate, or an identity operation (assignment).</para>`
		1987	`</glossdef>`
		1988	`</glossentry>`
		1989
		1990	`<glossentry>`
		1991	`<glossterm>Name</glossterm>`
		1992
		1993	`<glossdef>`
		1994	`<para>A name is an identifier that is used to pass information`
		1995	`between rules and actions. Local names are defined within a rule,`
		1996	`and only exist within the rule itself. Non-local names are defined`
		1997	`in a rule's scoped definitions section and exists in all of the`
		1998	`rules in that scope. Non-local rules are also saved and restored`
		1999	`across calls to the rule that defines them.</para>`
		2000	`</glossdef>`
		2001	`</glossentry>`
		2002
		2003	`<glossentry>`
		2004	`<glossterm>Recursion</glossterm>`
		2005
		2006	`<glossdef>`
		2007	`<para>Recursion is where a rule invokes itself. Direct recursion is`
		2008	`where the rule invokes itself from one of its own alternatives;`
		2009	`indirect recursion is where a rule invokes another rule (which`
		2010	`invokes another rule etc.) which eventually invokes the original`
		2011	`rule.</para>`
		2012
		2013	`<para>Left recursion is a form of recursion where all of the`
		2014	`recursive calls occur as the first item in an alternative. It is`
		2015	`not possible to produce a parser for a grammar that contains left`
		2016	`recursions, so <code>sid</code> turns left recursions into right`
		2017	`recursions. This process is known as left recursion`
		2018	`elimination.</para>`
		2019
		2020	`<para>Right recursion is a form of recursion where all of the`
		2021	`recursive calls occur as the last item in an alternative.</para>`
		2022	`</glossdef>`
		2023	`</glossentry>`
		2024
		2025	`<glossentry>`
		2026	`<glossterm>Production</glossterm>`
		2027
		2028	`<glossdef>`
		2029	`<para>See rule.</para>`
		2030	`</glossdef>`
		2031	`</glossentry>`
		2032
		2033	`<glossentry>`
		2034	`<glossterm>Rule</glossterm>`
		2035
		2036	`<glossdef>`
		2037	`<para>A rule is a sequence of alternatives. A rule may contain a`
		2038	`special alternative that is used as an exception handler. A rule`
		2039	`is also referred to as a production; this term is normally used`
		2040	`when talking about the definition of a rule.</para>`
		2041	`</glossdef>`
		2042	`</glossentry>`
		2043
		2044	`<glossentry>`
		2045	`<glossterm>See-through</glossterm>`
		2046
		2047	`<glossdef>`
		2048	`<para>A rule is said to be see-through if there is an expansion of`
		2049	`the rule that does not contain any terminals or predicates.</para>`
		2050	`</glossdef>`
		2051	`</glossentry>`
		2052	`</glossary>`
		2053	`</chapter>`
		2054
		2055	`<chapter id="errors">`
		2056	`<title>Understanding error messages</title>`
		2057
		2058	`<para>This section tries to explain what some of the error messages that`
		2059	`are reported by the <code>sid</code> transforms mean. It does not`
		2060	`contain descriptions of messages like "type 'some type' is unknown", as`
		2061	`these should be self-explanatory.</para>`
		2062
		2063	`<sect1 id="left-errors">`
		2064	`<title>Left recursion elimination errors</title>`
		2065
		2066	`<para><errortext>The parameter or result types of the left recursive`
		2067	`calls in the following productions do not match:`
		2068	`PRODUCTIONS</errortext>: This means that there is a set of rules which`
		2069	`call each other left recursively (i.e. the first item in some of the`
		2070	`alternatives in each rule is a call to another rule in the set), and`
		2071	`they do not all have the same parameter and result types, e.g.:`
		2072
		2073	`<programlisting>`
		2074	`rule1 : ( a : Type1T, b : Type1T, c : Type2T, d : Type2T ) -> () = {`
		2075	`rule2 ( a, b ) ;`
		2076	`\|\|`
		2077	`terminal1 ;`
		2078	`} ;`
		2079
		2080	`rule2 : ( a : Type1T, b : Type2T ) -> () = {`
		2081	`rule1 ( a, a, b, b ) ;`
		2082	`\|\|`
		2083	`terminal2 ;`
		2084	`} ;`
		2085	`</programlisting></para>`
		2086
		2087	`<para><errortext>The exception handlers in the left recursion involving`
		2088	`the following productions do not match: PRODUCTIONS</errortext>: This`
		2089	`means that there is a set of productions which call each other left`
		2090	`recursively (i.e. the first item in an alternative is a call to`
		2091	`another production in the set), and they do not all have the same`
		2092	`exception handler, e.g.:`
		2093
		2094	`<programlisting>`
		2095	`rule1 = {`
		2096	`rule2 ;`
		2097	`\|\|`
		2098	`terminal1 ;`
		2099	`##`
		2100	`<action1> ;`
		2101	`} ;`
		2102
		2103	`rule2 = {`
		2104	`rule1 ;`
		2105	`\|\|`
		2106	`terminal2 ;`
		2107	`##`
		2108	`<action2> ;`
		2109	`} ;`
		2110	`</programlisting></para>`
		2111
		2112	`<para>It is quite likely that when using exception handlers, it may be`
		2113	`necessary to do the left recursion elimination manually to ensure that`
		2114	`the exception handlers occur at the correct place.</para>`
		2115
		2116	`<para><errortext>The argument names of the left recursive calls in the`
		2117	`following productions do not match: PRODUCTIONS</errortext>: This`
		2118	`means that there is a set of productions which call each other left`
		2119	`recursively (i.e. the first item in an alternative is a call to`
		2120	`another production in the set), and the arguments to one of the left`
		2121	`recursive calls are not the same as the parameters of the calling`
		2122	`rule, e.g.:`
		2123
		2124	`<programlisting>`
		2125	`rule1 : ( a : Type1T, b : Type1T ) -> () = {`
		2126	`rule1 ( b, a ) ;`
		2127	`\|\|`
		2128	`terminal1 ;`
		2129	`} ;`
		2130	`</programlisting></para>`
		2131
		2132
		2133	`<para><errortext>A non-local name in the rule 'RULE' is not in scope in`
		2134	`the rule 'RULE' in the left recursive cycle involving the following`
		2135	`productions: PRODUCTIONS</errortext>: This means that there is a set`
		2136	`of productions which call each other left recursively (i.e. the first`
		2137	`item in an alternative is a call to another production in the set),`
		2138	`and the first named rule uses a non-local name that is not in scope in`
		2139	`the second named rule, e.g.:`
		2140
		2141	`<programlisting>`
		2142	`rule1 [`
		2143	`name1 : Type1T ;`
		2144	`rule1_1 [`
		2145	`name1_1 : Type1T ;`
		2146	`] = {`
		2147	`rule1 ;`
		2148	`<action1_1> ( name1_1 ) ;`
		2149	`\|\|`
		2150	`terminal1 ;`
		2151	`} ;`
		2152	`] = {`
		2153	`terminal2 ;`
		2154	`\|\|`
		2155	`rule1_1 ;`
		2156	`<action1> ( name1 ) ;`
		2157	`} ;`
		2158	`</programlisting></para>`
		2159
		2160	`<para><errortext>The rule 'RULE' declares non-local names in the left`
		2161	`recursive cycle with more than one entry point involving the following`
		2162	`productions: PRODUCTIONS</errortext>: This means that there is a set`
		2163	`of productions which call each other left recursively (i.e. the first`
		2164	`item in an alternative is a call to another production in the set),`
		2165	`and the named rule defines non-local variables even though it is not`
		2166	`the only entry point to the cycle, e.g.:`
		2167
		2168	`<programlisting>`
		2169	`rule1 [`
		2170	`name1 : Type1T ;`
		2171	`rule1_1 = {`
		2172	`<action1_1> ( name1 ) ;`
		2173	`} ;`
		2174	`] = {`
		2175	`terminal1 ;`
		2176	`rule1_1 ;`
		2177	`\|\|`
		2178	`rule2 ;`
		2179	`<action1> ( name1 ) ;`
		2180	`} ;`
		2181
		2182	`rule2 = {`
		2183	`rule1 ;`
		2184	`<action2> ;`
		2185	`\|\|`
		2186	`terminal2 ;`
		2187	`} ;`
		2188	`</programlisting></para>`
		2189
		2190	`<para>No cycle termination for the left recursive set involving the`
		2191	`following rules: <token>RULES</token>: This means that there is a set`
		2192	`of productions which call each other left recursively (i.e. the first`
		2193	`item in an alternative is a call to another production in the set),`
		2194	`and they do not contain an alternative that begins with a non-left`
		2195	`recursive call, e.g.:`
		2196
		2197	`<programlisting>`
		2198	`rule1 = {`
		2199	`rule2 ;`
		2200	`\|\|`
		2201	`rule3 ;`
		2202	`} ;`
		2203
		2204	`rule2 = {`
		2205	`rule1 ;`
		2206	`\|\|`
		2207	`rule3 ;`
		2208	`} ;`
		2209
		2210	`rule3 = {`
		2211	`rule1 ;`
		2212	`\|\|`
		2213	`rule2 ;`
		2214	`} ;`
		2215	`</programlisting></para>`
		2216	`</sect1>`
		2217
		2218	`<sect1 id="first-errors">`
		2219	`<title>First set computation errors</title>`
		2220
		2221	`<para>Cannot compute first set for <token>PRODUCTION</token>: This means`
		2222	`that <code>sid</code> cannot compute the set of terminals and`
		2223	`predicates that may start the production. This is normally because`
		2224	`there is a recursive call (or cycle) that contains no terminals, e.g.:`
		2225
		2226	`<programlisting>`
		2227	`rule1 = {`
		2228	`<action1> ;`
		2229	`rule1 ;`
		2230	`\|\|`
		2231	`terminal1 ;`
		2232	`} ;`
		2233	`</programlisting></para>`
		2234
		2235	`<para>This is not removed by the left recursion elimination phase, as`
		2236	`the call is not the leftmost item in the alternative.</para>`
		2237
		2238	`<para>Can see through to predicate '<token>PREDICATE</token>' in`
		2239	`production <token>PRODUCTION</token>: This means that there is a`
		2240	`predicate that isn't the first item in its alternative, but is`
		2241	`preceded only by see-through items, e.g.:`
		2242
		2243	`<programlisting>`
		2244	`rule1 = {`
		2245	`<action1> ;`
		2246	`? = <predicate> ;`
		2247	`\|\|`
		2248	`terminal1 ;`
		2249	`} ;`
		2250	`</programlisting></para>`
		2251
		2252
		2253	`<para>Can see through to predicates in rule '<token>RULE</token>' in`
		2254	`production <token>PRODUCTION</token>: This means that the first rule`
		2255	`has at least one predicate in its first set, and the second rule calls`
		2256	`it in a position where it is not the first item in the alternative and`
		2257	`is preceded only by see-through items, e.g.:`
		2258
		2259	`<programlisting>`
		2260	`rule1 = {`
		2261	`? = <predicate> ;`
		2262	`\|\|`
		2263	`terminal1 ;`
		2264	`} ;`
		2265
		2266	`rule2 = {`
		2267	`<action> ;`
		2268	`rule1 ;`
		2269	`\|\|`
		2270	`terminal2 ;`
		2271	`} ;`
		2272	`</programlisting></para>`
		2273
		2274	`<para>The rule '<token>RULE</token>' has all terminals in its first set`
		2275	`and has a redundant see-through alternative: This means that the`
		2276	`rule's first set (the set of all terminals that can start the rule)`
		2277	`includes all possible input terminals, and the rule also has a`
		2278	`see-through alternative. The see-through alternative will never be`
		2279	`used, as one of the other alternatives will always be chosen.</para>`
		2280	`</sect1>`
		2281
		2282	`<sect1 id="factor-errors">`
		2283	`<title>Factoring errors</title>`
		2284
		2285	`<para>Too many productions (<token>NUMBER</token>) created during`
		2286	`factorisation: This normally means that <code>sid</code> cannot factor`
		2287	`the grammar. You will need to rewrite the offending part.`
		2288	`Unfortunately there is no easy way to do this. Start by looking at the`
		2289	`dump file for a set of rules that seem to have been expanded a`
		2290	`lot.</para>`
		2291
		2292	`<para>The rule '<token>RULE</token>' cannot be expanded into`
		2293	`'<token>RULE</token>' as the exception handlers don't match: When`
		2294	`<code>sid</code> performs factoring, it needs to expand calls to`
		2295	`certain rules into the rules that calls them (this is described in the`
		2296	`<link linkend="overview">overview section</link>). If the called rule`
		2297	`has an exception handler and it is not the same as the exception`
		2298	`handler of the calling rule, then the expansion will fail.</para>`
		2299
		2300	`<para>The rule '<token>RULE</token>' cannot be expanded into`
		2301	`'<token>RULE</token>' as it contains non-local name definitions: When`
		2302	`<code>sid</code> performs factoring, it needs to expand calls to`
		2303	`certain rules into the rules that calls them (this is described in the`
		2304	`<link linkend="overview">overview section</link>). If the called rule`
		2305	`defines any non-local names, then the expansion will fail.</para>`
		2306	`</sect1>`
		2307
		2308	`<sect1 id="check-errors">`
		2309	`<title>Checking errors</title>`
		2310
		2311	`<para><errortext>Collision of terminal(s) TERMINALS in rule`
		2312	`'RULE'</errortext>: This error means that more than one alternative in`
		2313	`the named rule begins with the named terminals, e.g.:`
		2314
		2315	`<programlisting>`
		2316	`rule1 = {`
		2317	`<action1> ;`
		2318	`terminal1 ;`
		2319	`\|\|`
		2320	`terminal1 ;`
		2321	`} ;`
		2322	`</programlisting></para>`
		2323
		2324	`<para>Normally, the factoring process will remove the problem, but when`
		2325	`something like the above happens to stop the factoring occurring, this`
		2326	`error will be produced.</para>`
		2327
		2328	`<para><errortext>Collision of predicate 'PREDICATE' in rule`
		2329	`'RULE'</errortext>: This error occurs when more than one alternative`
		2330	`in the named rule begins with the named predicate, e.g.:`
		2331
		2332	`<programlisting>`
		2333	`rule1 = {`
		2334	`( a, ? ) = <predicate> ;`
		2335	`<action1> ( a ) ;`
		2336	`\|\|`
		2337	`( ?, b ) = <predicate> ;`
		2338	`<action2> ( b ) ;`
		2339	`} ;`
		2340	`</programlisting></para>`
		2341
		2342	`<para>Again, it is normally the case that the factoring process will`
		2343	`remove this problem, but if the same predicate uses different`
		2344	`predicate results in different alternatives, this error will be`
		2345	`produced.</para>`
		2346
		2347	`<para>The terminal(s) <token>TERMINALS</token> can start rule`
		2348	`'<token>RULE</token>' which is see-through, and the same terminal(s)`
		2349	`may appear in the following situations: <token>ALTERNATIVES</token>:`
		2350	`This means that there are one or more terminals that can start the`
		2351	`named rule (which is see-through), and may also follow it, e.g.:`
		2352
		2353	`<programlisting>`
		2354	`rule1 = {`
		2355	`terminal1 ;`
		2356	`\|\|`
		2357	`$ ;`
		2358	`} ;`
		2359
		2360	`rule2 = {`
		2361	`rule1 ;`
		2362	`terminal1 ;`
		2363	`\|\|`
		2364	`terminal2 ;`
		2365	`} ;`
		2366	`</programlisting></para>`
		2367
		2368	`<para>The alternatives listed are the alternatives which call the rule,`
		2369	`and contain (some of) the named terminals after the call. The call is`
		2370	`highlighted.</para>`
		2371
		2372	`<para>The predicate(s) <token>PREDICATES</token> can start rule`
		2373	`'<token>RULE</token>' which is see-through and the same predicate(s)`
		2374	`may appear in the following situations: <token>ALTERNATIVES</token>:`
		2375	`This means that there are one or more predicates that can start the`
		2376	`named rule (which is see-through), and may also follow it, e.g.:`
		2377
		2378	`<programlisting>`
		2379	`rule1 = {`
		2380	`? = <predicate> ;`
		2381	`\|\|`
		2382	`$ ;`
		2383	`} ;`
		2384
		2385	`rule2 = {`
		2386	`terminal1 ;`
		2387	`rule1 ;`
		2388	`? = <predicate> ;`
		2389	`\|\|`
		2390	`terminal2 ;`
		2391	`} ;`
		2392	`</programlisting></para>`
		2393
		2394	`<para>The alternatives listed are the alternatives which call the rule,`
		2395	`and contain (some of) the named predicates after the call. The call`
		2396	`is highlighted.</para>`
		2397
		2398	`<para>The rule '<token>RULE</token>' contains more than one see-through`
		2399	`alternative: This error occurs if the rule has more than one`
		2400	`alternative that doesn't need to read a terminal or a predicate, e.g.:`
		2401
		2402	`<programlisting>`
		2403	`rule1 = {`
		2404	`<action1> ;`
		2405	`\|\|`
		2406	`<action2> ;`
		2407	`} ;`
		2408	`</programlisting></para>`
		2409	`</sect1>`
		2410	`</chapter>`
		2411	`</book>`

Subversion Repositories tendra.SVN

(root)/trunk/doc/user/guides/sid/sid.xml – Rev 7