WebSVN – planix.SVN – Blame – /os/trunk/sys/doc/sleep.ms

Rev	Author	Line No.	Line
2	-	1	`.HTML "Process Sleep and Wakeup on a Shared-memory Multiprocessor`
		2	`.TL`
		3	`Process Sleep and Wakeup on a Shared-memory Multiprocessor`
		4	`.AU`
		5	`Rob Pike`
		6	`Dave Presotto`
		7	`Ken Thompson`
		8	`Gerard Holzmann`
		9	`.sp`
		10	`rob,presotto,ken,gerard@plan9.bell-labs.com`
		11	`.AB`
		12	`.FS`
		13	`Appeared in a slightly different form in`
		14	`.I`
		15	`Proceedings of the Spring 1991 EurOpen Conference,`
		16	`.R`
		17	`Tromsø, Norway, 1991, pp. 161-166.`
		18	`.FE`
		19	The problem of enabling a `sleeping' process on a shared-memory multiprocessor
		20	`is a difficult one, especially if the process is to be awakened by an interrupt-time`
		21	`event. We present here the code`
		22	`for sleep and wakeup primitives that we use in our multiprocessor system.`
		23	`The code has been exercised by years of active use and by a verification`
		24	`system.`
		25	`.AE`
		26	`.LP`
		27	`Our problem is to synchronise processes on a symmetric shared-memory multiprocessor.`
		28	`Processes suspend execution, or`
		29	`.I sleep,`
		30	`while awaiting an enabling event such as an I/O interrupt.`
		31	`When the event occurs, the process is issued a`
		32	`.I wakeup`
		33	`to resume its execution.`
		34	`During these events, other processes may be running and other interrupts`
		35	`occurring on other processors.`
		36	`.LP`
		37	`More specifically, we wish to implement subroutines called`
		38	`.CW sleep ,`
		39	`callable by a process to relinquish control of its current processor,`
		40	`and`
		41	`.CW wakeup ,`
		42	`callable by another process or an interrupt to resume the execution`
		43	`of a suspended process.`
		44	`The calling conventions of these subroutines will remain unspecified`
		45	`for the moment.`
		46	`.LP`
		47	`We assume the processors have an atomic test-and-set or equivalent`
		48	`operation but no other synchronisation method. Also, we assume interrupts`
		49	`can occur on any processor at any time, except on a processor that has`
		50	`locally inhibited them.`
		51	`.LP`
		52	`The problem is the generalisation to a multiprocessor of a familiar`
		53	`and well-understood uniprocessor problem. It may be reduced to a`
		54	`uniprocessor problem by using a global test-and-set to serialise the`
		55	`sleeps and wakeups,`
		56	`which is equivalent to synchronising through a monitor.`
		57	`For performance and cleanliness, however,`
		58	`we prefer to allow the interrupt handling and process control to be multiprocessed.`
		59	`.LP`
		60	`Our attempts to solve the sleep/wakeup problem in Plan 9`
		61	`[Pik90]`
		62	`prompted this paper.`
		63	`We implemented solutions several times over several months and each`
		64	`time convinced ourselves \(em wrongly \(em they were correct.`
		65	`Multiprocessor algorithms can be`
		66	`difficult to prove correct by inspection, and formal reasoning about them`
		67	`is impractical. We finally developed an algorithm we trust by`
		68	`verifying our code using an`
		69	`empirical testing tool.`
		70	`We present that code here, along with some comments about the process by`
		71	`which it was designed.`
		72	`.SH`
		73	`History`
		74	`.LP`
		75	`Since processes in Plan 9 and the UNIX`
		76	`system have similar structure and properties, one might ask if`
		77	`UNIX`
		78	`.CW sleep`
		79	`and`
		80	`.CW wakeup`
		81	`[Bac86]`
		82	`could not easily be adapted from their standard uniprocessor implementation`
		83	`to our multiprocessor needs.`
		84	`The short answer is, no.`
		85	`.LP`
		86	`The`
		87	`UNIX`
		88	`routines`
		89	`take as argument a single global address`
		90	`that serves as a unique`
		91	`identifier to connect the wakeup with the appropriate process or processes.`
		92	`This has several inherent disadvantages.`
		93	`From the point of view of`
		94	`.CW sleep`
		95	`and`
		96	`.CW wakeup ,`
		97	`it is difficult to associate a data structure with an arbitrary address;`
		98	`the routines are unable to maintain a state variable recording the`
		99	`status of the event and processes.`
		100	`(The reverse is of course easy \(em we could`
		101	`require the address to point to a special data structure \(em`
		102	`but we are investigating`
		103	`UNIX`
		104	`.CW sleep`
		105	`and`
		106	`.CW wakeup ,`
		107	`not the code that calls them.)`
		108	Also, multiple processes sleep `on' a given address, so
		109	`.CW wakeup`
		110	`must enable them all, and let process scheduling determine which process`
		111	`actually benefits from the event.`
		112	`This is inefficient;`
		113	`a queueing mechanism would be preferable`
		114	`but, again, it is difficult to associate a queue with a general address.`
		115	`Moreover, the lack of state means that`
		116	`.CW sleep`
		117	`and`
		118	`.CW wakeup`
		119	`cannot know what the corresponding process (or interrupt) is doing, so`
		120	`.CW sleep`
		121	`and`
		122	`.CW wakeup`
		123	`must be executed atomically.`
		124	`On a uniprocessor it suffices to disable interrupts during their`
		125	`execution.`
		126	`On a multiprocessor, however,`
		127	`a processor can usually`
		128	`inhibit interrupts only on itself,`
		129	`so while a process is executing`
		130	`.CW sleep`
		131	`the desired interrupt can come and go on another processor.`
		132	`If the wakeup is to be issued by another process, the problem is even harder.`
		133	`Some inter-process mutual exclusion mechanism must be used,`
		134	`which, yet again, is difficult to do without a way to communicate state.`
		135	`.LP`
		136	`In summary, to be useful on a multiprocessor,`
		137	`UNIX`
		138	`.CW sleep`
		139	`and`
		140	`.CW wakeup`
		141	`must either be made to run atomically on a single`
		142	`processor (such as by using a monitor)`
		143	`or they need a richer model for their communication.`
		144	`.SH`
		145	`The design`
		146	`.LP`
		147	`Consider the case of an interrupt waking up a sleeping process.`
		148	`(The other case, a process awakening a second process, is easier because`
		149	`atomicity can be achieved using an interlock.)`
		150	`The sleeping process is waiting for some event to occur, which may be`
		151	`modelled by a condition coming true.`
		152	`The condition could be just that the event has happened, or something`
		153	`more subtle such as a queue draining below some low-water mark.`
		154	`We represent the condition by a function of one`
		155	`argument of type`
		156	`.CW void* ;`
		157	`the code supporting the device generating the interrupts`
		158	`provides such a function to be used by`
		159	`.CW sleep`
		160	`and`
		161	`.CW wakeup`
		162	`to synchronise. The function returns`
		163	`.CW false`
		164	`if the event has not occurred, and`
		165	`.CW true`
		166	`some time after the event has occurred.`
		167	`The`
		168	`.CW sleep`
		169	`and`
		170	`.CW wakeup`
		171	`routines must, of course, work correctly if the`
		172	`event occurs while the process is executing`
		173	`.CW sleep .`
		174	`.LP`
		175	`We assume that a particular call to`
		176	`.CW sleep`
		177	`corresponds to a particular call to`
		178	`.CW wakeup ,`
		179	`that is,`
		180	`at most one process is asleep waiting for a particular event.`
		181	`This can be guaranteed in the code that calls`
		182	`.CW sleep`
		183	`and`
		184	`.CW wakeup`
		185	`by appropriate interlocks.`
		186	`We also assume for the moment that there will be only one interrupt`
		187	`and that it may occur at any time, even before`
		188	`.CW sleep`
		189	`has been called.`
		190	`.LP`
		191	`For performance,`
		192	`we desire that multiple instances of`
		193	`.CW sleep`
		194	`and`
		195	`.CW wakeup`
		196	`may be running simultaneously on our multiprocessor.`
		197	`For example, a process calling`
		198	`.CW sleep`
		199	`to await a character from an input channel need not`
		200	`wait for another process to finish executing`
		201	`.CW sleep`
		202	`to await a disk block.`
		203	`At a finer level, we would like a process reading from one input channel`
		204	`to be able to execute`
		205	`.CW sleep`
		206	`in parallel with a process reading from another input channel.`
		207	A standard approach to synchronisation is to interlock the channel `driver'
		208	`so that only one process may be executing in the channel code at once.`
		209	`This method is clearly inadequate for our purposes; we need`
		210	`fine-grained synchronisation, and in particular to apply`
		211	`interlocks at the level of individual channels rather than at the level`
		212	`of the channel driver.`
		213	`.LP`
		214	`Our approach is to use an object called a`
		215	`.I rendezvous ,`
		216	`which is a data structure through which`
		217	`.CW sleep`
		218	`and`
		219	`.CW wakeup`
		220	`synchronise.`
		221	`(The similarly named construct in Ada is a control structure;`
		222	`ours is an unrelated data structure.)`
		223	`A rendezvous`
		224	`is allocated for each active source of events:`
		225	`one for each I/O channel,`
		226	`one for each end of a pipe, and so on.`
		227	`The rendezvous serves as an interlockable structure in which to record`
		228	`the state of the sleeping process, so that`
		229	`.CW sleep`
		230	`and`
		231	`.CW wakeup`
		232	`can communicate if the event happens before or while`
		233	`.CW sleep`
		234	`is executing.`
		235	`.LP`
		236	`Our design for`
		237	`.CW sleep`
		238	`is therefore a function`
		239	`.P1`
		240	`void sleep(Rendezvous *r, int (*condition)(void*), void *arg)`
		241	`.P2`
		242	`called by the sleeping process.`
		243	`The argument`
		244	`.CW r`
		245	`connects the call to`
		246	`.CW sleep`
		247	`with the call to`
		248	`.CW wakeup ,`
		249	`and is part of the data structure for the (say) device.`
		250	`The function`
		251	`.CW condition`
		252	`is described above;`
		253	`called with argument`
		254	`.CW arg ,`
		255	`it is used by`
		256	`.CW sleep`
		257	`to decide whether the event has occurred.`
		258	`.CW Wakeup`
		259	`has a simpler specification:`
		260	`.P1`
		261	`void wakeup(Rendezvous *r).`
		262	`.P2`
		263	`.CW Wakeup`
		264	`must be called after the condition has become true.`
		265	`.SH`
		266	`An implementation`
		267	`.LP`
		268	`The`
		269	`.CW Rendezvous`
		270	`data type is defined as`
		271	`.P1`
		272	`typedef struct{`
		273	`	Lock l;`
		274	`	Proc *p;`
		275	`}Rendezvous;`
		276	`.P2`
		277	`Our`
		278	`.CW Locks`
		279	`are test-and-set spin locks.`
		280	`The routine`
		281	`.CW lock(Lock\ *l)`
		282	`returns when the current process holds that lock;`
		283	`.CW unlock(Lock\ *l)`
		284	`releases the lock.`
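As an illustration only, a test-and-set spin lock of this kind might be written with C11 atomics roughly as follows. The `atomic_flag` representation, the memory orderings, the `LOCK_INIT` macro and the `canlock` helper are assumptions of this sketch, not the kernel source the paper describes.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the paper's Lock type. */
typedef struct {
	atomic_flag held;	/* set while some processor holds the lock */
} Lock;

#define LOCK_INIT { ATOMIC_FLAG_INIT }

/* Spin until the test-and-set finds the flag clear; the caller then owns the lock. */
static void
lock(Lock *l)
{
	while(atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;	/* busy-wait: another processor holds the lock */
}

/* Hypothetical helper for this sketch: a single test-and-set, no spinning. */
static int
canlock(Lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Clear the flag so that the next test-and-set succeeds. */
static void
unlock(Lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller brackets its critical section with `lock(&r->l)` and `unlock(&r->l)`. A lock that an interrupt routine may also take should be acquired only with interrupts inhibited on the local processor, for the reason given in the discussion of `sleep` and `wakeup`.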
		285	`.LP`
		286	`Here is our implementation of`
		287	`.CW sleep .`
		288	`Its details are discussed below.`
		289	`.CW Thisp`
		290	`is a pointer to the current process on the current processor.`
		291	`(Its value differs on each processor.)`
		292	`.P1`
		293	`void`
		294	`sleep(Rendezvous *r, int (*condition)(void*), void *arg)`
		295	`{`
		296	`	int s;`
		297
		298	`	s = inhibit(); /* interrupts */`
		299	`	lock(&r->l);`
		300
		301	`	/*`
		302	`	 * if condition happened, never mind`
		303	`	 */`
		304	`	if((*condition)(arg)){`
		305	`		unlock(&r->l);`
		306	`		if(s) allow(); /* interrupts */`
		307	`		return;`
		308	`	}`
		309
		310	`	/*`
		311	`	 * now we are committed to`
		312	`	 * change state and call scheduler`
		313	`	 */`
		314	`	if(r->p)`
		315	`		error("double sleep %d %d", r->p->pid, thisp->pid);`
		316	`	thisp->state = Wakeme;`
		317	`	r->p = thisp;`
		318	`	unlock(&r->l);`
		319	`	if(s) allow(); /* interrupts */`
		320	`	sched(); /* relinquish CPU */`
		321	`}`
		322	`.P2`
		323	`.ne 3i`
		324	`Here is`
		325	`.CW wakeup .`
		326	`.P1`
		327	`void`
		328	`wakeup(Rendezvous *r)`
		329	`{`
		330	`	Proc *p;`
		331	`	int s;`
		332
		333	`	s = inhibit(); /* interrupts; return old state */`
		334	`	lock(&r->l);`
		335	`	p = r->p;`
		336	`	if(p){`
		337	`		r->p = 0;`
		338	`		if(p->state != Wakeme)`
		339	`			panic("wakeup: not Wakeme");`
		340	`		ready(p);`
		341	`	}`
		342	`	unlock(&r->l);`
		343	`	if(s)`
		344	`		allow();`
		345	`}`
		346	`.P2`
		347	`.CW Sleep`
		348	`and`
		349	`.CW wakeup`
		350	`both begin by disabling interrupts`
		351	`and then locking the rendezvous structure.`
		352	`Because`
		353	`.CW wakeup`
		354	`may be called in an interrupt routine, the lock must be set only`
		355	`with interrupts disabled on the current processor,`
		356	`so that if the interrupt comes during`
		357	`.CW sleep`
		358	`it will occur only on a different processor;`
		359	`if it occurred on the processor executing`
		360	`.CW sleep ,`
		361	`the spin lock in`
		362	`.CW wakeup`
		363	`would hang forever.`
		364	`At the end of each routine, the lock is released and processor priority`
		365	`returned to its previous value.`
		366	`.CW Wakeup "" (`
		367	`needs to inhibit interrupts in case`
		368	`it is being called by a process;`
		369	`this is a no-op if called by an interrupt.)`
		370	`.LP`
		371	`.CW Sleep`
		372	`checks to see if the condition has become true, and returns if so.`
		373	`Otherwise the process posts its name in the rendezvous structure where`
		374	`.CW wakeup`
		375	`may find it, marks its state as waiting to be awakened`
		376	`(this is for error checking only) and goes to sleep by calling`
		377	`.CW sched() .`
		378	`The manipulation of the rendezvous structure is all done under the lock,`
		379	`and`
		380	`.CW wakeup`
		381	`only examines it under lock, so atomicity and mutual exclusion`
		382	`are guaranteed.`
		383	`.LP`
		384	`.CW Wakeup`
		385	`has a simpler job. When it is called, the condition has implicitly become true,`
		386	`so it locks the rendezvous, sees if a process is waiting, and readies it to run.`
		387	`.SH`
		388	`Discussion`
		389	`.LP`
		390	`The synchronisation technique used here`
		391	`is similar to known methods, even as far back as Saltzer's thesis`
		392	`[Sal66].`
		393	`The code looks trivially correct in retrospect: all access to data structures is done`
		394	`under lock, and there is no place that things may get out of order.`
		395	`Nonetheless, it took us several iterations to arrive at the above`
		396	`implementation, because the things that`
		397	`.I can`
		398	`go wrong are often hard to see. We had four earlier implementations`
		399	`that were examined at great length and only found faulty when a new,`
		400	`different style of device or activity was added to the system.`
		401	`.LP`
		402	`.ne 3i`
		403	`Here, for example, is an incorrect implementation of wakeup,`
		404	`closely related to one of our versions.`
		405	`.P1`
		406	`void`
		407	`wakeup(Rendezvous *r)`
		408	`{`
		409	`	Proc *p;`
		410	`	int s;`
		411
		412	`	p = r->p;`
		413	`	if(p){`
		414	`		s = inhibit();`
		415	`		lock(&r->l);`
		416	`		r->p = 0;`
		417	`		if(p->state != Wakeme)`
		418	`			panic("wakeup: not Wakeme");`
		419	`		ready(p);`
		420	`		unlock(&r->l);`
		421	`		if(s)`
		422	`			allow();`
		423	`	}`
		424	`}`
		425	`.P2`
		426	`The mistake is that the reading of`
		427	`.CW r->p`
		428	`may occur just as the other process calls`
		429	`.CW sleep ,`
		430	`so when the interrupt examines the structure it sees no one to wake up,`
		431	`and the sleeping process misses its wakeup.`
		432	`We wrote the code this way because we reasoned that the fetch`
		433	`.CW p`
		434	`.CW =`
		435	`.CW r->p`
		436	`was inherently atomic and need not be interlocked.`
		437	`The bug was found by examination when a new, very fast device`
		438	`was added to the system and sleeps and interrupts were closely overlapped.`
		439	`However, it was in the system for a couple of months without causing an error.`
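The fatal interleaving can be replayed deterministically in a few lines of C. What follows is a single-threaded model rather than kernel code: the `replay` function, its `fetch_under_lock` flag and the `Ready` state are hypothetical names introduced for illustration, and the lock itself is left out because the order of the steps is fixed by hand.

```c
#include <assert.h>
#include <stddef.h>

enum { Running, Wakeme, Ready };

typedef struct { int state; } Proc;
typedef struct { Proc *p; } Rendezvous;	/* lock omitted: the replay is sequential */

/*
 * Replay one sleep on processor A against one wakeup on processor B.
 * If fetch_under_lock is zero, wakeup fetches r.p before sleep has
 * posted itself, as in the incorrect version.  Otherwise the fetch is
 * delayed until sleep has released the lock, which is the order the
 * lock forces in the correct version.
 * Returns the final state of the sleeping process.
 */
static int
replay(int fetch_under_lock)
{
	Proc proc = { Running };
	Rendezvous r = { NULL };
	Proc *p = NULL;

	/* A: sleep holds the lock and has found the condition false */

	/* B: the event occurs and the interrupt calls wakeup */
	if(!fetch_under_lock)
		p = r.p;	/* unlocked fetch: sees no sleeper yet */

	/* A: sleep posts itself, unlocks and calls sched() */
	proc.state = Wakeme;
	r.p = &proc;

	/* B: wakeup now holds the lock */
	if(fetch_under_lock)
		p = r.p;	/* locked fetch: sees the sleeper */
	if(p){
		r.p = NULL;
		p->state = Ready;	/* stands in for ready(p) */
	}
	return proc.state;
}
```

In this model `replay(0)` leaves the process in state `Wakeme` with nobody left to wake it, while `replay(1)` ends with the process `Ready`.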
		440	`.LP`
		441	`How many errors lurk in our supposedly correct implementation above?`
		442	`We would like a way to guarantee correctness; formal proofs are beyond`
		443	`our abilities when the subtleties of interrupts and multiprocessors are`
		444	`involved.`
		445	`With that in mind, the first three authors approached the last to see`
		446	`if his automated tool for checking protocols`
		447	`[Hol91]`
		448	`could be`
		449	`used to verify our new`
		450	`.CW sleep`
		451	`and`
		452	`.CW wakeup`
		453	`for correctness.`
		454	`The code was translated into the language for that system`
		455	`(with, unfortunately, no way of proving that the translation is itself correct)`
		456	`and validated by exhaustive simulation.`
		457	`.LP`
		458	`The validator found a bug.`
		459	`Under our assumption that there is only one interrupt, the bug cannot`
		460	`occur, but in the more general case of multiple interrupts synchronising`
		461	`through the same condition function and rendezvous,`
		462	`the process and interrupt can enter a peculiar state.`
		463	`A process may return from`
		464	`.CW sleep`
		465	`with the condition function false`
		466	`if there is a delay between`
		467	`the condition coming true and`
		468	`.CW wakeup`
		469	`being called,`
		470	`with the delay occurring`
		471	`just as the receiving process calls`
		472	`.CW sleep .`
		473	`The condition is now true, so that process returns immediately,`
		474	`does whatever is appropriate, and then (say) decides to call`
		475	`.CW sleep`
		476	`again. This time the condition is false, so it goes to sleep.`
		477	`The wakeup process then finds a sleeping process,`
		478	`and wakes it up, but the condition is now false.`
		479	`.LP`
		480	`There is an easy (and verified) solution: at the end of`
		481	`.CW sleep`
		482	`or after`
		483	`.CW sleep`
		484	`returns,`
		485	`if the condition is false, execute`
		486	`.CW sleep`
		487	`again. This re-execution cannot repeat; the second synchronisation is guaranteed`
		488	`to function under the external conditions we are supposing.`
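One way to package this fix is a small wrapper that re-executes the sleep until the condition holds. The sketch below is an assumption-laden illustration, not the authors' code: `sleepuntil`, `ksleep`, `Chan` and `haschars` are hypothetical names, the `Rendezvous` is reduced to a counter, and `ksleep` merely simulates a sleep that returns once with the condition still false.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the paper's Rendezvous; the real one holds a Lock and a Proc*. */
typedef struct {
	int nsleeps;	/* hypothetical counter, for this demonstration only */
} Rendezvous;

/* Hypothetical device state: number of characters waiting to be read. */
typedef struct {
	int nchars;
} Chan;

/* Condition function in the paper's style: true once input is available. */
static int
haschars(void *arg)
{
	return ((Chan*)arg)->nchars > 0;
}

/*
 * Stand-in for the paper's sleep(), renamed ksleep to avoid clashing with
 * the POSIX sleep().  It simulates the stale wakeup described above: the
 * first call returns although the condition is still false; the second
 * call returns only after input has arrived.
 */
static void
ksleep(Rendezvous *r, int (*condition)(void*), void *arg)
{
	if((*condition)(arg))
		return;
	if(++r->nsleeps >= 2)
		((Chan*)arg)->nchars = 1;	/* the real event finally happens */
}

/* Hypothetical wrapper: re-execute the sleep if the condition is false on return. */
static void
sleepuntil(Rendezvous *r, int (*condition)(void*), void *arg)
{
	while(!(*condition)(arg))
		ksleep(r, condition, arg);
}
```

The loop is deliberately more conservative than the text requires: under the stated assumptions a single re-execution suffices, and the `while` simply costs one extra call to the condition function.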
		489	`.LP`
		490	`Even though the original code is completely`
		491	`protected by interlocks and had been examined carefully by all of us`
		492	`and believed correct, it still had problems.`
		493	`It seems to us that some exhaustive automated analysis is`
		494	`required of multiprocessor algorithms to guarantee their safety.`
		495	`Our experience has confirmed that it is almost impossible to`
		496	`guarantee by inspection or simple testing the correctness`
		497	`of a multiprocessor algorithm. Testing can demonstrate the presence`
		498	`of bugs but not their absence`
		499	`[Dij72].`
		500	`.LP`
		501	`We close by claiming that the code above with`
		502	`the suggested modification passes all tests we have for correctness`
		503	`under the assumptions used in the validation.`
		504	`We would not, however, go so far as to claim that it is universally correct.`
		505	`.SH`
		506	`References`
		507	`.LP`
		508	`[Bac86] Maurice J. Bach,`
		509	`.I "The Design of the UNIX Operating System,`
		510	`Prentice-Hall,`
		511	`Englewood Cliffs,`
		512	`1986.`
		513	`.LP`
		514	`[Dij72] Edsger W. Dijkstra,`
		515	``The Humble Programmer \- 1972 Turing Award Lecture'',
		516	`.I "Comm. ACM,`
		517	`15(10), pp. 859-866,`
		518	`October 1972.`
		519	`.LP`
		520	`[Hol91] Gerard J. Holzmann,`
		521	`.I "Design and Validation of Computer Protocols,`
		522	`Prentice-Hall,`
		523	`Englewood Cliffs,`
		524	`1991.`
		525	`.LP`
		526	`[Pik90]`
		527	`Rob Pike,`
		528	`Dave Presotto,`
		529	`Ken Thompson,`
		530	`Howard Trickey,`
		531	``Plan 9 from Bell Labs'',
		532	`.I "Proceedings of the Summer 1990 UKUUG Conference,`
		533	`pp. 1-9,`
		534	`London,`
		535	`July, 1990.`
		536	`.LP`
		537	`[Sal66] Jerome H. Saltzer,`
		538	`.I "Traffic Control in a Multiplexed Computer System,`
		539	`MIT,`
		540	`Cambridge, Mass.,`
		541	`1966.`
