.HTML "The Use of Name Spaces in Plan 9
.TL
The Use of Name Spaces in Plan 9
.AU
Rob Pike
Dave Presotto
Ken Thompson
Howard Trickey
Phil Winterbottom
.AI
.MH
USA
.AB
.FS
Appeared in
.I
Operating Systems Review,
.R
Vol. 27, #2, April 1993, pp. 72-76
(reprinted from
.I
Proceedings of the 5th ACM SIGOPS European Workshop,
.R
Mont Saint-Michel, 1992, Paper nº 34).
.FE
Plan 9 is a distributed system built at the Computing Sciences Research
Center of AT&T Bell Laboratories (now Lucent Technologies, Bell Labs) over the last few years.
Its goal is to provide a production-quality system for software
development and general computation using heterogeneous hardware
and minimal software.  A Plan 9 system comprises CPU and file
servers in a central location connected together by fast networks.
Slower networks fan out to workstation-class machines that serve as
user terminals.  Plan 9 argues that given a few carefully
implemented abstractions
it is possible to
produce a small operating system that provides support for the largest systems
on a variety of architectures and networks. The foundations of the system are
built on two ideas: a per-process name space and a simple message-oriented
file system protocol.
.AE
.PP
The operating system for the CPU servers and terminals is
structured as a traditional kernel: a single compiled image
containing code for resource management, process control,
user processes,
virtual memory, and I/O.  Because the file server is a separate
machine, the file system is not compiled in, although the management
of the name space, a per-process attribute, is.
The entire kernel for the multiprocessor SGI Power Series machine
is 25000 lines of C,
the largest part of which is code for four networks including the
Ethernet with the Internet protocol suite.
Fewer than 1500 lines are machine-specific, and a
functional kernel with minimal I/O can be put together from
source files totaling 6000 lines. [Pike90]
.PP
The system is relatively small for several reasons.
First, it is all new: it has not had time to accrete as many fixes
and features as other systems.
Also, other than the network protocol, it adheres to no
external interface; in particular, it is not Unix-compatible.
Economy stems from careful selection of services and interfaces.
Finally, wherever possible the system is built around
two simple ideas:
every resource in the system, either local or remote,
is represented by a hierarchical file system; and
a user or process
assembles a private view of the system by constructing a file
.I
name space
.R
that connects these resources. [Needham]
.SH
File Protocol
.PP
All resources in Plan 9 look like file systems.
That does not mean that they are repositories for
permanent files on disk, but that the interface to them
is file-oriented: finding files (resources) in a hierarchical
name tree, attaching to them by name, and accessing their contents
by read and write calls.
There are dozens of file system types in Plan 9, but only a few
represent traditional files.
At this level of abstraction, files in Plan 9 are similar
to objects, except that files are already provided with naming,
access, and protection methods that must be created afresh for
objects.  Object-oriented readers may approach the rest of this
paper as a study in how to make objects look like files.
.PP
The interface to file systems is defined by a protocol, called 9P,
analogous but not very similar to the NFS protocol.
The protocol talks about files, not blocks; given a connection to the root
directory of a file server,
the 9P messages navigate the file hierarchy, open files for I/O,
and read or write arbitrary bytes in the files.
9P contains 17 message types: three for
initializing and
authenticating a connection and fourteen for manipulating objects.
The messages are generated by the kernel in response to user- or
kernel-level I/O requests.
Here is a quick tour of the major message types.
The
.CW auth
and
.CW attach
messages authenticate a connection, established by means outside 9P,
and validate its user.
The result is an authenticated
.I channel
that points to the root of the
server.
The
.CW clone
message makes a new channel identical to an existing channel,
which may be moved to a file on the server using a
.CW walk
message to descend each level in the hierarchy.
The
.CW stat
and
.CW wstat
messages read and write the attributes of the file pointed to by a channel.
The
.CW open
message prepares a channel for subsequent
.CW read
and
.CW write
messages to access the contents of the file, while
.CW create
and
.CW remove
perform, on the files, the actions implied by their names.
The
.CW clunk
message discards a channel without affecting the file.
None of the 9P messages consider caching; file caches are provided,
when needed, either within the server (centralized caching)
or by implementing the cache as a transparent file system between the
client and the 9P connection to the server (client caching).
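.PP
The flavor of such a message-oriented protocol can be sketched in a few lines. The encoding below (a size, a type byte, a tag, then the payload) and the message-type numbers are invented for illustration; this is not the actual 9P wire format.

```python
import struct

# Toy framing in the spirit of 9P: size[4] type[1] tag[2] payload.
# Type numbers and layout are made up for illustration only.
TWALK, RWALK = 110, 111

def pack(msg_type, tag, payload):
    body = struct.pack("<BH", msg_type, tag) + payload
    return struct.pack("<I", 4 + len(body)) + body

def unpack(data):
    size, = struct.unpack_from("<I", data)
    msg_type, tag = struct.unpack_from("<BH", data, 4)
    return size, msg_type, tag, data[7:size]

m = pack(TWALK, 1, b"usr/rob")
assert unpack(m) == (len(m), TWALK, 1, b"usr/rob")
```

Because the protocol talks about files rather than blocks, a message like the toy `TWALK` above carries only a name to descend to; the server keeps all knowledge of the underlying storage.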
.PP
For efficiency, the connection to local
kernel-resident file systems, misleadingly called
.I devices,
is by regular rather than remote procedure calls.
The procedures map one-to-one with 9P message types.
Locally each channel has an associated data structure
that holds a type field used to index
a table of procedure calls, one set per file system type,
analogous to selecting the method set for an object.
One kernel-resident file system, the
.I
mount device,
.R
translates the local 9P procedure calls into RPC messages to
remote services over a separately provided transport protocol
such as TCP or IL, a new reliable datagram protocol, or over a pipe to
a user process.
Write and read calls transmit the messages over the transport layer.
The mount device is the sole bridge between the procedural
interface seen by user programs and remote and user-level services.
It does all associated marshaling, buffer
management, and multiplexing and is
the only integral RPC mechanism in Plan 9.
The mount device is in effect a proxy object.
There is no RPC stub compiler; instead the mount driver and
all servers just share a library that packs and unpacks 9P messages.
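.PP
The per-type dispatch described above, a channel's type field indexing a table of procedure sets, can be sketched as follows. The structures and device names here are illustrative, not the kernel's actual data layout.

```python
# Hypothetical sketch of per-type dispatch: each channel carries a type
# index selecting one set of procedures per kernel-resident file system,
# analogous to selecting the method set for an object.
class Chan:
    def __init__(self, dev_type, path):
        self.type = dev_type
        self.path = path

def cons_read(chan): return b"keyboard input"
def proc_read(chan): return b"status line"

# One entry per file system type; names are made up for illustration.
devtab = [
    {"name": "cons", "read": cons_read},
    {"name": "proc", "read": proc_read},
]

def devread(chan):
    # Index the table by the channel's type field, then dispatch.
    return devtab[chan.type]["read"](chan)

assert devread(Chan(0, "/dev/cons")) == b"keyboard input"
```

The mount device fits this scheme as just another entry whose procedures forward each call as an RPC message over a transport connection.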
.SH
Examples
.PP
One file system type serves
permanent files from the main file server,
a stand-alone multiprocessor system with a
350-gigabyte
optical WORM jukebox that holds the data, fronted by a two-level
block cache comprising 7 gigabytes of
magnetic disk and 128 megabytes of RAM.
Clients connect to the file server using any of a variety of
networks and protocols and access files using 9P.
The file server runs a distinct operating system and has no
support for user processes; other than a restricted set of commands
available on the console, all it does is answer 9P messages from clients.
.PP
Once a day, at 5:00 AM,
the file server sweeps through the cache blocks and marks dirty blocks
copy-on-write.
It creates a copy of the root directory
and labels it with the current date, for example
.CW 1995/0314 .
It then starts a background process to copy the dirty blocks to the WORM.
The result is that the server retains an image of the file system as it was
early each morning.
The set of old root directories is accessible using 9P, so a client
may examine backup files using ordinary commands.
Several advantages stem from having the backup service implemented
as a plain file system.
Most obviously, ordinary commands can access the backups.
For example, to see when a bug was fixed
.P1
grep 'mouse bug fix' 1995/*/sys/src/cmd/8½/file.c
.P2
The owner, access times, permissions, and other properties of the
files are also backed up.
Because it is a file system, the backup
still has protections;
it is not possible to subvert security by looking at the backup.
.PP
The file server is only one type of file system.
A number of unusual services are provided within the kernel as
local file systems.
These services are not limited to I/O devices such
as disks.  They include network devices and their associated protocols,
the bitmap display and mouse,
a representation of processes similar to
.CW /proc
[Killian], the name/value pairs that form the `environment'
passed to a new process, profiling services,
and other resources.
Each of these is represented as a file system \(em
directories containing sets of files \(em
but the constituent files do not represent permanent storage on disk.
Instead, they are closer in properties to UNIX device files.
.PP
For example, the
.I console
device contains the file
.CW /dev/cons ,
similar to the UNIX file
.CW /dev/console :
when written,
.CW /dev/cons
appends to the console typescript; when read,
it returns characters typed on the keyboard.
Other files in the console device include
.CW /dev/time ,
the number of seconds since the epoch,
.CW /dev/cputime ,
the computation time used by the process reading the device,
.CW /dev/pid ,
the process id of the process reading the device, and
.CW /dev/user ,
the login name of the user accessing the device.
All these files contain text, not binary numbers,
so their use is free of byte-order problems.
Their contents are synthesized on demand when read; when written,
they cause modifications to kernel data structures.
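.PP
The point that textual contents avoid byte-order problems can be made concrete. The sketch below simulates a synthesized, on-demand read in the style of
.CW /dev/time ;
the function names are hypothetical.

```python
import time

# Sketch of a synthesized, textual device file in the style of
# /dev/time: content is generated at read time and is plain text,
# so a client on any architecture parses it without byte-order concerns.
def read_time(now=None):
    if now is None:
        now = int(time.time())
    return str(now).encode()     # text, not a binary integer

def parse_time(data):
    return int(data)             # portable on any client

assert parse_time(read_time(now=763212800)) == 763212800
```

A binary representation would instead force every client to agree on the server's word size and endianness; text sidesteps both.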
.PP
The
.I process
device contains one directory per live local process, named by its numeric
process id:
.CW /proc/1 ,
.CW /proc/2 ,
etc.
Each directory contains a set of files that access the process.
For example, in each directory the file
.CW mem
is an image of the virtual memory of the process that may be read or
written for debugging.
The
.CW text
file is a sort of link to the file from which the process was executed;
it may be opened to read the symbol tables for the process.
The
.CW ctl
file may be written textual messages such as
.CW stop
or
.CW kill
to control the execution of the process.
The
.CW status
file contains a fixed-format line of text containing information about
the process: its name, owner, state, and so on.
Text strings written to the
.CW note
file are delivered to the process as
.I notes,
analogous to UNIX signals.
By providing these services as textual I/O on files rather
than as system calls (such as
.CW kill )
or special-purpose operations (such as
.CW ptrace ),
the Plan 9 process device simplifies the implementation of
debuggers and related programs.
For example, the command
.P1
cat /proc/*/status
.P2
is a crude form of the
.CW ps
command; the actual
.CW ps
merely reformats the data so obtained.
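.PP
The reformatting that
.CW ps
does can be sketched over simulated status files. The field layout below (name, owner, state) is illustrative, not Plan 9's actual fixed format.

```python
# A crude ps in the spirit of `cat /proc/*/status`: each status file is
# one fixed-format text line. Paths and fields here are simulated.
status_files = {
    "/proc/1/status": "init      rob   Ready",
    "/proc/27/status": "8.5       rob   Sleep",
}

def ps(files):
    rows = []
    for path, line in sorted(files.items()):
        name, owner, state = line.split()
        pid = path.split("/")[2]        # /proc/<pid>/status
        rows.append((pid, name, owner, state))
    return rows

for pid, name, owner, state in ps(status_files):
    print(pid, name, owner, state)
```

Because the data is already text, the "implementation" is nothing but splitting and rearranging fields; no system-call interface or binary structure layout is involved.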
.PP
The
.I bitmap
device contains three files,
.CW /dev/mouse ,
.CW /dev/screen ,
and
.CW /dev/bitblt ,
that provide an interface to the local bitmap display (if any) and pointing device.
The
.CW mouse
file returns a fixed-format record containing
1 byte of button state and 4 bytes each of
.I x
and
.I y
position of the mouse.
If the mouse has not moved since the file was last read, a subsequent read will
block.
The
.CW screen
file contains a memory image of the contents of the display;
the
.CW bitblt
file provides a procedural interface.
Calls to the graphics library are translated into messages that are written
to the
.CW bitblt
file to perform bitmap graphics operations.  (This is essentially a nested
RPC protocol.)
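.PP
The fixed-format mouse record (1 byte of buttons plus 4 bytes each of x and y, 9 bytes in all) is easy to pack and parse. The little-endian byte order below is an assumption for illustration; the paper does not specify one.

```python
import struct

# Mouse record per the text: 1 byte of button state, then 4 bytes each
# of x and y. Byte order (little-endian here) is assumed, not specified.
def pack_mouse(buttons, x, y):
    return struct.pack("<Bii", buttons, x, y)

def read_mouse(record):
    buttons, x, y = struct.unpack("<Bii", record)
    return buttons, x, y

rec = pack_mouse(0b001, 640, 480)
assert len(rec) == 9
assert read_mouse(rec) == (1, 640, 480)
```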
.PP
The various services being used by a process are gathered together into the
process's
.I
name space,
.R
a single rooted hierarchy of file names.
When a process forks, the child process shares the name space with the parent.
Several system calls manipulate name spaces.
Given a file descriptor
.CW fd
that holds an open communications channel to a service,
the call
.P1
mount(int fd, char *old, int flags)
.P2
authenticates the user and attaches the file tree of the service to
the directory named by
.CW old .
The
.CW flags
specify how the tree is to be attached to
.CW old :
replacing the current contents or appearing before or after the
current contents of the directory.
A directory with several services mounted is called a
.I union
directory and is searched in the specified order.
The call
.P1
bind(char *new, char *old, int flags)
.P2
takes the portion of the existing name space visible at
.CW new ,
either a file or a directory, and makes it also visible at
.CW old .
For example,
.P1
bind("1995/0301/sys/include", "/sys/include", REPLACE)
.P2
causes the directory of include files to be overlaid with its
contents from the dump on March first.
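.PP
Union-directory search can be sketched directly from the description above: several trees bound onto one directory, searched in the specified order. The flag names mirror the replace/before/after semantics; the data structures are hypothetical.

```python
# Sketch of union directories: bind attaches a tree to a directory
# according to a flag; lookup searches the bound trees in order.
REPLACE, BEFORE, AFTER = range(3)

def bind(union, tree, flag):
    if flag == REPLACE:
        union[:] = [tree]
    elif flag == BEFORE:
        union.insert(0, tree)
    else:                       # AFTER
        union.append(tree)

def lookup(union, name):
    for tree in union:          # first tree containing the name wins
        if name in tree:
            return tree[name]
    return None

slash_bin = []
bind(slash_bin, {"ls": "/mips/bin/ls"}, REPLACE)
bind(slash_bin, {"ls": "/usr/rob/bin/ls", "hack": "/usr/rob/bin/hack"}, BEFORE)
assert lookup(slash_bin, "ls") == "/usr/rob/bin/ls"
```

The example unions a private bin directory before the architecture's, the arrangement the paper describes in place of a PATH variable.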
.PP
A process is created by the
.CW rfork
system call, which takes as argument a bit vector defining which
attributes of the process are to be shared between parent
and child instead of copied.
One of the attributes is the name space: when shared, changes
made by either process are visible in the other; when copied,
changes are independent.
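.PP
The shared-versus-copied distinction amounts to aliasing one mutable name space versus deep-copying it. In the sketch below the flag name and its meaning are assumptions for illustration, not the real
.CW rfork
bits.

```python
import copy

# Sketch of rfork's name-space attribute: shared means parent and child
# alias one object; copied means the child gets an independent copy.
# The flag name and bit value are illustrative assumptions.
RFNAMEG = 1                     # "copy the name space" (assumed meaning)

def rfork(parent_ns, flags):
    if flags & RFNAMEG:
        return copy.deepcopy(parent_ns)   # independent copy
    return parent_ns                      # shared: same object

ns = {"/bin": "/mips/bin"}
shared = rfork(ns, 0)
shared["/bin"] = "/68020/bin"
assert ns["/bin"] == "/68020/bin"         # change visible to parent

copied = rfork(ns, RFNAMEG)
copied["/bin"] = "/sparc/bin"
assert ns["/bin"] == "/68020/bin"         # parent unaffected
```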
.PP
Although there is no global name space,
for a process to function sensibly the local name spaces must adhere
to global conventions.
Nonetheless, the use of local name spaces is critical to the system.
Both these ideas are illustrated by the use of the name space to
handle heterogeneity.
The binaries for a given architecture are contained in a directory
named by the architecture, for example
.CW /mips/bin ;
in use, that directory is bound to the conventional location
.CW /bin .
Programs such as shell scripts need not know the CPU type they are
executing on to find binaries to run.
A directory of private binaries
is usually unioned with
.CW /bin .
(Compare this to the
.I
ad hoc
.R
and special-purpose idea of the
.CW PATH
variable, which is not used in the Plan 9 shell.)
Local bindings are also helpful for debugging, for example by binding
an old library to the standard place and linking a program to see
if recent changes to the library are responsible for a bug in the program.
.PP
The window system,
.CW 8½
[Pike91], is a server for files such as
.CW /dev/cons
and
.CW /dev/bitblt .
Each client sees a distinct copy of these files in its local
name space: there are many instances of
.CW /dev/cons ,
each served by
.CW 8½
to the local name space of a window.
Again,
.CW 8½
implements services using
local name spaces plus the use
of I/O to conventionally named files.
Each client just connects its standard input, output, and error files
to
.CW /dev/cons ,
with analogous operations to access bitmap graphics.
Compare this to the implementation of
.CW /dev/tty
on UNIX, which is done by special code in the kernel
that overloads the file, when opened,
with the standard input or output of the process.
Special arrangement must be made by a UNIX window system for
.CW /dev/tty
to behave as expected;
.CW 8½
instead uses the provision of the corresponding file as its
central idea, which to succeed depends critically on local name spaces.
.PP
The environment
.CW 8½
provides its clients is exactly the environment under which it is implemented:
a conventional set of files in
.CW /dev .
This permits the window system to be run recursively in one of its own
windows, which is handy for debugging.
It also means that if the files are exported to another machine,
as described below, the window system or client applications may be
run transparently on remote machines, even ones without graphics hardware.
This mechanism is used for Plan 9's implementation of the X window
system: X is run as a client of
.CW 8½ ,
often on a remote machine with lots of memory.
In this configuration, using Ethernet to connect
MIPS machines, we measure only a 10% degradation in graphics
performance relative to running X on
a bare Plan 9 machine.
.PP
An unusual application of these ideas is a statistics-gathering
file system implemented by a command called
.CW iostats .
The command encapsulates a process in a local name space, monitoring 9P
requests from the process to the outside world \(em the name space in which
.CW iostats
is itself running.  When the command completes,
.CW iostats
reports usage and performance figures for file activity.
For example
.P1
iostats 8½
.P2
can be used to discover how much I/O the window system
does to the bitmap device, font files, and so on.
.PP
The
.CW import
command connects a piece of name space from a remote system
to the local name space.
Its implementation is to dial the remote machine and start
a process there that serves the remote name space using 9P.
It then calls
.CW mount
to attach the connection to the name space and finally dies;
the remote process continues to serve the files.
One use is to access devices not available
locally.  For example, to write a floppy one may say
.P1
import lab.pc /a: /n/dos
cp foo /n/dos/bar
.P2
The call to
.CW import
connects the file tree from
.CW /a:
on the machine
.CW lab.pc
(which must support 9P) to the local directory
.CW /n/dos .
Then the file
.CW foo
can be written to the floppy just by copying it across.
.PP
Another application is remote debugging:
.P1
import helix /proc
.P2
makes the process file system on machine
.CW helix
available locally; commands such as
.CW ps
then see
.CW helix 's
processes instead of the local ones.
The debugger may then look at a remote process:
.P1
db /proc/27/text /proc/27/mem
.P2
allows breakpoint debugging of the remote process.
Since
.CW db
infers the CPU type of the process from the executable header on
the text file, it supports
cross-architecture debugging, too.
Care is taken within
.CW db
to handle issues of byte order and floating point; it is possible to
breakpoint debug a big-endian MIPS process from a little-endian i386.
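.PP
The byte-order issue
.CW db
must handle is simply that the same bytes of a remote process's memory decode differently depending on the target's endianness, which
.CW db
infers from the executable header. A minimal sketch:

```python
import struct

# The same four bytes of remote memory decode differently depending on
# the endianness of the target architecture.
def decode_word(raw, big_endian):
    fmt = ">i" if big_endian else "<i"
    value, = struct.unpack(fmt, raw)
    return value

raw = bytes([0x00, 0x00, 0x01, 0x00])
assert decode_word(raw, big_endian=True) == 256      # big-endian MIPS view
assert decode_word(raw, big_endian=False) == 65536   # little-endian i386 view
```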
.PP
Network interfaces are also implemented as file systems [Presotto].
For example,
.CW /net/tcp
is a directory somewhat like
.CW /proc :
it contains a set of numbered directories, one per connection,
each of which contains files to control and communicate on the connection.
A process allocates a new connection by accessing
.CW /net/tcp/clone ,
which evaluates to the directory of an unused connection.
To make a call, the process writes a textual message such as
.CW 'connect
.CW 135.104.53.2!512'
to the
.CW ctl
file and then reads and writes the
.CW data
file.
An
.CW rlogin
service can be implemented in a few lines of shell code.
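.PP
The textual control protocol is the heart of this interface: a client writes a command like
.CW 'connect
.CW 135.104.53.2!512'
to the
.CW ctl
file and the protocol stack parses it. The sketch below simulates that parsing; the class is hypothetical.

```python
# Simulated parsing of a textual ctl message on /net/tcp, in the style
# described in the text. The class and method names are hypothetical.
class TcpConn:
    def __init__(self):
        self.remote = None

    def write_ctl(self, msg):
        verb, arg = msg.split(None, 1)
        if verb != "connect":
            raise ValueError("unknown ctl message: " + verb)
        host, port = arg.split("!")   # Plan 9's network!address notation
        self.remote = (host, int(port))

conn = TcpConn()
conn.write_ctl("connect 135.104.53.2!512")
assert conn.remote == ("135.104.53.2", 512)
```

Because the whole exchange is text on ordinary files, a shell script can drive a connection with nothing more than echo and cat, which is why a service like rlogin needs only a few lines.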
.PP
This structure makes network gatewaying easy to provide.
We have machines with Datakit interfaces but no Internet interface.
On such a machine one may type
.P1
import helix /net
telnet tcp!ai.mit.edu
.P2
The
.CW import
uses Datakit to pull in the TCP interface from
.CW helix ,
which can then be used directly; the
.CW tcp!
notation is necessary because we routinely use multiple networks
and protocols on Plan 9\(emit identifies the network in which
.CW ai.mit.edu
is a valid name.
.PP
In practice we do not use
.CW rlogin
or
.CW telnet
between Plan 9 machines.  Instead a command called
.CW cpu
in effect replaces the CPU in a window with that
on another machine, typically a fast multiprocessor CPU server.
The implementation is to recreate the
name space on the remote machine, using the equivalent of
.CW import
to connect pieces of the terminal's name space to that of
the process (shell) on the CPU server, making the terminal
a file server for the CPU.
CPU-local devices such as fast file system connections
are still local; only terminal-resident devices are
imported.
The result is unlike UNIX
.CW rlogin ,
which moves into a distinct name space on the remote machine,
or file sharing with
.CW NFS ,
which keeps the name space the same but forces processes to execute
locally.
Bindings in
.CW /bin
may change because of a change in CPU architecture, and
the networks involved may be different because of differing hardware,
but the effect feels like simply speeding up the processor in the
current name space.
.SH
Position
.PP
These examples illustrate how the ideas of representing resources
as file systems and per-process name spaces can be used to solve
problems often left to more exotic mechanisms.
Nonetheless there are some operations in Plan 9 that are not
mapped into file I/O.
An example is process creation.
We could imagine a message to a control file in
.CW /proc
that creates a process, but the details of
constructing the environment of the new process \(em its open files,
name space, memory image, etc. \(em are too intricate to
be described easily in a simple I/O operation.
Therefore new processes on Plan 9 are created by fairly conventional
.CW rfork
and
.CW exec
system calls;
.CW /proc
is used only to represent and control existing processes.
.PP
Plan 9 does not attempt to map network name spaces into the file
system name space, for several reasons.
The different addressing rules for various networks and protocols
cannot be mapped uniformly into a hierarchical file name space.
Even if they could be,
the various mechanisms to authenticate,
select a service,
and control the connection would not map consistently into
operations on a file.
.PP
Shared memory is another resource not adequately represented by a
file name space.
Plan 9 takes care to provide mechanisms
to allow groups of local processes to share and map memory.
Memory is controlled
by system calls rather than special files, however,
since a representation in the file system would imply that memory could
be imported from remote machines.
.PP
Despite these limitations, file systems and name spaces offer an effective
model around which to build a distributed system.
Used well, they can provide a uniform, familiar, transparent
interface to a diverse set of distributed resources.
They carry well-understood properties of access, protection,
and naming.
The integration of devices into the hierarchical file system
was the best idea in UNIX.
Plan 9 pushes the concepts much further and shows that
file systems, when used inventively, have plenty of scope
for productive research.
.SH
References
.LP
[Killian] T. Killian, ``Processes as Files'', USENIX Summer Conf. Proc., Salt Lake City, 1984
.br
[Needham] R. Needham, ``Names'', in
.I
Distributed Systems,
.R
S. Mullender, ed.,
Addison-Wesley, 1989
.br
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
``Plan 9 from Bell Labs'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England,
1990
.br
[Presotto] D. Presotto, ``Multiprocessor Streams for Plan 9'',
UKUUG Proc. of the Summer 1990 Conf.,
London, England,
1990
.br
[Pike91] R. Pike, ``8½, The Plan 9 Window System'', USENIX Summer
Conf. Proc., Nashville, 1991
Conf. Proc., Nashville, 1991