.HTML "The Organization of Networks in Plan 9"
.TL
The Organization of Networks in Plan 9
.AU
Dave Presotto
Phil Winterbottom
.sp
presotto,philw@plan9.bell-labs.com
.AB
.FS
Originally appeared in
.I
Proc. of the Winter 1993 USENIX Conf.,
.R
pp. 271-280,
San Diego, CA
.FE
In a distributed system, networks are of paramount importance. This
paper describes the implementation, design philosophy, and organization
of network support in Plan 9. Topics include network requirements
for distributed systems, our kernel implementation, network naming, user interfaces,
and performance. We also observe that much of this organization is relevant to
current systems.
.AE
.NH
Introduction
.PP
Plan 9 [Pike90] is a general-purpose, multi-user, portable distributed system
implemented on a variety of computers and networks.
What distinguishes Plan 9 is its organization.
The goals of this organization were to
reduce administration
and to promote resource sharing. One of the keys to its success as a distributed
system is the organization and management of its networks.
.PP
A Plan 9 system comprises file servers, CPU servers and terminals.
The file servers and CPU servers are typically centrally
located multiprocessor machines with large memories and
high-speed interconnects.
A variety of workstation-class machines
serve as terminals
connected to the central servers using several networks and protocols.
The architecture of the system demands a hierarchy of network
speeds matching the needs of the components.
Connections between file servers and CPU servers are high-bandwidth point-to-point
fiber links.
Connections from the servers fan out to local terminals
using medium-speed networks
such as Ethernet [Met80] and Datakit [Fra80].
Low-speed connections via the Internet and
the AT&T backbone serve users in Oregon and Illinois.
Basic Rate ISDN data service and 9600 baud serial lines provide slow
links to users at home.
.PP
Since CPU servers and terminals use the same kernel,
users may choose to run programs locally on
their terminals or remotely on CPU servers.
The organization of Plan 9 hides the details of system connectivity,
allowing both users and administrators to configure their environment
to be as distributed or centralized as they wish.
Simple commands support the
construction of a locally represented name space
spanning many machines and networks.
At work, users tend to use their terminals like workstations,
running interactive programs locally and
reserving the CPU servers for data- or compute-intensive jobs
such as compiling and computing chess endgames.
At home or when connected over
a slow network, users tend to do most work on the CPU server to minimize
traffic on the slow links.
The goal of the network organization is to provide the same
environment to the user wherever resources are used.
.NH
Kernel Network Support
.PP
Networks play a central role in any distributed system. This is particularly
true in Plan 9, where most resources are provided by servers external to the kernel.
The importance of the networking code within the kernel
is reflected by its size;
of 25,000 lines of kernel code, 12,500 are network and protocol related.
Networks are continually being added, and the fraction of code
devoted to communications
is growing.
Moreover, the network code is complex.
Protocol implementations consist almost entirely of
synchronization and dynamic memory management, areas demanding
subtle error recovery
strategies.
The kernel currently supports Datakit, point-to-point fiber links,
an Internet (IP) protocol suite and ISDN data service.
The variety of networks and machines
has raised issues not addressed by other systems running on commercial
hardware supporting only Ethernet or FDDI.
.NH 2
The File System protocol
.PP
A central idea in Plan 9 is the representation of a resource as a hierarchical
file system.
Each process assembles a view of the system by building a
.I "name space"
[Needham] connecting its resources.
File systems need not represent disc files; in fact, most Plan 9 file systems have no
permanent storage.
A typical file system dynamically represents
some resource like a set of network connections or the process table.
Communication between the kernel, device drivers, and local or remote file servers uses a
protocol called 9P. The protocol consists of 17 messages
describing operations on files and directories.
Kernel-resident device and protocol drivers use a procedural version
of the protocol, while external file servers use an RPC form.
Nearly all traffic between Plan 9 systems consists
of 9P messages.
9P relies on several properties of the underlying transport protocol.
It assumes messages arrive reliably and in sequence and
that delimiters between messages
are preserved.
When a protocol does not meet these
requirements (for example, TCP does not preserve delimiters),
we provide mechanisms to marshal messages before handing them
to the system.
.PP
A kernel data structure, the
.I channel ,
is a handle to a file server.
Operations on a channel generate the following 9P messages.
The
.CW session
and
.CW attach
messages authenticate a connection, established by means external to 9P,
and validate its user.
The result is an authenticated channel referencing the root of the server.
The
.CW clone
message makes a new channel identical to an existing channel, much like
the
.CW dup
system call.
A channel may be moved to a file on the server using a
.CW walk
message to descend each level in the hierarchy.
The
.CW stat
and
.CW wstat
messages read and write the attributes of the file referenced by a channel.
The
.CW open
message prepares a channel for subsequent
.CW read
and
.CW write
messages to access the contents of the file.
.CW Create
and
.CW remove
perform the actions implied by their names on the file
referenced by the channel.
The
.CW clunk
message discards a channel without affecting the file.
.PP
A kernel-resident file server called the
.I "mount driver"
converts the procedural version of 9P into RPCs.
The
.I mount
system call associates a file descriptor, which can be
a pipe to a user process or a network connection to a remote machine,
with a mount point.
After a mount, operations
on the file tree below the mount point are sent as messages to the file server.
The mount
driver manages buffers, packs and unpacks parameters from
messages, and demultiplexes among processes using the file server.
.NH 2
Kernel Organization
.PP
The network code in the kernel is divided into three layers: hardware interface,
protocol processing, and program interface.
A device driver typically uses streams to connect the two interface layers.
Additional stream modules may be pushed onto
a device to process protocols.
Each device driver is a kernel-resident file system.
Simple device drivers serve a single-level
directory containing just a few files;
for example, we represent each UART
by a data and a control file.
.P1
cpu% cd /dev
cpu% ls -l eia*
--rw-rw-rw- t 0 bootes bootes 0 Jul 16 17:28 eia1
--rw-rw-rw- t 0 bootes bootes 0 Jul 16 17:28 eia1ctl
--rw-rw-rw- t 0 bootes bootes 0 Jul 16 17:28 eia2
--rw-rw-rw- t 0 bootes bootes 0 Jul 16 17:28 eia2ctl
cpu%
.P2
The control file is used to control the device;
writing the string
.CW b1200
to
.CW /dev/eia1ctl
sets the line to 1200 baud.
.PP
Multiplexed devices present
a more complex interface structure.
For example, the LANCE Ethernet driver
serves a two-level file tree (Figure 1)
providing
.IP \(bu
device control and configuration
.IP \(bu
user-level protocols like ARP
.IP \(bu
diagnostic interfaces for snooping software.
.LP
The top directory contains a
.CW clone
file and a directory for each connection, numbered
.CW 1
to
.CW n .
Each connection directory corresponds to an Ethernet packet type.
Opening the
.CW clone
file finds an unused connection directory
and opens its
.CW ctl
file.
Reading the control file returns the ASCII connection number; the user
process can use this value to construct the name of the proper
connection directory.
In each connection directory, files named
.CW ctl ,
.CW data ,
.CW stats ,
and
.CW type
provide access to the connection.
Writing the string
.CW "connect 2048"
to the
.CW ctl
file sets the packet type to 2048
and configures the connection to receive
all IP packets sent to the machine.
Subsequent reads of the file
.CW type
yield the string
.CW 2048 .
The
.CW data
file accesses the media;
reading it returns the
next packet of the selected type.
Writing the file
queues a packet for transmission after
adding a packet header containing the source address and packet type.
The
.CW stats
file returns ASCII text containing the interface address,
packet input/output counts, error statistics, and general information
about the state of the interface.
.so tree.pout
.PP
If several connections on an interface
are configured for a particular packet type, each receives a
copy of the incoming packets.
The special packet type
.CW -1
selects all packets.
Writing the strings
.CW promiscuous
and
.CW "connect -1"
to the
.CW ctl
file
configures a conversation to receive all packets on the Ethernet.
.PP
Although the driver interface may seem elaborate,
the representation of a device as a set of files using ASCII strings for
communication has several advantages.
Any mechanism supporting remote access to files immediately
allows a remote machine to use our interfaces as gateways.
Using ASCII strings to control the interface avoids byte order problems,
ensures a uniform representation for
devices on the same machine, and even allows devices to be accessed remotely.
Representing dissimilar devices by the same set of files allows common tools
to serve
several networks or interfaces.
Programs like
.CW stty
are replaced by
.CW echo
and shell redirection.
.NH 2
Protocol devices
.PP
Network connections are represented as pseudo-devices called protocol devices.
Protocol device drivers exist for the Datakit URP protocol and for each of the
Internet IP protocols TCP, UDP, and IL.
IL, described below, is a new communication protocol used by Plan 9 for
transmitting file system RPCs.
All protocol devices look identical, so user programs contain no
network-specific code.
.PP
Each protocol device driver serves a directory structure
similar to that of the Ethernet driver.
The top directory contains a
.CW clone
file and a directory for each connection, numbered
.CW 0
to
.CW n .
Each connection directory contains files to control one
connection and to send and receive information.
A TCP connection directory looks like this:
.P1
cpu% cd /net/tcp/2
cpu% ls -l
--rw-rw---- I 0 ehg bootes 0 Jul 13 21:14 ctl
--rw-rw---- I 0 ehg bootes 0 Jul 13 21:14 data
--rw-rw---- I 0 ehg bootes 0 Jul 13 21:14 listen
--r--r--r-- I 0 bootes bootes 0 Jul 13 21:14 local
--r--r--r-- I 0 bootes bootes 0 Jul 13 21:14 remote
--r--r--r-- I 0 bootes bootes 0 Jul 13 21:14 status
cpu% cat local remote status
135.104.9.31 5012
135.104.53.11 564
tcp/2 1 Established connect
cpu%
.P2
The files
.CW local ,
.CW remote ,
and
.CW status
supply information about the state of the connection.
The
.CW data
and
.CW ctl
files
provide access to the process end of the stream implementing the protocol.
The
.CW listen
file is used to accept incoming calls from the network.
.PP
The following steps establish a connection.
.IP 1)
The clone file of the
appropriate protocol directory is opened to reserve an unused connection.
.IP 2)
The file descriptor returned by the open points to the
.CW ctl
file of the new connection.
Reading that file descriptor returns an ASCII string containing
the connection number.
.IP 3)
A protocol/network-specific ASCII address string is written to the
.CW ctl
file.
.IP 4)
The path of the
.CW data
file is constructed using the connection number.
When the
.CW data
file is opened, the connection is established.
.LP
A process can read and write this file descriptor
to send and receive messages from the network.
If the process opens the
.CW listen
file, it blocks until an incoming call is received.
An address string written to the
.CW ctl
file before the listen selects the
ports or services the process is prepared to accept.
When an incoming call is received, the open completes
and returns a file descriptor
pointing to the
.CW ctl
file of the new connection.
Reading the
.CW ctl
file yields a connection number used to construct the path of the
.CW data
file.
A connection remains established while any of the files in the connection directory
are referenced or until a close is received from the network.
.NH 2
Streams
.PP
A
.I stream
[Rit84a][Presotto] is a bidirectional channel connecting a
physical or pseudo-device to user processes.
The user processes insert and remove data at one end of the stream.
Kernel processes acting on behalf of a device insert data at
the other end.
Asynchronous communications channels such as pipes,
TCP conversations, Datakit conversations, and RS232 lines are implemented using
streams.
.PP
A stream comprises a linear list of
.I "processing modules" .
Each module has both an upstream (toward the process) and
downstream (toward the device)
.I "put routine" .
Calling the put routine of the module on either end of the stream
inserts data into the stream.
Each module calls the succeeding one to send data up or down the stream.
.PP
An instance of a processing module is represented by a pair of
.I queues ,
one for each direction.
The queues point to the put procedures and can be used
to queue information traveling along the stream.
Some put routines queue data locally and send it along the stream at some
later time, either due to a subsequent call or an asynchronous
event such as a retransmission timer or a device interrupt.
Processing modules create helper kernel processes to
provide a context for handling asynchronous events.
For example, a helper kernel process awakens periodically
to perform any necessary TCP retransmissions.
The use of kernel processes instead of serialized run-to-completion service routines
differs from the implementation of Unix streams.
Unix service routines cannot
use any blocking kernel resource, and they lack a local long-lived state.
Helper kernel processes solve these problems and simplify the stream code.
.PP
There is no implicit synchronization in our streams.
Each processing module must ensure that concurrent processes using the stream
are synchronized.
This maximizes concurrency but introduces the
possibility of deadlock.
However, deadlocks are easily avoided by careful programming; to
date they have not caused us problems.
.PP
Information is represented by linked lists of kernel structures called
.I blocks .
Each block contains a type, some state flags, and pointers to
an optional buffer.
Block buffers can hold either data or control information, i.e., directives
to the processing modules.
Blocks and block buffers are dynamically allocated from kernel memory.
.NH 3
User Interface
.PP
A stream is represented at user level as two files,
.CW ctl
and
.CW data .
The actual names can be changed by the device driver using the stream,
as we saw earlier in the example of the UART driver.
The first process to open either file creates the stream automatically.
The last close destroys it.
Writing to the
.CW data
file copies the data into kernel blocks
and passes them to the downstream put routine of the first processing module.
A write of less than 32K is guaranteed to be contained by a single block.
Concurrent writes to the same stream are not synchronized, although the
32K block size ensures atomic writes for most protocols.
The last block written is flagged with a delimiter
to alert downstream modules that care about write boundaries.
In most cases the first put routine calls the second, the second
calls the third, and so on until the data is output.
As a consequence, most data is output without context switching.
.PP
Reading from the
.CW data
file returns data queued at the top of the stream.
The read terminates when the read count is reached
or when the end of a delimited block is encountered.
A per-stream read lock ensures only one process
can read from a stream at a time and guarantees
that the bytes read were contiguous bytes from the
stream.
.PP
Like UNIX streams [Rit84a],
Plan 9 streams can be dynamically configured.
The stream system intercepts and interprets
the following control blocks:
.IP "\f(CWpush\fP \fIname\fR" 15
adds an instance of the processing module
.I name
to the top of the stream.
.IP \f(CWpop\fP 15
removes the top module of the stream.
.IP \f(CWhangup\fP 15
sends a hangup message
up the stream from the device end.
.LP
Other control blocks are module-specific and are interpreted by each
processing module
as they pass.
.PP
The convoluted syntax and semantics of the UNIX
.CW ioctl
system call convinced us to leave it out of Plan 9.
Its role is taken over by the
.CW ctl
file.
Writing to the
.CW ctl
file
is identical to writing to a
.CW data
file except that the blocks are of type
.I control .
A processing module parses each control block it sees.
Commands in control blocks are ASCII strings, so
byte ordering is not an issue when one system
controls streams in a name space implemented on another processor.
The time to parse control blocks is not important, since control
operations are rare.
.NH 3
Device Interface
.PP
The module at the downstream end of the stream is part of a device interface.
The particulars of the interface vary with the device.
Most device interfaces consist of an interrupt routine, an output
put routine, and a kernel process.
The output put routine stages data for the
device and starts the device if it is stopped.
The interrupt routine wakes up the kernel process whenever
the device has input to be processed or needs more output staged.
The kernel process puts information up the stream or stages more data for output.
The division of labor among the different pieces varies depending on
how much must be done at interrupt level.
However, the interrupt routine may not allocate blocks or call
a put routine, since both actions require a process context.
.NH 3
Multiplexing
.PP
The conversations using a protocol device must be
multiplexed onto a single physical wire.
We push a multiplexer processing module
onto the physical device stream to group the conversations.
The device-end modules on the conversations add the necessary header
onto downstream messages and then put them to the module downstream
of the multiplexer.
The multiplexing module looks at each message moving up its stream and
puts it to the correct conversation stream after stripping
the header controlling the demultiplexing.
.PP
This is similar to the Unix implementation of multiplexer streams.
The major difference is that we have no general structure that
corresponds to a multiplexer.
Each attempt to produce a generalized multiplexer created a more complicated
structure and underlined the basic difficulty of generalizing this mechanism.
We now code each multiplexer from scratch and favor simplicity over
generality.
.NH 3
Reflections
.PP
Despite five years' experience and the efforts of many programmers,
we remain dissatisfied with the stream mechanism.
Performance is not an issue;
the time to process protocols and drive
device interfaces continues to dwarf the
time spent allocating, freeing, and moving blocks
of data.
However, the mechanism remains inordinately
complex.
Much of the complexity results from our efforts
to make streams dynamically configurable, to
reuse processing modules on different devices,
and to provide kernel synchronization
to ensure data structures
don't disappear underfoot.
This is particularly irritating since we seldom use these properties.
.PP
Streams remain in our kernel because we are unable to
devise a better alternative.
Larry Peterson's X-kernel [Pet89a]
is the closest contender but
doesn't offer enough advantage to switch.
If we were to rewrite the streams code, we would probably statically
allocate resources for a large fixed number of conversations and burn
memory in favor of less complexity.
.NH
The IL Protocol
.PP
None of the standard IP protocols is suitable for transmission of
9P messages over an Ethernet or the Internet.
TCP has a high overhead and does not preserve delimiters.
UDP, while cheap, does not provide reliable sequenced delivery.
Early versions of the system used a custom protocol that was
efficient but unsatisfactory for internetwork transmission.
When we implemented IP, TCP, and UDP, we looked around for a suitable
replacement with the following properties:
.IP \(bu
Reliable datagram service with sequenced delivery
.IP \(bu
Runs over IP
.IP \(bu
Low complexity, high performance
.IP \(bu
Adaptive timeouts
.LP
None met our needs, so a new protocol was designed.
IL is a lightweight protocol designed to be encapsulated by IP.
It is a connection-based protocol
providing reliable transmission of sequenced messages between machines.
No provision is made for flow control since the protocol is designed to transport RPC
messages between client and server.
A small outstanding message window prevents too
many incoming messages from being buffered;
messages outside the window are discarded
and must be retransmitted.
Connection setup uses a two-way handshake to generate
initial sequence numbers at each end of the connection;
subsequent data messages increment the
sequence numbers, allowing
the receiver to resequence out-of-order messages.
In contrast to other protocols, IL does not do blind retransmission.
If a message is lost and a timeout occurs, a query message is sent.
The query message is a small control message containing the current
sequence numbers as seen by the sender.
The receiver responds to a query by retransmitting missing messages.
This allows the protocol to behave well in congested networks,
where blind retransmission would cause further
congestion.
Like TCP, IL has adaptive timeouts.
A round-trip timer is used
to calculate acknowledgment and retransmission times in terms of the network speed.
This allows the protocol to perform well on both the Internet and local Ethernets.
643 |
.PP
|
|
|
644 |
In keeping with the minimalist design of the rest of the kernel, IL is small.
|
|
|
645 |
The entire protocol is 847 lines of code, compared to 2200 lines for TCP.
|
|
|
646 |
IL is our protocol of choice.
|
|
|
647 |
.NH
|
|
|
648 |
Network Addressing
|
|
|
649 |
.PP
|
|
|
650 |
A uniform interface to protocols and devices is not sufficient to
|
|
|
651 |
support the transparency we require.
|
|
|
652 |
Since each network uses a different
|
|
|
653 |
addressing scheme,
|
|
|
654 |
the ASCII strings written to a control file have no common format.
|
|
|
655 |
As a result, every tool must know the specifics of the networks it
|
|
|
656 |
is capable of addressing.
|
|
|
657 |
Moreover, since each machine supplies a subset
|
|
|
658 |
of the available networks, each user must be aware of the networks supported
|
|
|
659 |
by every terminal and server machine.
|
|
|
660 |
This is obviously unacceptable.
.PP
Several possible solutions were considered and rejected; one deserves
more discussion.
We could have used a user-level file server
to represent the network name space as a Plan 9 file tree.
This global naming scheme has been implemented in other distributed systems.
The file hierarchy provides paths to
directories representing network domains.
Each directory contains
files representing the names of the machines in that domain;
an example might be the path
.CW /net/name/usa/edu/mit/ai .
Each machine file contains information such as the IP address of the machine.
We rejected this representation for several reasons.
First, it is hard to devise a hierarchy encompassing all representations
of the various network addressing schemes in a uniform manner.
Datakit and Ethernet address strings have nothing in common.
Second, the address of a machine is
often only a small part of the information required to connect to a service on
the machine.
For example, the IP protocols require symbolic service names to be mapped into
numeric port numbers, some of which are privileged and hence special.
Information of this sort is hard to represent in terms of file operations.
Finally, the size and number of the networks being represented burden users with
an unacceptably large amount of information about the organization of the network
and its connectivity.
In this case the Plan 9 representation of a
resource as a file is not appropriate.
.PP
If tools are to be network-independent, a third-party server must resolve
network names.
A server on each machine, with local knowledge, can select the best network
for any particular destination machine or service.
Since the network devices present a common interface,
the only operation that differs between networks is name resolution.
A symbolic name must be translated to
the path of the clone file of a protocol
device and an ASCII address string to write to the
.CW ctl
file.
A connection server (CS) provides this service.
.NH 2
Network Database
.PP
On most systems, several
files such as
.CW /etc/hosts ,
.CW /etc/networks ,
.CW /etc/services ,
.CW /etc/hosts.equiv ,
.CW /etc/bootptab ,
and
.CW /etc/named.d
hold network information.
Much time and effort are spent
administering these files and keeping
them mutually consistent.
Tools attempt to
automatically derive one or more of the files from
information in other files, but maintenance continues to be
difficult and error-prone.
.PP
Since we were writing an entirely new system, we were free to
try a simpler approach.
One database on a shared server contains all the information
needed for network administration.
Two ASCII files comprise the main database:
.CW /lib/ndb/local
contains locally administered information and
.CW /lib/ndb/global
contains information imported from elsewhere.
The files contain sets of attribute/value pairs of the form
.I attr\f(CW=\fPvalue ,
where
.I attr
and
.I value
are alphanumeric strings.
Systems are described by multi-line entries;
each entry begins with a header line at the left margin, followed by zero or more
indented attribute/value pairs specifying
names, addresses, properties, etc.
For example, the entry for our CPU server
specifies a domain name, an IP address, an Ethernet address,
a Datakit address, a boot file, and supported protocols.
.P1
sys=helix
	dom=helix.research.bell-labs.com
	bootf=/mips/9power
	ip=135.104.9.31 ether=0800690222f0
	dk=nj/astro/helix
	proto=il flavor=9cpu
.P2
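.PP
Reading an entry back takes little code.
The sketch below is a minimal illustration in standard C, not the Plan 9 database library:
the hypothetical function \f(CWndb_parse\fP takes entries as an array of lines,
starts a new entry at each line that begins at the left margin,
and treats indented lines as continuations of the current entry.
It assumes that values contain no white space and ignores any quoting conventions.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { MAXSTR = 64 };

/* One attribute/value pair, tagged with the entry it belongs to. */
struct tuple {
	int entry;		/* 0 for the first entry, 1 for the next, ... */
	char attr[MAXSTR];
	char val[MAXSTR];
};

/* Hypothetical parser: splits each line into white-space separated
   attr=value words. A line beginning in column one starts a new entry;
   an indented line continues the current one. Returns the number of
   tuples stored in t (at most max). */
int
ndb_parse(const char *lines[], int nlines, struct tuple *t, int max)
{
	int i, n = 0, entry = -1;

	for(i = 0; i < nlines; i++){
		const char *p = lines[i];

		if(*p != 0 && !isspace((unsigned char)*p))
			entry++;	/* header line at the left margin */
		if(entry < 0)
			continue;	/* indented text before any header */
		while(*p != 0 && n < max){
			char word[2*MAXSTR];
			int len = 0;
			char *eq;

			while(isspace((unsigned char)*p))
				p++;
			while(*p != 0 && !isspace((unsigned char)*p) && len < (int)sizeof word - 1)
				word[len++] = *p++;
			word[len] = 0;
			if(len == 0)
				break;	/* only trailing white space was left */
			eq = strchr(word, '=');
			if(eq == NULL)
				continue;	/* not an attr=value word; skip it */
			*eq = 0;
			t[n].entry = entry;
			snprintf(t[n].attr, MAXSTR, "%s", word);
			snprintf(t[n].val, MAXSTR, "%s", eq + 1);
			n++;
		}
	}
	return n;
}
```
.LP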
If several systems share attributes such as
the network mask and gateway, we specify that information
with the network or subnetwork instead of the system.
The following entries define a Class B IP network and
a few subnets derived from it.
The entry for the network specifies the IP mask,
file system, and authentication server for all systems
on the network.
Each subnetwork specifies its default IP gateway.
.P1
ipnet=mh-astro-net ip=135.104.0.0 ipmask=255.255.255.0
	fs=bootes.research.bell-labs.com
	auth=1127auth
ipnet=unix-room ip=135.104.117.0
	ipgw=135.104.117.1
ipnet=third-floor ip=135.104.51.0
	ipgw=135.104.51.1
ipnet=fourth-floor ip=135.104.52.0
	ipgw=135.104.52.1
.P2
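.PP
To find which subnetwork, and hence which default gateway, applies to an address,
a program can mask the address with
\f(CWipmask\fP
and compare the result with each subnetwork's
\f(CWip\fP
attribute.
The following C sketch shows that comparison for IPv4 dotted quads;
\f(CWparse_ip\fP and \f(CWsame_net\fP are hypothetical names, not part of any Plan 9 interface.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: converts a dotted quad such as "135.104.117.5"
   to a 32-bit value. Returns 0 on success, -1 on a malformed string. */
int
parse_ip(const char *s, uint32_t *out)
{
	unsigned a, b, c, d;
	char extra;

	/* exactly four numbers and nothing after them */
	if(sscanf(s, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
		return -1;
	if(a > 255 || b > 255 || c > 255 || d > 255)
		return -1;
	*out = (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
	return 0;
}

/* Returns 1 if addr lies on the network whose base address is net,
   using mask (for example the ipmask attribute) to select the network
   bits; returns 0 otherwise or if any string is malformed. */
int
same_net(const char *addr, const char *net, const char *mask)
{
	uint32_t a, n, m;

	if(parse_ip(addr, &a) < 0 || parse_ip(net, &n) < 0 || parse_ip(mask, &m) < 0)
		return 0;
	return (a & m) == (n & m);
}
```

With the mask 255.255.255.0 from the mh-astro-net entry,
135.104.117.5 falls on unix-room (135.104.117.0) but not on third-floor (135.104.51.0).
.LP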
Database entries also define the mapping of service names
to port numbers for TCP, UDP, and IL.
.P1
tcp=echo port=7
tcp=discard port=9
tcp=systat port=11
tcp=daytime port=13
.P2
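.PP
Mapping a service name to a port is then a search for the matching entry.
This hypothetical C sketch scans single-line entries laid out exactly as above,
\f(CWproto=name port=N\fP,
and returns the port;
the fixed attribute order is a simplification that a real database search would not rely on.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup over single-line entries of the form
   "proto=name port=N". Returns the port, or -1 if no entry matches. */
int
service_port(const char *lines[], int nlines, const char *proto, const char *name)
{
	int i;

	for(i = 0; i < nlines; i++){
		char p[32], s[64];
		int port;

		/* %31[^=] reads the protocol up to '=', %63s the service name */
		if(sscanf(lines[i], "%31[^=]=%63s port=%d", p, s, &port) != 3)
			continue;
		if(strcmp(p, proto) == 0 && strcmp(s, name) == 0)
			return port;
	}
	return -1;
}
```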
.PP
All programs read the database directly, so
consistency problems are rare.
However, the database files can become large.
Our global file, containing all information about
both Datakit and Internet systems in AT&T, has 43,000
lines.
To speed searches, we build hash table files for each
attribute we expect to search often.
The hash file entries point to entries
in the master files.
Every hash file contains the modification time of its master
file so we can avoid using an out-of-date hash table.
Searches for attributes that aren't hashed or whose hash table
is out-of-date still work; they just take longer.
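.PP
The staleness check can be sketched as follows.
This is a hypothetical in-memory model in standard C, not the on-disk hash file format:
each slot maps an attribute value to the byte offset of its entry in the master file,
collisions are resolved by linear probing,
and a lookup refuses to answer when the recorded modification time
differs from the master file's current one,
telling the caller to fall back to a linear scan.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

enum { NSLOT = 64 };

/* Hypothetical in-memory picture of one hash file: each slot maps an
   attribute value to the byte offset of its entry in the master file. */
struct hashfile {
	time_t mtime;		/* master file's mtime when the table was built */
	const char *val[NSLOT];	/* value hashed into this slot, or NULL if empty */
	long off[NSLOT];	/* byte offset of the entry in the master file */
};

static unsigned
hashstr(const char *s)
{
	unsigned h = 0;

	while(*s)
		h = h*31 + (unsigned char)*s++;
	return h % NSLOT;
}

/* Records val -> off using linear probing; returns -1 if the table is full. */
int
hashadd(struct hashfile *h, const char *val, long off)
{
	unsigned k = hashstr(val);
	int i;

	for(i = 0; i < NSLOT; i++, k = (k+1) % NSLOT)
		if(h->val[k] == NULL){
			h->val[k] = val;
			h->off[k] = off;
			return 0;
		}
	return -1;
}

/* Returns the offset of val's entry, -1 if val is not in the table, or -2
   if the recorded mtime differs from master_mtime, meaning the table is
   out of date and the caller must scan the master file instead. */
long
hashlookup(const struct hashfile *h, time_t master_mtime, const char *val)
{
	unsigned k = hashstr(val);
	int i;

	if(h->mtime != master_mtime)
		return -2;
	for(i = 0; i < NSLOT && h->val[k] != NULL; i++, k = (k+1) % NSLOT)
		if(strcmp(h->val[k], val) == 0)
			return h->off[k];
	return -1;
}
```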
.NH 2
Connection Server
.PP
On each system a user-level connection server process, CS, translates
symbolic names to addresses.
CS uses information about available networks, the network database, and
other servers (such as DNS) to translate names.
CS is a file server serving a single file,
.CW /net/cs .
A client writes a symbolic name to
.CW /net/cs
and then reads one line for each matching destination reachable
from this system.
The lines are of the form
.I "filename message" ,
where
.I filename
is the path of the clone file to open for a new connection and
.I message
is the string to write to it to make the connection.
The following example illustrates this.
.CW Ndb/csquery
is a program that prompts for strings to write to
.CW /net/cs
and prints the replies.
.P1
% ndb/csquery
> net!helix!9fs
/net/il/clone 135.104.9.31!17008
/net/dk/clone nj/astro/helix!9fs
.P2
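.PP
A client of CS splits each reply line at its first blank into the clone file and the message.
The following is a minimal standard C sketch;
\f(CWcs_split\fP is a hypothetical helper, not a Plan 9 library routine.

```c
#include <string.h>

/* Splits a CS reply such as "/net/il/clone 135.104.9.31!17008" at the
   first blank. Copies the clone file path into file and the rest of the
   line into msg. Returns 0 on success, -1 if there is no blank or a
   buffer is too small. */
int
cs_split(const char *line, char *file, size_t nfile, char *msg, size_t nmsg)
{
	const char *sp = strchr(line, ' ');
	size_t flen;

	if(sp == NULL)
		return -1;
	flen = (size_t)(sp - line);
	if(flen + 1 > nfile || strlen(sp + 1) + 1 > nmsg)
		return -1;
	memcpy(file, line, flen);
	file[flen] = 0;
	strcpy(msg, sp + 1);
	return 0;
}
```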
.PP
CS provides meta-name translation to perform complicated
searches.
The special network name
.CW net
selects any network in common between source and
destination supporting the specified service.
A host name of the form \f(CW$\fIattr\f1
is the name of an attribute in the network database.
The database search returns the value
of the matching attribute/value pair
most closely associated with the source host.
The phrase ``most closely associated'' is defined on a per-network basis.
For example, the symbolic name
.CW tcp!$auth!rexauth
causes CS to search for the
.CW auth
attribute in the database entry for the source system, then its
subnetwork (if there is one), and then its network.
.P1
% ndb/csquery
> net!$auth!rexauth
/net/il/clone 135.104.9.34!17021
/net/dk/clone nj/astro/p9auth!rexauth
/net/il/clone 135.104.9.6!17021
/net/dk/clone nj/astro/musca!rexauth
.P2
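.PP
The search order for \f(CW$\fIattr\f1 amounts to returning the value from the first level that defines the attribute.
The C sketch below illustrates only that order:
\f(CWstruct level\fP and \f(CWmeta_lookup\fP are hypothetical,
a level with no pairs stands for a missing subnetwork,
and locating the subnetwork and network entries for the source system is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one database entry: a list of attr/value pairs. */
struct pair { const char *attr, *val; };
struct level { const struct pair *p; int n; };

/* Searches levels[0] (the system), then levels[1] (its subnetwork, if
   any), then levels[2] (its network), and returns the value of the first
   pair whose attribute is attr, or NULL if no level defines it. */
const char *
meta_lookup(const struct level *levels, int nlevels, const char *attr)
{
	int i, j;

	for(i = 0; i < nlevels; i++)
		for(j = 0; j < levels[i].n; j++)
			if(strcmp(levels[i].p[j].attr, attr) == 0)
				return levels[i].p[j].val;
	return NULL;
}
```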
.PP
Normally CS derives naming information from its database files.
For domain names, however, CS first consults another user-level
process, the domain name server (DNS).
If no DNS is reachable, CS relies on its own tables.
.PP
Like CS, the domain name server is a user-level process providing
one file,
.CW /net/dns .
A client writes a request of the form
.I "domain-name type" ,
where
.I type
is a domain name service resource record type.
DNS performs a recursive query through the
Internet domain name system, producing one line
per resource record found. The client reads
.CW /net/dns
to retrieve the records.
Like other domain name servers, DNS caches information
learned from the network.
DNS is implemented as a multi-process shared-memory application
with separate processes listening for network and local requests.
.NH
Library Routines
.PP
The section on protocol devices described the details
of making and receiving connections across a network.
The dance is straightforward but tedious.
Library routines are provided to relieve
the programmer of the details.
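.PP
Much of that tedium is bookkeeping with names.
Assuming the sequence is to open a protocol's clone file,
read back the number of the newly allocated connection directory,
write a request of the form
\f(CWconnect\fP \fIaddress\fP
to the control file, and then open that directory's
\f(CWdata\fP
file, the strings involved can be built as in this standard C sketch.
The functions \f(CWdata_path\fP and \f(CWconnect_msg\fP are hypothetical;
the connection number passed in should already have any trailing white space removed.

```c
#include <stdio.h>
#include <string.h>

/* Given a clone file path such as "/net/il/clone" and the connection
   number read back from it (e.g. "4"), builds the path of the new
   connection's data file, "/net/il/4/data". Returns 0 on success,
   -1 if clone does not end in "/clone" or data is too small. */
int
data_path(const char *clone, const char *conn, char *data, size_t ndata)
{
	size_t len = strlen(clone), dirlen;
	int n;

	if(len < 6 || strcmp(clone + len - 6, "/clone") != 0)
		return -1;
	dirlen = len - 6;	/* length of the protocol directory, "/net/il" */
	n = snprintf(data, ndata, "%.*s/%s/data", (int)dirlen, clone, conn);
	return (n < 0 || (size_t)n >= ndata) ? -1 : 0;
}

/* Builds the control message asking the connection to call addr,
   assuming the "connect address" convention for ctl files. */
int
connect_msg(const char *addr, char *msg, size_t nmsg)
{
	int n = snprintf(msg, nmsg, "connect %s", addr);

	return (n < 0 || (size_t)n >= nmsg) ? -1 : 0;
}
```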
.NH 2
Connecting
.PP
The
.CW dial
library call establishes a connection to a remote destination.
It
returns an open file descriptor for the
.CW data
file in the connection directory.
.P1
int dial(char *dest, char *local, char *dir, int *cfdp)
.P2
.IP \f(CWdest\fP 10
is the symbolic name/address of the destination.
.IP \f(CWlocal\fP 10
is the local address.
Since most networks do not support this, it is
usually zero.
.IP \f(CWdir\fP 10
is a pointer to a buffer to hold the path name of the protocol directory
representing this connection.
.CW Dial
fills this buffer if the pointer is non-zero.
.IP \f(CWcfdp\fP 10
is a pointer to a file descriptor for the
.CW ctl
file of the connection.
If the pointer is non-zero,
.CW dial
opens the control file and stores its file descriptor there.
.LP
Most programs call
.CW dial
with a destination name and all other arguments zero.
.CW Dial
uses CS to
translate the symbolic name to all possible destination addresses
and attempts to connect to each in turn until one works.
Specifying the special name
.CW net
in the network portion of the destination
allows CS to pick a network/protocol in common
with the destination for which the requested service is valid.
For example, assume the system
.CW research.bell-labs.com
has the Datakit address
.CW nj/astro/research
and IP addresses
.CW 135.104.117.5
and
.CW 129.11.4.1 .
The call
.P1
fd = dial("net!research.bell-labs.com!login", 0, 0, 0);
.P2
tries in succession to connect to
.CW nj/astro/research!login
on the Datakit and both
.CW 135.104.117.5!513
and
.CW 129.11.4.1!513
across the Internet.
.PP
.CW Dial
also accepts numeric addresses in place of symbolic names.
For example, the destinations
.CW tcp!135.104.117.5!513
and
.CW tcp!research.bell-labs.com!login
are equivalent
references to the same service on the same machine.
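.PP
Both forms share the three-part layout
\fInetwork\fP\f(CW!\fP\fIhost\fP\f(CW!\fP\fIservice\fP.
A hypothetical standard C sketch of splitting such a string at its first two
\f(CW!\fP
characters follows; it does not attempt the name translation that CS performs,
and anything after the second
\f(CW!\fP
is kept as the service.

```c
#include <string.h>

/* Copies n bytes of src into dst and terminates it; -1 if it won't fit. */
static int
copyn(char *dst, size_t size, const char *src, size_t n)
{
	if(n >= size)
		return -1;
	memcpy(dst, src, n);
	dst[n] = 0;
	return 0;
}

/* Hypothetical split of "net!host!service" into its parts. A missing
   service leaves svc empty. Each output buffer holds size bytes.
   Returns 0 on success, -1 on a malformed or overlong string. */
int
split_dial(const char *dest, char *net, char *host, char *svc, size_t size)
{
	const char *b1, *b2;

	b1 = strchr(dest, '!');
	if(b1 == NULL)
		return -1;	/* no network part */
	b2 = strchr(b1 + 1, '!');
	if(b2 == NULL)
		b2 = b1 + 1 + strlen(b1 + 1);	/* no service: host runs to the end */
	if(copyn(net, size, dest, (size_t)(b1 - dest)) < 0
	|| copyn(host, size, b1 + 1, (size_t)(b2 - b1 - 1)) < 0
	|| copyn(svc, size, *b2 ? b2 + 1 : b2, *b2 ? strlen(b2 + 1) : 0) < 0)
		return -1;
	return 0;
}
```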
.NH 2
Listening
.PP
A program uses
four routines to listen for incoming connections.
It first
.CW announce() s
its intention to receive connections,
then
.CW listen() s
for calls, and finally
.CW accept() s
or
.CW reject() s
them.
.CW Announce
returns an open file descriptor for the
.CW ctl
file of a connection and fills
.CW dir
with the
path of the protocol directory
for the announcement.
.P1
int announce(char *addr, char *dir)
.P2
.CW Addr
is the symbolic name/address announced;
if it does not contain a service, the announcement is for
all services not explicitly announced.
Thus, one can easily write the equivalent of the
.CW inetd
program without
having to announce each separate service.
An announcement remains in force until the control file is
closed.
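.PP
The rule for announcements without a service can be modeled as a small dispatch function.
In this hypothetical C sketch each announcement is represented only by its service part,
with an empty string standing for an announcement such as
\f(CWtcp!*\fP;
an explicit match wins, and otherwise the first service-less announcement catches the call.

```c
#include <string.h>

/* Hypothetical choice of announcement for an incoming call to service
   svc. services[i] is the service part of announcement i, or "" if the
   announcement named no service. Returns the chosen index, or -1. */
int
pick_announcement(const char *services[], int n, const char *svc)
{
	int i, wildcard = -1;

	for(i = 0; i < n; i++){
		if(strcmp(services[i], svc) == 0)
			return i;	/* explicit announcement */
		if(services[i][0] == 0 && wildcard < 0)
			wildcard = i;	/* announced without a service */
	}
	return wildcard;
}
```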
.LP
.CW Listen
returns an open file descriptor for the
.CW ctl
file and fills
.CW ldir
with the path
of the protocol directory
for the received connection.
It is passed
.CW dir
from the announcement.
.P1
int listen(char *dir, char *ldir)
.P2
.LP
.CW Accept
and
.CW reject
are called with the control file descriptor and
.CW ldir
returned by
.CW listen .
Some networks, such as Datakit, accept a reason for a rejection;
networks such as IP ignore the third argument.
.P1
int accept(int ctl, char *ldir)
int reject(int ctl, char *ldir, char *reason)
.P2
.PP
The following code implements a typical TCP listener.
It announces itself, listens for connections, and forks a new
process for each.
The new process echoes data on the connection until the
remote end closes it.
The "*" in the symbolic name means the announcement is valid for
any addresses bound to the machine the program is run on.
.P1
.ta 8n 16n 24n 32n 40n 48n 56n 64n
#include <u.h>
#include <libc.h>

int
echo_server(void)
{
	int afd, dfd, lcfd;
	char adir[40], ldir[40];
	int n;
	char buf[256];

	/* announce the echo service on all local addresses */
	afd = announce("tcp!*!echo", adir);
	if(afd < 0)
		return -1;

	for(;;){
		/* listen for a call */
		lcfd = listen(adir, ldir);
		if(lcfd < 0)
			return -1;

		/* fork a process to echo */
		switch(fork()){
		case 0:
			/* accept the call and open the data file */
			dfd = accept(lcfd, ldir);
			if(dfd < 0)
				exits("accept");

			/* echo until EOF */
			while((n = read(dfd, buf, sizeof(buf))) > 0)
				write(dfd, buf, n);
			exits(0);
		case -1:
			perror("forking");
			/* fall through: the parent drops its copy of the call */
		default:
			close(lcfd);
			break;
		}

	}
}
.P2
.NH
User Level
.PP
Communication between Plan 9 machines is done almost exclusively in
terms of 9P messages. Only two services,
.CW cpu
and
.CW exportfs ,
are used.
The
.CW cpu
service is analogous to
.CW rlogin .
However, rather than emulating a terminal session
across the network,
.CW cpu
creates a process on the remote machine whose name space is an analogue of the window
in which it was invoked.
.CW Exportfs
is a user-level file server that allows a piece of name space to be
exported from machine to machine across a network. It is used by the
.CW cpu
command to serve the files in the terminal's name space when they are
accessed from the
CPU server.
.PP
By convention, the protocol and device driver file systems are mounted in a
directory called
.CW /net .
Although the per-process name space allows users to configure an
arbitrary view of the system, in practice their profiles build
a conventional name space.
.NH 2
Exportfs
.PP
.CW Exportfs
is invoked by an incoming network call.
The
.I listener
(the Plan 9 equivalent of
.CW inetd )
runs the profile of the user
requesting the service to construct a name space before starting
.CW exportfs .
After an initial protocol
establishes the root of the file tree being
exported,
the remote process mounts the connection,
allowing
.CW exportfs
to act as a relay file server. Operations in the imported file tree
are executed on the remote server and the results returned.
As a result,
the name space of the remote machine appears to be exported into a
local file tree.
.PP
The
.CW import
command calls
.CW exportfs
on a remote machine, mounts the result in the local name space,
and
exits.
No local process is required to serve mounts;
9P messages are generated by the kernel's mount driver and sent
directly over the network.
.PP
.CW Exportfs
must be multithreaded, since the system calls
.CW open ,
.CW read ,
and
.CW write
may block and one blocked request must not delay the others.
Plan 9 does not implement the
.CW select
system call but does allow processes to share file descriptors,
memory, and other resources.
.CW Exportfs
and the configurable name space
provide a means of sharing resources between machines.
Together they form a building block for constructing complex name spaces
served from many machines.
.PP
The simplicity of the interfaces encourages naive users to exploit the potential
of a richly connected environment.
Using these tools, it is easy to gateway between networks.
For example, a terminal with only a Datakit connection can import from the server
.CW helix :
.P1
import -a helix /net
telnet ai.mit.edu
.P2
The
.CW import
command makes a Datakit connection to the machine
.CW helix ,
where
it starts an instance of
.CW exportfs
to serve
.CW /net .
The
.CW import
command mounts the remote
.CW /net
directory after the existing contents
of the local
.CW /net
directory
(the
.CW -a
option to
.CW import ).
The directory contains the union of the local and remote contents of
.CW /net .
Local entries supersede remote ones of the same name, so
networks on the local machine are chosen in preference
to those supplied remotely.
However, unique entries in the remote directory are now visible in the local
.CW /net
directory.
All the networks connected to
.CW helix ,
not just Datakit,
are now available in the terminal. The effect on the name space is shown by the following
example:
.P1
philw-gnot% ls /net
/net/cs
/net/dk
philw-gnot% import -a musca /net
philw-gnot% ls /net
/net/cs
/net/cs
/net/dk
/net/dk
/net/dns
/net/ether
/net/il
/net/tcp
/net/udp
.P2
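.PP
The listing shows every entry of both directories, which is why
\f(CWcs\fP
and
\f(CWdk\fP
appear twice, while a lookup by name stops at the first match.
The hypothetical C model below searches the local directory before the remote one;
it illustrates the precedence rule only and is not how the kernel implements union directories.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one member of a union directory built by
   "import -a": the local directory comes first, the remote one after. */
struct udir { const char **names; int n; const char *where; };

/* Returns the "where" tag of the first member that contains name, or
   NULL if none does. Because members are searched in order, a local
   entry hides a remote entry with the same name. */
const char *
union_lookup(const struct udir *u, int ndir, const char *name)
{
	int i, j;

	for(i = 0; i < ndir; i++)
		for(j = 0; j < u[i].n; j++)
			if(strcmp(u[i].names[j], name) == 0)
				return u[i].where;
	return NULL;
}
```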
.NH 2
Ftpfs
.PP
We decided to make our interface to FTP
a file system rather than the traditional command.
Our command,
.I ftpfs ,
dials the FTP port of a remote system, prompts for login and password, sets image mode,
and mounts the remote file system onto
.CW /n/ftp .
Files and directories are cached to reduce traffic.
The cache is updated whenever a file is created.
.I Ftpfs
works with TOPS-20, VMS, and various Unix flavors
as the remote system.
.NH
Cyclone Fiber Links
.PP
The file servers and CPU servers are connected by
high-bandwidth
point-to-point links.
A link consists of two VME cards connected by a pair of optical
fibers.
The VME cards use 33 MHz Intel 960 processors and AMD's TAXI
fiber transmitter/receivers to drive the lines at 125 Mbit/sec.
Software in the VME card reduces latency by copying messages from system memory
to fiber without intermediate buffering.
.NH
Performance
.PP
We measured both latency and throughput
of reading and writing bytes between two processes
for a number of different paths.
Measurements were made on two- and four-CPU SGI Power Series machines.
The CPUs are 25 MHz MIPS 3000s.
The latency is measured as the round-trip time
for a byte sent from one process to another and
back again.
Throughput is measured using 16-Kbyte writes from
one process to another.
.DS C
.TS
box, tab(:);
c s s
c | c | c
l | n | n.
Table 1 - Performance
_
test:throughput:latency
:MBytes/sec:millisec
_
pipes:8.15:0.255
_
IL/ether:1.02:1.42
_
URP/Datakit:0.22:1.75
_
Cyclone:3.2:0.375
.TE
.DE
.NH
Conclusion
.PP
The representation of all resources as file systems,
coupled with an ASCII interface, has proved more powerful
than we had originally imagined.
Resources can be used by any computer in our networks
independent of byte ordering or CPU type.
The connection server provides an elegant means
of decoupling tools from the networks they use.
Users successfully use Plan 9 without knowing the
topology of the system or the networks they use.
More information about 9P can be found in Section 5 of the Plan 9 Programmer's
Manual, Volume I.
.NH
References
.LP
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey,
``Plan 9 from Bell Labs'',
.I
UKUUG Proc. of the Summer 1990 Conf.,
.R
London, England,
1990.
.LP
[Needham] R. Needham, ``Names'', in
.I
Distributed Systems,
.R
S. Mullender, ed.,
Addison-Wesley, 1989.
.LP
[Presotto] D. Presotto, ``Multiprocessor Streams for Plan 9'',
.I
UKUUG Proc. of the Summer 1990 Conf.,
.R
London, England, 1990.
.LP
[Met80] R. Metcalfe, D. Boggs, C. Crane, E. Taft and J. Hupp, ``The
Ethernet Local Network: Three reports'',
.I
CSL-80-2,
.R
XEROX Palo Alto Research Center, February 1980.
.LP
[Fra80] A. G. Fraser, ``Datakit - A Modular Network for Synchronous
and Asynchronous Traffic'',
.I
Proc. Int'l Conf. on Communication,
.R
Boston, June 1980.
.LP
[Pet89a] L. Peterson, ``RPC in the X-Kernel: Evaluating New Design Techniques'',
.I
Proc. Twelfth Symp. on Op. Sys. Princ.,
.R
Litchfield Park, AZ, December 1989.
.LP
[Rit84a] D. M. Ritchie, ``A Stream Input-Output System'',
.I
AT&T Bell Laboratories Technical Journal, 63(8),
.R
October 1984.