.SH
Block Devices
.PP
The block device I/O system is like a
protocol stack of filters.
There is a set of pseudo-devices that call
recursively to other pseudo-devices and real devices.
The protocol stack is compiled from a configuration
string that specifies the order of pseudo-devices and devices.
Each pseudo-device and device has a set of entry points
that correspond to the operations that the file system
requires of a device.
The most notable operations are
.CW read ,
.CW write ,
and
.CW size .
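.PP
The entry points can be pictured as a table of function pointers that every
device and pseudo-device fills in, so that a filter only has to call the entries
of the devices beneath it.
The C sketch below shows such a table with a memory-backed device at the bottom of a stack.
It is an illustration under stated assumptions, not the file server's code:
the names \f(CWDev\fP, \f(CWRam\fP, \f(CWramread\fP, \f(CWramwrite\fP and \f(CWramsize\fP,
the 16-byte block, and the convention that 0 means success are all hypothetical.
.Ex
```c
#include <assert.h>
#include <string.h>

enum { BLKSIZE = 16 };   /* hypothetical tiny block size */

/* Hypothetical entry-point table shared by devices and pseudo-devices. */
typedef struct Dev Dev;
struct Dev {
    int  (*read)(Dev *d, long addr, char *buf);         /* 0 on success */
    int  (*write)(Dev *d, long addr, const char *buf);  /* 0 on success */
    long (*size)(Dev *d);                                /* in blocks */
    void *priv;                                          /* per-device state */
};

/* A memory-backed "real" device: the bottom of a stack. */
typedef struct {
    char (*blocks)[BLKSIZE];
    long nblocks;
} Ram;

int ramread(Dev *d, long addr, char *buf)
{
    Ram *r = d->priv;
    if (addr < 0 || addr >= r->nblocks)
        return -1;
    memcpy(buf, r->blocks[addr], BLKSIZE);
    return 0;
}

int ramwrite(Dev *d, long addr, const char *buf)
{
    Ram *r = d->priv;
    if (addr < 0 || addr >= r->nblocks)
        return -1;
    memcpy(r->blocks[addr], buf, BLKSIZE);
    return 0;
}

long ramsize(Dev *d)
{
    Ram *r = d->priv;
    assert(r != 0);
    return r->nblocks;
}
```
.Ee
.PP
In this model a pseudo-device would keep pointers to its sub-devices in \f(CWpriv\fP
and implement its own entries by calling theirs.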
.PP
The device stack is best explained through
the syntax of the configuration string
that specifies it.
Configuration strings are used
during the setup of the file system;
for a description see
.I fsconfig (8).
In the following recursive definition,
.I D
represents a
string that specifies a block device.
.IP "\fID\fP = (\fIDD\fP...)"
.br
This is a set of devices that
are concatenated to form a single device.
The size of the concatenated device is the
sum of the sizes of the sub-devices.
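.IP
A minimal C sketch of the concatenation arithmetic follows, assuming sizes are counted in blocks
and that a block address is resolved by walking the sub-devices from left to right.
\f(CWcatsize\fP and \f(CWcatmap\fP are hypothetical names, not the file server's.
.Ex
```c
#include <assert.h>

/* Hypothetical sketch of the block arithmetic behind (DD...);
   sizes[i] is the size of sub-device i, in blocks. */

/* The concatenated size is the sum of the sub-device sizes. */
long catsize(const long *sizes, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += sizes[i];
    return total;
}

/* Map a block address of the concatenation to a sub-device and a
   block address on it.  Returns the sub-device index, or -1 if addr
   is beyond the end of the concatenated device. */
int catmap(const long *sizes, int n, long addr, long *subaddr)
{
    assert(subaddr != 0);
    if (addr < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (addr < sizes[i]) {
            *subaddr = addr;
            return i;
        }
        addr -= sizes[i];   /* skip past sub-device i */
    }
    return -1;
}
```
.Ee
.IP
With sub-devices of 10, 5 and 20 blocks, the concatenation has 35 blocks,
and its block 12 is block 2 of the second sub-device.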
.IP "\fID\fP = [\fIDD\fP...]"
.br
This is the interleaving of the
individual devices.
If there are N devices in the list,
then the pseudo-device is the N-way block
interleaving of the sub-devices.
The size of the interleaved device is
N times the size of the smallest sub-device.
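.IP
A sketch of the address mapping, assuming round-robin interleaving with a stripe of one block,
so that block b of the pseudo-device is block b/N of sub-device b mod N.
The stripe size and the names \f(CWilvsize\fP and \f(CWilvmap\fP are assumptions of this sketch.
.Ex
```c
#include <assert.h>

/* Hypothetical sketch of [DD...], assuming round-robin interleaving
   with a one-block stripe: block b of the pseudo-device is block
   b / n of sub-device b % n. */

/* n times the smallest sub-device, so every stripe is complete. */
long ilvsize(const long *sizes, int n)
{
    assert(n > 0);
    long min = sizes[0];
    for (int i = 1; i < n; i++)
        if (sizes[i] < min)
            min = sizes[i];
    return (long)n * min;
}

/* Returns the sub-device index and stores the block address on it. */
int ilvmap(int n, long addr, long *subaddr)
{
    assert(n > 0 && addr >= 0);
    *subaddr = addr / n;
    return (int)(addr % n);
}
```
.Ee
.IP
With three sub-devices of 100, 80 and 90 blocks, the interleaved device has 240 blocks,
and its block 7 is block 2 of sub-device 1 (counting from 0).
Under this mapping, limiting every sub-device to the smallest size keeps each stripe complete,
which matches the size rule above.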
.IP "\fID\fP = {\fIDD\fP...}"
.br
This is a set of devices that
constitute a `mirror' of the first sub-device and form a single device.
A write to the device is performed
at the same block address
on each sub-device, in right-to-left order.
A read from the device is performed on each sub-device
in left-to-right order, until a read succeeds without error
or the set is exhausted.
One can think of this as a poor man's RAID 1.
The size of the device is the size of the smallest sub-device.
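.IP
A sketch of the two orderings, using an in-memory stand-in for each sub-device.
\f(CWSub\fP, \f(CWmirwrite\fP and \f(CWmirread\fP are hypothetical names.
The text does not say what happens when one of the writes fails;
this sketch simply attempts every sub-device and reports the failure afterwards.
.Ex
```c
#include <assert.h>
#include <string.h>

enum { NBLK = 8, BSIZE = 4 };   /* tiny sizes, for illustration only */

/* Hypothetical in-memory stand-in for a sub-device; setting broken
   makes every access fail, to simulate a dead disk. */
typedef struct {
    char blk[NBLK][BSIZE];
    int broken;
} Sub;

int subwrite(Sub *s, long a, const char *buf)
{
    if (s->broken || a < 0 || a >= NBLK)
        return -1;
    memcpy(s->blk[a], buf, BSIZE);
    return 0;
}

int subread(Sub *s, long a, char *buf)
{
    if (s->broken || a < 0 || a >= NBLK)
        return -1;
    memcpy(buf, s->blk[a], BSIZE);
    return 0;
}

/* Write the same block address on every sub-device, right to left.
   Every write is attempted; -1 is returned if any of them failed. */
int mirwrite(Sub *subs, int n, long a, const char *buf)
{
    int err = 0;
    assert(n > 0);
    for (int i = n - 1; i >= 0; i--)
        if (subwrite(&subs[i], a, buf) != 0)
            err = -1;
    return err;
}

/* Read left to right until a read succeeds; returns the index of the
   sub-device that supplied the data, or -1 once the set is exhausted. */
int mirread(Sub *subs, int n, long a, char *buf)
{
    for (int i = 0; i < n; i++)
        if (subread(&subs[i], a, buf) == 0)
            return i;
    return -1;
}
```
.Ee
.IP
If the leftmost sub-device is broken, \f(CWmirread\fP falls through to the next one,
which is how the mirror masks a failed disk.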
.IP "\fID\fP = \f(CWp\fP\fIDN1.N2\fP"
.br
This is a partition of a sub-device.
The sub-device is partitioned into 100 equal pieces.
If the size of the sub-device is not divisible by 100,
then there will be some slop thrown away at the top.
The pseudo-device starts at piece N1 (pieces are numbered from 0) and
continues for N2 pieces.
Thus
.CW p\fID\fP67.33
will be the
last third of the device
.I D .
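.IP
The partition arithmetic can be sketched as follows, assuming sizes in blocks and pieces numbered from 0.
\f(CWPart\fP, \f(CWpartition\fP and \f(CWpartmap\fP are hypothetical names,
and the check that N1 + N2 does not exceed 100 is an assumption of this sketch.
.Ex
```c
#include <assert.h>

/* Hypothetical sketch of pDN1.N2: the sub-device is cut into 100
   equal pieces numbered from 0, any remainder (the slop) at the top
   is unused, and the partition covers pieces n1 .. n1+n2-1. */
typedef struct {
    long start;   /* first block of the partition on the sub-device */
    long size;    /* size of the partition, in blocks */
} Part;

Part partition(long subsize, int n1, int n2)
{
    /* Assumption of this sketch: the partition stays within 100 pieces. */
    assert(n1 >= 0 && n2 >= 0 && n1 + n2 <= 100);
    long piece = subsize / 100;   /* integer division drops the slop */
    Part p = { n1 * piece, n2 * piece };
    return p;
}

/* Translate a partition-relative block address to a sub-device
   address; returns -1 if addr lies outside the partition. */
long partmap(Part p, long addr)
{
    if (addr < 0 || addr >= p.size)
        return -1;
    return p.start + addr;
}
```
.Ee
.IP
For a 1050-block sub-device each piece is 10 blocks, so
\f(CWp\fP\fID\fP67.33 covers blocks 670 through 999,
and blocks 1000 through 1049 are the slop.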
.IP "\fID\fP = \f(CWf\fP\fID\fP"
.br
This is a fake write-once-read-many device simulated by a
second read-write device.
This second device is partitioned
into a set of block flags and a set of blocks.
The flags are used to generate errors if a
block is ever written twice or read without being written first.
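.IP
A sketch of the write-once rules, modelling the second device in memory
as one flag per block alongside the blocks themselves.
The text does not give the real layout of the flags;
one byte per flag, the tiny sizes, and the names \f(CWFworm\fP, \f(CWfwormwrite\fP
and \f(CWfwormread\fP are assumptions.
.Ex
```c
#include <assert.h>
#include <string.h>

enum { WBLK = 8, WSIZE = 4 };   /* tiny sizes, for illustration only */

/* Hypothetical model of the backing read-write device: one flag per
   block (a whole byte each here) followed by the blocks themselves. */
typedef struct {
    unsigned char written[WBLK];
    char blk[WBLK][WSIZE];
} Fworm;

/* Write-once rule: writing a block a second time is an error. */
int fwormwrite(Fworm *f, long a, const char *buf)
{
    assert(f != 0);
    if (a < 0 || a >= WBLK || f->written[a])
        return -1;
    memcpy(f->blk[a], buf, WSIZE);
    f->written[a] = 1;   /* remember that this block is now used */
    return 0;
}

/* Reading a block that was never written is an error. */
int fwormread(Fworm *f, long a, char *buf)
{
    assert(f != 0);
    if (a < 0 || a >= WBLK || !f->written[a])
        return -1;
    memcpy(buf, f->blk[a], WSIZE);
    return 0;
}
```
.Ee
.IP
Reading a block before it is written, or writing it a second time, returns -1,
which is the kind of error the fake device exists to produce.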
.IP "\fID\fP = \f(CWx\fP\fID\fP"
.br
This is a byte-swapped version of the file system on
.I D .
Since the file server currently writes integers in metadata to disk
in native byte order, moving a file system to a machine of the other
major byte order (e.g., MIPS to Pentium)
requires the use of
.CW x .
The
.CW x
device knows the sizes of the various integer fields in the file system metadata.
Ideally, the file server would follow the Plan 9 religion and write a consistent
byte order on disk, regardless of processor.
In the meantime, it should be possible to determine the need
for byte-swapping automatically by examining data in the super-block of each file system,
though this has not been implemented yet.
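.IP
Because reversing an integer's bytes depends on its width,
a swapping layer has to treat each metadata field according to its size.
The sketch below shows this with a made-up record \f(CWMeta\fP;
its fields and the names \f(CWswab16\fP, \f(CWswab32\fP and \f(CWswabmeta\fP
are hypothetical and do not describe the real on-disk format.
.Ex
```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bytes of a 16-bit value. */
uint16_t swab16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse the bytes of a 32-bit value. */
uint32_t swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0xff00u) |
           ((v << 8) & 0xff0000u) | (v << 24);
}

/* Hypothetical metadata record; not the file server's real layout. */
typedef struct {
    uint16_t mode;
    uint32_t size;
    uint32_t addr;
} Meta;

/* Swap each field with the routine matching its width. */
void swabmeta(Meta *m)
{
    m->mode = swab16(m->mode);
    m->size = swab32(m->size);
    m->addr = swab32(m->addr);
}
```
.Ee
.IP
Swapping twice restores the original values, so one routine serves both directions.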
.IP "\fID\fP = \f(CWc\fP\fIDD\fP"
.br
This is the cache/WORM device made up of a cache (read-write)
device and a WORM (write-once-read-many) device.
More on this later.
.IP "\fID\fP = \f(CWo\fP"
.br
This is the dump file system: the
two-level hierarchy of all dumps ever taken on a cache/WORM.
The read-only root of the cache/WORM file system
(on the dump taken Feb 18, 1995) can
be referenced as
.CW /1995/0218
in this pseudo-device.
The second dump taken that day will be
.CW /1995/02181 .
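.IP
The naming scheme can be illustrated with a small C helper that reproduces the two names given above.
The text does not say how later dumps on the same day are named,
so continuing the suffix as 2, 3, and so on is an assumption,
as is the name \f(CWdumpname\fP.
.Ex
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the path of the k-th dump (k = 0 for the first)
   taken on a given day.  The first dump is /YYYY/MMDD and the second
   appends 1; numbering later dumps 2, 3, ... is an assumption.
   Returns snprintf's result: the length the full name needs. */
int dumpname(char *buf, size_t len, int year, int month, int day, int k)
{
    assert(k >= 0);
    if (k == 0)
        return snprintf(buf, len, "/%04d/%02d%02d", year, month, day);
    return snprintf(buf, len, "/%04d/%02d%02d%d", year, month, day, k);
}
```
.Ee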
.IP "\fID\fP = \f(CWw\fP\fIN1.N2.N3\fP"
.br
This is a SCSI disk on controller N1, target N2 and logical unit number N3.
.IP "\fID\fP = \f(CWh\fP\fIN1.N2.0\fP"
.br
This is an (E)IDE or *ATA disk on controller N1, target N2
(target 0 is the IDE master, 1 the slave device).
These disks are currently run via programmed I/O, not DMA,
so they tend to be slower to access than SCSI disks.
.IP "\fID\fP = \f(CWr\fP\fIN1\fP"
.br
This is the same as
.CW w ,
but refers to a side of a WORM disc.
See the
.I j
device.
.IP "\fID\fP = \f(CWl\fP\fIN1\fP"
.br
This is the same as
.CW r ,
but one block from the SCSI disk is removed for labeling.
.IP "\fID\fP = \f(CWj(\fP\fID\d\s-2\&1\s+2\u\fID\d\s-2\&2\s+2\u\f(CW*)\fID\d\s-2\&3\s+2\u\f1"
.br
.I D\d\s-2\&1\s+2\u
is the jukebox SCSI interface.
The
.I D\d\s-2\&2\s+2\u 's
are the SCSI drives in the jukebox
and the
.I D\d\s-2\&3\s+2\u 's
are the demountable platters in the jukebox.
.I D\d\s-2\&1\s+2\u
and
.I D\d\s-2\&2\s+2\u
must be
.CW w .
.I D\d\s-2\&3\s+2\u
must be pseudo-devices of
.CW w ,
.CW r ,
or
.CW l
devices.
.PP
For
.CW w ,
.CW h ,
.CW l ,
and
.CW r
devices, any of the configuration numbers
can be replaced by an iterator of the form
.CW <\fIN1-N2\fP> .
N1 can be greater than N2, indicating a descending sequence.
Thus
.Ex
[w0.<2-6>]
.Ee
is the interleaving of the SCSI disks on SCSI targets
2 through 6 of SCSI controller 0.
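.PP
The iterator expansion can be sketched in C.
This version handles a single iterator per device name
and writes each expanded name into a caller-supplied array;
the function \f(CWexpand\fP, the name-length limit, and the error convention are hypothetical.
.Ex
```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { NAMELEN = 32 };   /* hypothetical limit on an expanded name */

/* Hypothetical sketch: expand a single iterator <N1-N2> in a device
   name, e.g. "w0.<2-6>" gives w0.2 ... w0.6, and "w1.<6-0>.0" counts
   down because N1 > N2.  A name without an iterator expands to itself.
   Returns the number of names stored in out, or -1 on a syntax error
   or when more than max names would be produced. */
int expand(const char *pat, char out[][NAMELEN], int max)
{
    const char *lt = strchr(pat, '<');
    char *end;

    if (lt == NULL) {
        if (max < 1)
            return -1;
        snprintf(out[0], NAMELEN, "%s", pat);
        return 1;
    }
    const char *p = lt + 1;
    long n1 = strtol(p, &end, 10);
    if (end == p || *end != '-')
        return -1;
    p = end + 1;
    long n2 = strtol(p, &end, 10);
    if (end == p || *end != '>')
        return -1;

    const char *suffix = end + 1;
    int prefixlen = (int)(lt - pat);
    long step = n1 <= n2 ? 1 : -1;   /* N1 > N2 means descending */
    int count = 0;
    for (long n = n1; ; n += step) {
        if (count >= max)
            return -1;
        snprintf(out[count++], NAMELEN, "%.*s%ld%s", prefixlen, pat, n, suffix);
        if (n == n2)
            break;
    }
    return count;
}
```
.Ee
.PP
Expanding \f(CWw1.<6-0>.0\fP this way yields \f(CWw1.6.0\fP down to \f(CWw1.0.0\fP,
seven names in all, while an iterator such as \f(CW<0-236>\fP produces 237 names.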
.PP
The main file system on
Emelie
is defined by the configuration string
.Ex
c[w1.<0-5>.0]j(w6w5w4w3w2)(l<0-236>l<238-474>)
.Ee
This is a cache/WORM driver.
The cache is six interleaved disks on SCSI controller 1,
targets 0, 1, 2, 3, 4, and 5.
The WORM half of the cache/WORM
is 474 jukebox disks.
Another file server,
.I choline ,
has a main file system defined by
.Ex
c[w<1-3>]j(w1.<6-0>.0)(l<0-124>l<128-252>)
.Ee
The order of
.CW w1.<6-0>.0
matters here, since the optical jukebox's WORM drives'
SCSI target IDs,
as delivered,
run in descending order relative to the numbers of the drives
in SCSI commands
(e.g., the jukebox controller is SCSI target 6,
drive #1 is SCSI target 5,
and drive #6 is SCSI target 0).