Warning: Attempt to read property "date" on null in /usr/local/www/websvn.planix.org/blame.php on line 247

Warning: Attempt to read property "msg" on null in /usr/local/www/websvn.planix.org/blame.php on line 247
WebSVN – planix.SVN – Blame – /os/branches/feature_fixcpp/sys/doc/fs/p6 – Rev 2

Subversion Repositories planix.SVN

Rev

Go to most recent revision | Details | Last modification | View Log | RSS feed

Rev Author Line No. Line
2 - 1
.SH
2
Cache/WORM Driver
3
.PP
4
The cache/WORM (cw) driver is by far the
5
largest and most complicated device driver in the file server.
6
There are four devices involved in the cw driver.
7
It implements a read/write pseudo-device (the cw-device)
8
and a read-only pseudo-device (the dump device)
9
by performing operations on its two constituent devices
10
the read-write c-device and the write-once-read-many
11
w-device.
12
The block numbers on the four devices are distinct,
13
although the
14
.I cw
15
addresses,
16
dump addresses,
17
and the
18
.I w
19
addresses are
20
highly correlated.
21
.PP
22
The cw-driver uses the w-device as the
23
stable storage of the file system at the time of the
24
last dump.
25
All newly written and a large number of recently used
26
exact copies of blocks of the w-device are kept on the c-device.
27
The c-device is much smaller than the w-device and
28
so the subset of w-blocks that are kept on the c-device are
29
mapped through a hash table kept on a partition of the c-device.
30
.PP
31
The map portion of the c-device consists of blocks of buckets of entries.
32
The declarations follow.
33
.Ex
34
	enum
35
	{
36
		BKPERBLK = 10,
37
		CEPERBK	= (BUFSIZE - BKPERBLK*sizeof(Off)) /
38
				(sizeof(Centry)*BKPERBLK),
39
	};
40
.Ee
41
.Ex
42
	typedef
43
	struct
44
	{
45
		ushort	age;
46
		short	state;
47
		Off	waddr;
48
	} Centry;
49
.Ee
50
.Ex
51
	typedef
52
	struct
53
	{
54
		long	agegen;
55
		Centry	entry[CEPERBK];
56
	} Bucket;
57
.Ee
58
.Ex
59
	Bucket	bucket[BKPERBLK];
60
.Ee
61
There is exactly one entry structure for each block in the
62
data partition of the c-device.
63
A bucket contains all of the w-addresses that have
64
the same hash code.
65
There are as many buckets as will fit
66
in a block and enough blocks to have the required
67
number of entries.
68
The entries in the bucket are maintained
69
in FIFO order with an age variable and an incrementing age generator.
70
When the age generator is about to overflow,
71
all of the ages in the bucket are rescaled
72
from zero.
73
.PP
74
The following steps go into converting a w-address into a c-address.
75
The bucket is found by
76
.Ex
77
	bucket_number = w-address % total_buckets;
78
	getbuf(c-device, bucket_offset + bucket_number/BKPERBLK);
79
.Ee
80
After the desired bucket is found,
81
the desired entry is found by a linear search within the bucket for the
82
entry with the desired
83
.CW waddr .
84
.PP
85
The state variable in the entry is
86
one of the following.
87
.Ex
88
	enum
89
	{
90
		Cnone	= 0,
91
		Cdirty,
92
		Cdump,
93
		Cread,
94
		Cwrite,
95
		Cdump1,
96
	};
97
.Ee
98
Every w-address has a state.
99
Blocks that are not in the
100
c-device have the implied
101
state
102
.CW Cnone .
103
The
104
.CW Cread
105
state is for blocks that have the
106
same data as the corresponding block in
107
the w-device.
108
Since the c-device is much faster than the
109
w-device,
110
.CW Cread
111
blocks are kept as long as possible and
112
used in preference to reading the w-device.
113
.CW Cread
114
blocks may be discarded from the c-device
115
when the space is needed for newer data.
116
The
117
.CW Cwrite
118
state is when the c-device contains newer data
119
than the corresponding block on the w-device.
120
This happens when a
121
.CW Cnone ,
122
.CW Cread ,
123
or
124
.CW Cwrite
125
block is written.
126
The
127
.CW Cdirty
128
state
129
is when the c-device contains
130
new data and the corresponding block
131
on the w-device has never been written.
132
This happens when a new block has been
133
allocated from the free space on the w-device.
134
.PP
135
The
136
.CW Cwrite
137
and
138
.CW Cdirty
139
blocks are created and never removed.
140
Unless something is done to
141
convert these blocks,
142
the c-device will gradually
143
fill up and stop functioning.
144
Once a day,
145
or by command,
146
a
147
.I dump
148
of the cw-device
149
is taken.
150
The purpose of
151
a dump is to queue the writes that
152
have been shunted to the c-device
153
to be written to the w-device.
154
Since the w-device is a WORM,
155
blocks cannot be rewritten.
156
Blocks that have already been written to the WORM must be
157
relocated to the unused portion of the w-device.
158
These are precisely the
159
blocks with
160
.CW Cwrite
161
state.
162
.PP
163
The dump algorithm is as follows:
164
.IP a)
165
The tree on the cw-device is walked
166
as long as the blocks visited have been
167
modified since the last dump.
168
These are the blocks with state
169
.CW Cwrite
170
and
171
.CW Cdirty .
172
It is possible to restrict the search
173
to within these blocks
174
since the directory containing a modified
175
file must have been accessed to modify the
176
file and accessing a directory will set its
177
modified time thus causing the block containing it
178
to be written.
179
The directory containing that directory must be
180
modified for the same reason.
181
The tree walk is thus drastically restrained and the
182
tree walk does not take much time.
183
.IP b)
184
All
185
.CW Cwrite
186
blocks found in the tree search
187
are relocated to new blank blocks on the w-device
188
and converted to
189
.CW Cdump
190
state.
191
All
192
.CW Cdirty
193
blocks are converted to
194
.CW Cdump
195
state without relocation.
196
At this point,
197
all modified blocks in the cw-device
198
have w-addresses that point to unwritten
199
WORM blocks.
200
These blocks are marked for later
201
writing to the w-device
202
with the state
203
.CW Cdump .
204
.IP c)
205
All open files that were pointing to modified
206
blocks are reopened to point at the corresponding
207
reallocated blocks.
208
This causes the directories leading to the
209
open files to be modified.
210
Thus the invariant discussed in a) is maintained.
211
.IP d)
212
The background dumping process will slowly
213
go through the map of the c-device and write out
214
all blocks with
215
.CW Cdump
216
state.
217
.PP
218
The dump takes a few minutes to walk the tree
219
and mark the blocks.
220
It can take hours to write the marked blocks
221
to the WORM.
222
If a marked block is rewritten before the old
223
copy has been written to the WORM,
224
it must be forced to the WORM before it is rewritten.
225
There is no problem if another dump is taken before the first one
226
is finished.
227
The newly marked blocks are just added to the marked blocks
228
left from the first dump.
229
.PP
230
If there is an error writing a marked block
231
to the WORM
232
then the
233
.CW dump
234
state is converted to
235
.CW Cdump1
236
and manual intervention is needed.
237
(See the
238
.CW cwcmd
239
.CW mvstate
240
command in
241
.I fs (8)).
242
These blocks can be disposed of by converting
243
their state back to
244
.CW Cdump
245
so that they will be written again.
246
They can also be converted to
247
.CW Cwrite
248
state so that they will be allocated new
249
addresses at the next dump.
250
In most other respects,
251
a
252
.CW Cdump1
253
block behaves like a
254
.CW Cwrite
255
block.