all data is big-endian on disk.
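
to make the encoding concrete, here is a minimal sketch (plain C, not venti's
own source; the helper names are mine) of the big-endian unpack helpers the
rest of these notes assume when they give field widths like magic[4]:

#include <stdint.h>

/* read big-endian integers out of an on-disk byte buffer */
static uint16_t unpack16(const uint8_t *p) { return (uint16_t)((p[0]<<8) | p[1]); }
static uint32_t unpack32(const uint8_t *p) {
    return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16) | ((uint32_t)p[2]<<8) | p[3];
}
static uint64_t unpack64(const uint8_t *p) {
    return ((uint64_t)unpack32(p)<<32) | unpack32(p+4);
}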

arena layout:

ArenaPart (first at offset PartBlank = 256kB in the disk file)
    magic[4] 0xA9E4A5E7
    version[4] 3
    blockSize[4]
    arenaBase[4] offset of first ArenaHead structure in the disk file
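
a sketch of reading that header with the helpers above, assuming the four
fields are packed in exactly the order listed (the struct and constant names
are mine):

#include <stdint.h>

enum { ArenaPartMagic = 0xA9E4A5E7 };

typedef struct ArenaPartHdr {
    uint32_t version;
    uint32_t blockSize;
    uint32_t arenaBase;    /* offset of the first ArenaHead in the disk file */
} ArenaPartHdr;

/* unpack the ArenaPart header found at offset PartBlank; -1 on bad magic */
static int
unpackarenapart(ArenaPartHdr *h, const uint8_t *buf)
{
    if(unpack32(buf) != ArenaPartMagic)
        return -1;
    h->version = unpack32(buf+4);
    h->blockSize = unpack32(buf+8);
    h->arenaBase = unpack32(buf+12);
    return 0;
}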

the ArenaMap starts at the first block at offset >= PartBlank+512 bytes.
it is a sequence of text lines:
/*
 * amap: n '\n' amapelem * n
 * n: u32int
 * amapelem: name '\t' astart '\t' astop '\n'
 * astart, astop: u64int
 */

the astart and astop are byte offsets in the disk file.
they are the offsets to the ArenaHead and the end of the Arena block.
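
a sketch of parsing one amapelem line (sscanf-based and simplified; venti's
real parser is stricter about the exact format):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* parse one "name '\t' astart '\t' astop" line from the arena map */
static int
parseamapelem(const char *line, char name[65], uint64_t *astart, uint64_t *astop)
{
    if(sscanf(line, "%64[^\t]\t%" SCNu64 "\t%" SCNu64, name, astart, astop) != 3)
        return -1;
    return 0;
}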

each arena looks like this on disk:

ArenaHead
[base points here in the C code]
size bytes:
    Clumps
    ClumpInfo blocks
Arena

Arena
    magic[4] 0xF2A14EAD
    version[4] 4
    name[64]
    clumps[4]
    cclumps[4]
    ctime[4]
    wtime[4]
    used[8]
    uncsize[8]
    sealed[1]
    optional score[20]
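
in memory the trailer amounts to something like the struct below (types chosen
to match the field widths above; the field names come from the list but the
struct itself is just a sketch for these notes):

#include <stdint.h>

enum { VtScoreSize = 20 };

typedef struct ArenaTrailer {
    uint32_t version;
    char     name[64+1];
    uint32_t clumps;
    uint32_t cclumps;
    uint32_t ctime;
    uint32_t wtime;
    uint64_t used;
    uint64_t uncsize;
    uint8_t  sealed;
    uint8_t  score[VtScoreSize];   /* only meaningful once sealed */
} ArenaTrailer;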

once sealed, the sha1 hash of every block from the
ArenaHead to the Arena is checksummed, as though
the final score in Arena were the zeroScore. strangely,
the tail of the Arena block (the last one) is not included in the checksum
(i.e., the unused data after the score).
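
a sketch of that computation, assuming buf holds the whole arena (the
ArenaHead block through the Arena trailer block), scoreoff is the offset of
the trailer's score field within buf, and taking zeroScore to be the score of
zero bytes of data (sha1 of nothing); OpenSSL's SHA1 is used only to keep the
sketch short:

#include <stddef.h>
#include <openssl/sha.h>

enum { VtScoreSize = 20 };

/* hash everything before the stored score, then zeroScore in its place;
 * the tail of the last block after the score is not hashed */
static void
sealscore(const unsigned char *buf, size_t scoreoff, unsigned char out[VtScoreSize])
{
    unsigned char zeroscore[VtScoreSize];
    SHA_CTX c;

    SHA1((const unsigned char*)"", 0, zeroscore);   /* score of no data */
    SHA1_Init(&c);
    SHA1_Update(&c, buf, scoreoff);
    SHA1_Update(&c, zeroscore, VtScoreSize);
    SHA1_Final(out, &c);
}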

clumpMax = blocksize/ClumpInfoSize = blocksize/25
dirsize = ((clumps/clumpMax)+1) * blocksize
want used+dirsize <= size
want cclumps <= clumps
want used <= uncsize+clumps*ClumpSize+blocksize
want ctime <= wtime
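
the same checks as a small C function (a sketch, not venti's actual check
routine; ClumpSize = 38 follows from the Clump fields listed below):

#include <stdint.h>

enum {
    ClumpInfoSize = 25,   /* 1+2+2+20, see ClumpInfo below */
    ClumpSize = 38,       /* 4+1+2+2+20+1+4+4, see Clump below */
};

static int
arenastatsok(uint64_t size, uint32_t blocksize,
    uint32_t clumps, uint32_t cclumps,
    uint32_t ctime, uint32_t wtime,
    uint64_t used, uint64_t uncsize)
{
    uint32_t clumpmax = blocksize / ClumpInfoSize;
    uint64_t dirsize = ((uint64_t)(clumps / clumpmax) + 1) * blocksize;

    return used + dirsize <= size
        && cclumps <= clumps
        && used <= uncsize + (uint64_t)clumps*ClumpSize + blocksize
        && ctime <= wtime;
}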

clump info is stored packed into blocks in order.
clump info moves forward through a block but the
blocks themselves move backwards. so if cm=clumpMax
and there are two blocks worth of clumpinfo, the blocks
look like:

[cm..2*cm-1] [0..cm-1] [Arena]

with the blocks pushed right up against the Arena trailer.
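
so the disk offset of ClumpInfo number i can be computed directly; a sketch,
taking traileroff to be the offset of the block holding the Arena trailer:

#include <stdint.h>

enum { ClumpInfoSize = 25 };

/* entries run forward inside a block, blocks run backward from just
 * before the trailer block */
static uint64_t
clumpinfooffset(uint64_t traileroff, uint32_t blocksize, uint32_t i)
{
    uint32_t clumpmax = blocksize / ClumpInfoSize;   /* entries per block */
    uint32_t blk = i / clumpmax;                     /* which directory block */
    uint32_t slot = i % clumpmax;                    /* position inside it */

    return traileroff - (uint64_t)(blk+1)*blocksize + (uint64_t)slot*ClumpInfoSize;
}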

ArenaHead
    magic[4] 0xD15C4EAD
    version[4] = Arena.version
    name[64]
    blockSize[4]
    size[8]

Clump
    magic[4] 0xD15CB10C (0 for an unused clump)
    type[1]
    size[2]
    uncsize[2]
    score[20]
    encoding[1] raw=1, compress=2
    creator[4]
    time[4]

ClumpInfo
    type[1]
    size[2]
    uncsize[2]
    score[20]
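
a ClumpInfo is 25 bytes packed; a sketch of unpacking one, assuming the fields
are stored in the order listed, big-endian (unpack16 is from the first sketch
above):

#include <stdint.h>
#include <string.h>

enum { VtScoreSize = 20, ClumpInfoSize = 25 };

typedef struct ClumpInfo {
    uint8_t  type;
    uint16_t size;
    uint16_t uncsize;
    uint8_t  score[VtScoreSize];
} ClumpInfo;

static void
unpackclumpinfo(ClumpInfo *ci, const uint8_t *p)
{
    ci->type = p[0];
    ci->size = unpack16(p+1);
    ci->uncsize = unpack16(p+3);
    memcpy(ci->score, p+5, VtScoreSize);
}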

the arenas are mapped into a single address space corresponding
to the index that brings them together. if each arena has 100M bytes
excluding the headers and there are 4 arenas, then there's 400M of
index address space between them. index address space starts at 1M
instead of 0, so the index addresses assigned to the first arena are
1M up to 101M, then 101M to 201M, etc.

of course, the assignment of addresses has nothing to do with the index,
but that's what they're called.
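
a sketch of mapping an index address back to an arena and an offset within it
(the table type and names are mine):

#include <stdint.h>

/* index-address range covered by one arena */
typedef struct AMapEnt {
    uint64_t start;   /* first index address mapped to this arena */
    uint64_t stop;    /* one past the last */
} AMapEnt;

/* return the arena number holding addr and set *off to the byte offset
 * of the clump within that arena's data area, or return -1 */
static int
indexaddr2arena(const AMapEnt *map, int narena, uint64_t addr, uint64_t *off)
{
    int i;

    for(i = 0; i < narena; i++)
        if(map[i].start <= addr && addr < map[i].stop){
            *off = addr - map[i].start;
            return i;
        }
    return -1;
}

with the 4x100M example above, map[0] = {1M, 101M}, map[1] = {101M, 201M},
and so on.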

the index is split into index sections, which are put on different disks
to get parallelism of disk heads. each index section holds some number
of hash buckets, each in its own disk block. collectively the index sections
hold ix->buckets between them.

the top 32 bits of the score are used to assign scores to buckets.
div = ceil(2³² / ix->buckets) is the amount of 32-bit score space per bucket.

to look up a block, take the top 32 bits of the score and divide by div
to get the bucket number. then look through the index section headers
to figure out which index section has that bucket.
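
a sketch of those two steps (the types and names are mine):

#include <stdint.h>

enum { VtScoreSize = 20 };

typedef struct ISectRange {
    uint32_t start;   /* first bucket held by this index section */
    uint32_t stop;    /* one past the last */
} ISectRange;

/* bucket number for a score: top 32 bits divided by
 * div = ceil(2^32 / buckets) */
static uint32_t
hashbucket(const uint8_t score[VtScoreSize], uint32_t buckets)
{
    uint64_t div, top;

    div = (((uint64_t)1<<32) + buckets - 1) / buckets;
    top = ((uint64_t)score[0]<<24) | (score[1]<<16) | (score[2]<<8) | score[3];
    return (uint32_t)(top / div);
}

/* index section whose [start, stop) range holds the bucket, or -1 */
static int
whichisect(const ISectRange *sects, int nsect, uint32_t bucket)
{
    int i;

    for(i = 0; i < nsect; i++)
        if(sects[i].start <= bucket && bucket < sects[i].stop)
            return i;
    return -1;
}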

then load that block from the index section. it's an IBucket.

the IBucket has ib.n IEntry structures in it, sorted by score and then by type.
do the lookup and get an IEntry. the ia.addr will be a logical address
that you then use to get the arena and the offset of the clump within it
(see the index address discussion above).

ISect
    magic[4] 0xD15C5EC7
    version[4]
    name[64]
    index[64]
    blockSize[4]
    blockBase[4] address in partition where bucket blocks start
    blocks[4]
    start[4]
    stop[4] stop - start <= blocks, but not necessarily ==

IEntry
    score[20]
    wtime[4]
    train[2]
    ia.addr[8] index address (see note above)
    ia.size[2] size of uncompressed block data
    ia.type[1]
    ia.blocks[1] number of blocks of clump on disk

IBucket
    n[2]
    next[4] not sure; either 0 or inside [start,stop) for the ISect
    data[n*IEntrySize]
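
a sketch of scanning a loaded bucket block for a (score, type) pair, assuming
the IBucket and IEntry fields are packed in exactly the order listed above
(so IEntrySize = 20+4+2+8+2+1+1 = 38 and the type byte sits at offset 36 of an
entry); a linear scan is shown, but since the entries are sorted by score and
then type a binary search works too:

#include <stdint.h>
#include <string.h>

enum { VtScoreSize = 20, IBucketHdrSize = 6, IEntrySize = 38 };

/* return a pointer to the packed IEntry for (score, type), or NULL */
static const uint8_t*
bucketlookup(const uint8_t *block, const uint8_t score[VtScoreSize], int type)
{
    uint32_t n, i;
    const uint8_t *e;

    n = (block[0]<<8) | block[1];   /* IBucket.n */
    for(i = 0; i < n; i++){
        e = block + IBucketHdrSize + (size_t)i*IEntrySize;
        if(memcmp(e, score, VtScoreSize) == 0 && e[36] == type)
            return e;
    }
    return NULL;
}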

final piece: all the disk partitions start with PartBlank=256kB of unused disk
(presumably to avoid problems with boot sectors and layout tables
and the like).

actually the last 8k of the 256k (that is, at offset 248kB) can hold
a venti config file to help during bootstrap of the venti file server.
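
the fixed offsets involved, as constants (only PartBlank is a name these notes
themselves use; the other two are just for illustration):

enum {
    PartBlank = 256*1024,                            /* unused space at the start of each partition */
    PartConfigSize = 8*1024,                         /* last 8k of that space */
    PartConfigOffset = PartBlank - PartConfigSize,   /* = 248kB, optional venti config lives here */
};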