.FP palatino
.TM
.TL
Plan 9 on the Mikrotik RB450G Routerboard
.AU
Geoff Collyer
.AI
.MH
.NH 1
Motivation
.LP
I ported Plan 9 to the Routerboard mainly to verify
that Plan 9's MIPS-related code
(compiler, assembler, loader,
.CW libmach ,
etc.) was still in working order and would
work on newer machines than the 1993-era ones that we last owned
(MIPS Magnum, SGI Challenge, Carrera and the like).
The verdict is that,
with a few surprising exceptions, the code still works on newish machines
(the MIPS 24K CPU in the Routerboard dates to about 2003 originally;
this revision is from about 2005).
So we now have a
machine on which to test MIPS executables.
.LP
The other reason I did the port was
as an incremental step toward
running Plan 9 on a MIPS64 machine (e.g., the dual-core, dual-issue
Cavium CN5020 in the Ubiquiti Edgerouter Lite 3).
.NH 1
The new MIPS world
.LP
These newer MIPS systems are aimed at embedded applications, so they
typically lack FPUs and may also lack L2 caches or have small TLBs;
the MIPS 24K in the Atheros 7161 SoC lacks FPU and L2 cache, and has a
16-entry TLB.
It is a MIPS32R2 architecture system and lacks the 64-bit instructions
of the R4000.
These new MIPS systems are still big-endian,
so provide a useful test case to expose byte-ordering bugs.
.NH 1
Plan 9 changes and additions
.NH 2
CPU Bug Workarounds
.LP
The Linux MIPS people cite MIPS 24K erratum 48:
3 consecutive stores lose data.
MIPS only distribute their errata lists under NDA and to their
corporate partners, so we have only the Linux report to go on.
The fix requires
.I both
write-through data cache and
no more than two consecutive single-word stores in all executables.
I have made a crude optional change to
.I vl
to generate a NOP before every third consecutive store.
The fix could be better, in particular the technique for
keeping stores out of branch delay slots.
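.LP
As a rough illustration of the store-padding idea only (this is not the
actual change to the loader), the following Python sketch walks a list of
instructions and emits a NOP before every third consecutive store.
The instruction format, the
.CW SW
mnemonic standing in for a single-word store, and the function names
are all assumptions made for the example;
it ignores branch delay slots entirely.

```python
# Hypothetical instruction format: one mnemonic string per instruction.
# "SW" stands in for any single-word store; this is not vl's real representation.

def is_store(ins):
    """Return True if the instruction is a single-word store (assumed mnemonic SW)."""
    return ins.split()[0] == "SW"

def pad_stores(instrs):
    """Return a copy of instrs with a NOP before every third consecutive store."""
    out = []
    run = 0  # stores emitted back to back so far
    for ins in instrs:
        if is_store(ins):
            if run == 2:
                # Two stores already in a row: break the run with a NOP.
                out.append("NOP")
                run = 0
            run += 1
        else:
            run = 0
        out.append(ins)
    return out

if __name__ == "__main__":
    prog = ["SW R1, 0(R29)", "SW R2, 4(R29)", "SW R3, 8(R29)", "ADDU R1, R2, R3"]
    for ins in pad_stores(prog):
        print(ins)
```

.LP
A real implementation would also have to account for stores that the
loader moves into branch delay slots, which this sketch does not model.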
.NH 2
Driver for Undocumented Ethernet Controller
.LP
The FreeBSD Atheros
.I arge
driver
(in
.CW /usr/src/sys/mips/atheros )
provided inspiration for our Gigabit Ethernet driver, since the
hardware is otherwise largely undocumented.
I haven't got the second
Ethernet controller entirely working yet;
it's perhaps complicated by having a switch attached to it (the Atheros 8316).
At minimum, it probably needs MII or PHY initialisation.
.NH 2
Floating-point Emulation
.LP
Floating-point emulation works but is
.I very
slow:
.I astro
takes about 8 seconds.
I added an
.CW fpemudebug
command to
.CW /dev/archctl ;
it
takes a number as argument corresponding to the
.CW Dbg*
bits in
.CW fpimips.c ,
but requires the kernel to be compiled with
.CW FPEMUDEBUG
defined.
.NH 3
\&... in Locking Code
.LP
The big surprises included that
.CW /sys/src/libc/mips/lock.c
read
.CW FCR0
to
choose the locking style.
That's been broken out into
.CW c_fcr0.s
so that we can change it, but the kernel also emulates the
.CW MOVW
.CW FCR0,R1
(and via a fast code path), to keep alive the possibility of running
old binaries from the dump.
.NH 2
No 64-bit Instructions
.LP
The other big surprise was that
.CW /sys/src/libmp/mips/mpdigdiv.s
used 64-bit instructions (SLLV, SRLV, ADDVU, DIVVU).
For now I've resolved the problem by pushing it into a
subdirectory (\c
.CW r4k )
and editing the
.CW mkfile s
to use the
.CW port
version
(and similarly in APE).
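.LP
To show that a two-digit by one-digit division needs no 64-bit
instructions, here is a minimal Python sketch of shift-and-subtract
division that keeps every intermediate value within 32 bits.
It is not the
.CW port
code from libmp; the function name, argument order, and the choice to
return all ones when the quotient would not fit in one digit (or the
divisor is zero) are assumptions for the example.

```python
MASK = 0xFFFFFFFF  # one 32-bit digit

def digdiv(hi, lo, divisor):
    """Quotient of the 64-bit value (hi, lo) divided by a 32-bit divisor.

    Every intermediate value is masked to 32 bits, mimicking a CPU
    without 64-bit registers.  Saturating to MASK on overflow or on a
    zero divisor is an assumption of this sketch.
    """
    if divisor == 0 or hi >= divisor:
        return MASK
    rem = hi  # invariant: rem < divisor at the top of each iteration
    quo = 0
    for i in range(32):
        carry = rem >> 31  # bit about to be shifted out of the remainder
        rem = ((rem << 1) | ((lo >> (31 - i)) & 1)) & MASK
        quo = (quo << 1) & MASK
        if carry or rem >= divisor:
            # The true (33-bit) remainder is at least the divisor.
            rem = (rem - divisor) & MASK
            quo |= 1
    return quo

if __name__ == "__main__":
    print(hex(digdiv(1, 0, 2)))
```

.LP
The loop runs once per quotient bit, so it is much slower than a single
hardware divide, which is the usual price of a portable fallback.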
.br
.ne 8
.NH 2
Page Size vs TLB Faults
.LP
I started out with a 4K page size and reduced the number of TLB
entries reserved for the kernel to 2, leaving 14 for user programs,
but
.CW /dev/sysstat
was reporting 6 times as many TLB faults as page
faults, and the number increased at a furious rate.
.LP
So I switched to
a 16K page size, adjusted
.CW vl
.CW -H2
accordingly and recompiled the
.CW /mips
world.
This reduced the TLB faults to just 10% more than the number of page faults.
(That number is now around 15% more, due to a better soft-TLB hash function
that makes the soft TLB more effective.)
Unfortunately, 16K pages also produce consecutive (even recursive) page faults
for the same address at the same PC
and the system runs at about 10% of its normal speed,
so 4K pages are currently the only sensible choice;
we'll just live with the absurdly high number of TLB faults
(around 20k–30k per second).
It probably doesn't help that one 16K page is half of the L1 data cache
and one quarter of the L1 instruction cache.
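.LP
The effect of page size on a small TLB can be seen from a little
arithmetic, sketched here in Python.
The 14 user entries and the two page sizes come from the text above;
the factor of two reflects that each MIPS32 TLB entry maps an even/odd
pair of pages, and the sketch assumes both halves of every entry are in use.
The function name is made up for the example.

```python
K = 1024

def tlb_reach(entries, page_size, pages_per_entry=2):
    """Bytes of address space the TLB can map at once.

    pages_per_entry=2 models the even/odd page pair held by each
    MIPS32 TLB entry; it assumes both halves are populated.
    """
    return entries * pages_per_entry * page_size

reach_4k = tlb_reach(14, 4 * K)    # 14 user entries, 4K pages
reach_16k = tlb_reach(14, 16 * K)  # 14 user entries, 16K pages

if __name__ == "__main__":
    print(reach_4k // K, "K with 4K pages")
    print(reach_16k // K, "K with 16K pages")
```

.LP
Under these assumptions the 14 user entries cover at most 112K with 4K
pages and 448K with 16K pages, which is consistent with the much lower
TLB fault rate seen with the larger page size.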
.LP
Page size is controlled by
.CW BIGPAGES
in
.CW mem.h .
.NH 3
Combined TLB Pool
.LP
I also changed
.CW mmu.c
to collapse the separate kernel and user TLB pools into one,
once user processes start running,
but that only helps to reduce TLB faults a little.
.
.br
.ne 8
.
.NH 1
Remaining Problems
.LP
Interrupt-driven UART output isn't quite right.
It can get stuck and then input makes it resume.
The UART is apparently connected via the APB and requires
interrupt unmasking in the APB (which we now do).
There's some kludgey stuff in
.CW uarti8250.c
that makes output work most of the time
(characters do sometimes get dropped).
.LP
The Ethernet driver currently does not
dig out the MAC addresses from the hardware,
so you'll need to edit the
.CW rb
configuration file for each Routerboard; the format should be obvious.
I don't have the stomach to dig the MAC address out of the hardware
via SPI or whatever vile interface it requires.