SiFive - September 11, 2017
All Aboard, Part 4: The RISC-V Code Models
The RISC-V ISA was designed to be both simple and modular. In order to achieve these design goals, RISC-V minimizes one of the largest costs in implementing complex ISAs: addressing modes. Addressing modes are expensive both in small designs (due to decode cost) and large designs (due to implicit dependencies). RISC-V only has three addressing modes:
- PC-relative, via the auipc, jal and br* instructions.
- Register-offset, via the jalr, addi and all memory instructions.
- Absolute, via the lui instruction (though arguably this is just x0-offset).
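As a rough sketch of what these three modes compute, each one reduces to a single addition. The function names below are illustrative only and do not come from any RISC-V toolchain, and an RV64 register width is assumed:

```c
#include <stdint.h>

/* Illustrative RV64 model; these names are hypothetical.
 * imm20 is a signed 20-bit value, imm12 a signed 12-bit value. */

/* Absolute: lui writes imm20 << 12 (sign-extended) to the destination,
 * which is the same as adding the upper immediate to x0. */
uint64_t lui_value(int32_t imm20) {
    return (uint64_t)((int64_t)imm20 * 4096);
}

/* PC-relative: auipc writes pc + (imm20 << 12) to the destination. */
uint64_t auipc_value(uint64_t pc, int32_t imm20) {
    return pc + (uint64_t)((int64_t)imm20 * 4096);
}

/* Register-offset: loads, stores, jalr and addi all compute rs1 + imm12. */
uint64_t reg_offset_value(uint64_t rs1, int32_t imm12) {
    return rs1 + (uint64_t)(int64_t)imm12;
}
```

A full address is then built by chaining two of these, for example reg_offset_value(lui_value(hi), lo) for an absolute reference or reg_offset_value(auipc_value(pc, hi), lo) for a PC-relative one.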
These addressing modes have been carefully selected in order to allow for efficient code generation with a minimum of hardware complexity. We achieve this simplicity by relying on modern toolchains to optimize addressing in software -- this stands in stark contrast to traditional ISAs, which implement a plethora of addressing modes in hardware instead. Studies have shown that the RISC-V approach is sound: we are able to achieve similar code size in benchmarks while having vastly simpler decoding rules and a significant amount of free encoding space.
All these hardware complexity reductions come at the cost of increased software complexity. This blog post introduces another bit of software complexity in RISC-V: the concept of a code model. Just like relocations and relaxations, code models are not specific to RISC-V -- in fact, the RISC-V toolchain has fewer code models than most popular ISAs, largely because we rely on software optimizations instead of wacky addressing modes, which allows our addressing modes to be significantly more flexible.
What is a Code Model
Most programs do not fill the entire address space available to them with symbols (most don't fill it at all, but those that do tend to fill their address space with heap). ISAs tend to take advantage of this locality by implementing shorter addressing modes in hardware and relying on software to provide larger address modes. The code model determines which software addressing mode is used, and, therefore, what constraints are enforced on the linked program. Software addressing modes determine how the programmer sees addresses, as opposed to hardware addressing modes which determine how address bits in instructions are handled.
Code models are necessary due to the split between the compiler and the linker: when generating an unlinked object, the compiler doesn't know the absolute address of any symbol but it still must know what addressing mode to use as some addressing modes may require scratch registers to operate. As the compiler cannot generate actual addressing code, it generates addressing templates (known as relocations) that the linker can then fix up once it knows the actual addresses of each symbol. The code model determines what these addressing templates look like, and thus which relocations are emitted.
This is probably best explained with an example. Imagine the following C code:
long global_symbol[2];
int main() {
return global_symbol[0] != 0;
}
Even though a single GCC invocation can produce a binary for this simple case,
under the covers the GCC driver script is actually running the preprocessor,
then the compiler, then the assembler and finally the linker. The --save-temps
argument to GCC allows users to see all these intermediate files, and is a
useful argument for poking around inside the toolchain.
$ riscv64-unknown-linux-gnu-gcc cmodel.c -o cmodel -O3 --save-temps
Each step in this run of the GCC wrapper script generates a file:
- cmodel.i: The preprocessed source, which expands any preprocessor directives (things like #include or #ifdef).
- cmodel.s: The output of the actual compiler, which is an assembly file (a text file in the RISC-V assembly format).
- cmodel.o: The output of the assembler, which is an unlinked object file (an ELF file, but not an executable ELF).
- cmodel: The output of the linker, which is a linked executable (an executable ELF file).
In order to understand why the code model exists, we must first examine this toolchain flow in a bit more detail. Since this is a simple source file with no preprocessor macros, the preprocessor run is pretty boring: all it does is emit some directives to be used if debugging information is later generated:
$ cat cmodel.i
# 1 "cmodel.c"
# 1 "built-in"
# 1 "command-line"
# 31 "command-line"
# 1 "/scratch/palmer/work/upstream/riscv-gnu-toolchain/build/install/sysroot/usr/include/stdc-predef.h" 1 3 4
# 32 "command-line" 2
# 1 "cmodel.c"
long global_symbol;
int main() {
return global_symbol != 0;
}
The preprocessed output is then fed through the compiler, which generates an assembly file. This file is plain text that contains RISC-V assembly code and therefore is easy to read:
$ cat cmodel.s
main:
lui a5,%hi(global_symbol)
ld a0,%lo(global_symbol)(a5)
snez a0,a0
ret
The generated assembly contains a pair of instructions to address
global_symbol: lui and then ld. This imposes a constraint on the address that
global_symbol can take on: it must be addressable by a 32-bit signed absolute
constant (not a 32-bit offset from some register or the PC, but actually a
32-bit address). Note that the restriction on symbol addresses is not related
to the size of a pointer on this architecture: specifically, pointers may
still be 64 bits here, but all global symbols must be addressable by a 32-bit
absolute address.
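To make the lui/ld split concrete, here is a minimal sketch of how an absolute address divides into the two immediates. The helper names abs_lo12 and abs_hi20 are hypothetical, and the arithmetic reflects the usual understanding of %hi and %lo rather than code taken from the toolchain:

```c
#include <stdint.h>

/* Hypothetical helpers. Assumes addr is not within 2 KiB of the int64_t
 * limits, so the subtraction below cannot overflow. */

/* Low 12 bits as a signed value in [-2048, 2047], as %lo() yields.
 * The load sign-extends this immediate before adding it. */
int64_t abs_lo12(int64_t addr) {
    return ((addr & 0xfff) ^ 0x800) - 0x800;
}

/* Upper part as %hi() yields: whatever remains once the signed low
 * part is taken out, expressed in units of 4 KiB. */
int64_t abs_hi20(int64_t addr) {
    return (addr - abs_lo12(addr)) / 4096;
}
```

For 0x12038 this gives an upper part of 0x12 and a lower part of 56. For an address such as 0x12800 the low part becomes -2048, so the upper part rounds up to 0x13 to compensate.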
After the compiler generates assembly, the GCC wrapper script calls the
assembler to generate an object file. This file is an ELF binary, which can be
read with a variety of tools provided by Binutils. In this case we'll use
objdump to show the symbol table, disassemble the text section and show the
relocations generated by the assembler:
$ riscv64-unknown-linux-gnu-objdump -d -t -r cmodel.o
cmodel.o: file format elf64-littleriscv
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 cmodel.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .text.startup 0000000000000000 .text.startup
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text.startup 000000000000000e main
0000000000000010 O *COM* 0000000000000008 global_symbol
Disassembly of section .text.startup:
0000000000000000 main:
0: 000007b7 lui a5,0x0
0: R_RISCV_HI20 global_symbol
0: R_RISCV_RELAX *ABS*
4: 0007b503 ld a0,0(a5) # 0 main
4: R_RISCV_LO12_I global_symbol
4: R_RISCV_RELAX *ABS*
8: 00a03533 snez a0,a0
c: 8082 ret
At this point we have an object file, but we still don't know the actual
addresses of any global symbols. This is where there's a bit of overlap in the
roles of each component of the toolchain: it's the assembler's job to convert
textual instructions into bits, but in the cases where those bits depend on the
address of a global symbol (like the lui in the code above, for example) the
assembler can't know what those bits should actually be. In order to allow
the linker to fill out these bits in the final executable object file, the
assembler generates an entry in a relocation table for every bit range the
linker is expected to fill out when linking the code together. The specific
definition of any relocation type present in the text section is ISA-specific;
the RISC-V definitions can be found in our ELF psABI document.
After assembling the program, the GCC wrapper script runs the linker to generate an executable. This is another ELF file, but this time it's a full executable. Since this contains lots of C library code, I'm going to show only the relevant fragments of it here:
$ riscv64-unknown-linux-gnu-objdump -d -t -r cmodel
cmodel: file format elf64-littleriscv
SYMBOL TABLE:
0000000000012038 g O .bss 0000000000000010 global_symbol
...
Disassembly of section .text:
0000000000010330 main:
10330: 67c9 lui a5,0x12
10332: 0387b503 ld a0,56(a5) # 12038 global_symbol
10336: 00a03533 snez a0,a0
1033a: 8082 ret
There are a few interesting things to note here:
- The symbol table contains symbols with actual, absolute values. This is the whole point of the linker.
- The text section contains the correct bits to actually reference the global symbols, as opposed to just a bunch of 0s.
- The relocations against global symbols have been removed, as they're no longer necessary. Some relocations may still exist in executables to allow for things like dynamic linking, but in this simple case there are none.
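The second point can be reproduced by hand. The sketch below shows how the immediate fields of the two instruction words could be filled in. The function names are hypothetical, and the field positions are those of the standard U-type and I-type formats:

```c
#include <stdint.h>

/* Hypothetical helpers modelling what a linker does for these relocations. */

/* R_RISCV_HI20 on a U-type word (lui, auipc): the upper immediate
 * occupies bits 31:12, so keep the low 12 bits and replace the rest. */
uint32_t patch_u_type(uint32_t insn, int64_t hi20) {
    return (insn & 0x00000fffu) | (((uint32_t)hi20 & 0xfffffu) << 12);
}

/* R_RISCV_LO12_I on an I-type word (ld, addi): the immediate occupies
 * bits 31:20, so keep the low 20 bits and replace the rest. */
uint32_t patch_i_type(uint32_t insn, int64_t lo12) {
    return (insn & 0x000fffffu) | (((uint32_t)lo12 & 0xfffu) << 20);
}
```

patch_i_type(0x0007b503, 56) gives 0x0387b503, the ld word in the linked disassembly. The linked lui appears as 67c9 because the linker also shrank it to its 16-bit compressed form; patch_u_type(0x000007b7, 0x12) gives 0x000127b7, the uncompressed equivalent.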
Until now, this example has been using RISC-V's default code model medlow. In order to demonstrate a bit more specifically what a code model is it's probably best to contrast this with our other code model, medany. The difference can be summed up with a single example output:
0000000000000000 main:
0: 00000797 auipc a5,0x0
0: R_RISCV_PCREL_HI20 global_symbol
0: R_RISCV_RELAX *ABS*
4: 0007b503 ld a0,0(a5) # 0 main
4: R_RISCV_PCREL_LO12_I .LA0
4: R_RISCV_RELAX *ABS*
8: 00a03533 snez a0,a0
c: 8082 ret
Specifically, the medany code model generates auipc/ld pairs to refer to
global symbols, which allows the code to be linked at any address, while
medlow generates lui/ld pairs to refer to global symbols, which restricts the
code to being linked around address zero. They both generate 32-bit signed
offsets for referring to symbols, so they both restrict the generated code to
being linked within a 2 GiB window.
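The two constraints differ only in what the 32-bit offset is measured from. A minimal sketch of the check, assuming RV64 and using hypothetical function names (the exact bounds follow from a signed 20-bit upper immediate combined with a signed 12-bit lower immediate):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; not part of any toolchain. */

/* Can a signed offset be expressed as (hi20 << 12) + lo12, with hi20 a
 * signed 20-bit value and lo12 a signed 12-bit value? */
bool fits_hi20_lo12(int64_t offset) {
    return offset >= -(INT64_C(1) << 31) - 2048 &&
           offset <   (INT64_C(1) << 31) - 2048;
}

/* medlow: lui adds to zero, so the offset is the absolute address. */
bool medlow_can_reach(int64_t symbol) {
    return fits_hi20_lo12(symbol);
}

/* medany: auipc adds to its own address, so the offset is symbol - pc. */
bool medany_can_reach(int64_t pc, int64_t symbol) {
    return fits_hi20_lo12(symbol - pc);
}
```

Under these assumptions a symbol at 4 GiB is out of reach for medlow wherever the code sits, while medany can reach it from code placed nearby.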
What does -mcmodel=medlow mean?
This selects the medium-low code model, which means the program and its
statically defined symbols must lie within a single 2 GiB address range and
must lie between absolute addresses -2 GiB and +2 GiB. Addressing for global
symbols uses lui/addi instruction pairs, which emit the
R_RISCV_HI20/R_RISCV_LO12_I sequences. Here's an example of some generated
code using the medlow code model:
$ cat cmodel.c
long global_symbol[2];
int main() {
return global_symbol[0] != 0;
}
$ riscv64-unknown-linux-gnu-gcc cmodel.c -o cmodel -O3 --save-temps -mcmodel=medlow
$ cat cmodel.s
main:
lui a5,%hi(global_symbol)
ld a0,%lo(global_symbol)(a5)
snez a0,a0
ret
$ riscv64-unknown-linux-gnu-objdump -d -r cmodel.o
cmodel.o: file format elf64-littleriscv
Disassembly of section .text.startup:
0000000000000000 main:
0: 000007b7 lui a5,0x0
0: R_RISCV_HI20 global_symbol
0: R_RISCV_RELAX *ABS*
4: 0007b503 ld a0,0(a5) # 0 main
4: R_RISCV_LO12_I global_symbol
4: R_RISCV_RELAX *ABS*
8: 00a03533 snez a0,a0
c: 8082 ret
$ riscv64-unknown-linux-gnu-objdump -d -r cmodel
Disassembly of section .text:
0000000000010330 main:
10330: 67c9 lui a5,0x12
10332: 0387b503 ld a0,56(a5) # 12038 global_symbol
10336: 00a03533 snez a0,a0
1033a: 8082 ret
What does -mcmodel=medany mean?
This selects the medium-any code model, which means the program and its
statically defined symbols must lie within any single 2 GiB address range.
Addressing for global symbols uses auipc/addi instruction pairs, which emit
the R_RISCV_PCREL_HI20/R_RISCV_PCREL_LO12_I sequences. Here's an example of
some generated code using the medany code model (with -mexplicit-relocs, in
order to make this match the -mcmodel=medlow example a bit more cleanly):
$ cat cmodel.c
long global_symbol[2];
int main() {
return global_symbol[0] != 0;
}
$ riscv64-unknown-linux-gnu-gcc cmodel.c -o cmodel -O3 --save-temps -mcmodel=medany -mexplicit-relocs
$ cat cmodel.s
main:
.LA0: auipc a5,%pcrel_hi(global_symbol)
ld a0,%pcrel_lo(.LA0)(a5)
snez a0,a0
ret
$ riscv64-unknown-linux-gnu-objdump -d -t -r cmodel.o
cmodel.o: file format elf64-littleriscv
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 cmodel.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .text.startup 0000000000000000 .text.startup
0000000000000000 l .text.startup 0000000000000000 .LA0
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g F .text.startup 000000000000000e main
0000000000000010 O *COM* 0000000000000008 global_symbol
Disassembly of section .text.startup:
0000000000000000 main:
0: 00000797 auipc a5,0x0
0: R_RISCV_PCREL_HI20 global_symbol
0: R_RISCV_RELAX *ABS*
4: 0007b503 ld a0,0(a5) # 0 main
4: R_RISCV_PCREL_LO12_I .LA0
4: R_RISCV_RELAX *ABS*
8: 00a03533 snez a0,a0
c: 8082 ret
$ riscv64-unknown-linux-gnu-objdump -d -r cmodel
Disassembly of section .text:
0000000000010330 main:
10330: 00002797 auipc a5,0x2
10334: d087b503 ld a0,-760(a5) # 12038 global_symbol
10338: 00a03533 snez a0,a0
1033c: 8082 ret
...
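The immediates in the linked medany output can be checked by hand. The sketch below uses a hypothetical helper, split_pcrel, that splits the distance from the auipc to the symbol in the way %pcrel_hi and %pcrel_lo are generally understood to work; it is an illustration, not code from the toolchain:

```c
#include <stdint.h>

/* Hypothetical type and helper for illustration. */
typedef struct {
    int64_t hi20; /* immediate for auipc */
    int64_t lo12; /* immediate for the following load, in [-2048, 2047] */
} pcrel_pair;

/* pc is the address of the auipc itself, which is why the load's
 * relocation refers to the label .LA0 rather than to the symbol. */
pcrel_pair split_pcrel(int64_t pc, int64_t symbol) {
    int64_t offset = symbol - pc;
    pcrel_pair p;
    /* Signed low 12 bits; the load sign-extends this immediate. */
    p.lo12 = ((offset & 0xfff) ^ 0x800) - 0x800;
    /* What remains, in units of 4 KiB, goes into the auipc. */
    p.hi20 = (offset - p.lo12) / 4096;
    return p;
}
```

With pc = 0x10330 and symbol = 0x12038 this yields hi20 = 2 and lo12 = -760, matching auipc a5,0x2 and ld a0,-760(a5) in the disassembly.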
Note that -mcmodel=medany currently defaults to -mno-explicit-relocs, which
can have an appreciable performance effect. There's a bit of nuance in that
performance effect, so we'll discuss it in a later blog.
The Difference Between a Code Model and an ABI
One commonly misunderstood distinction is the difference between a code model and an ABI. The ABI determines the interface between functions, while the code model determines how code is generated within a function. Specifically: both RISC-V code models limit the code that addresses symbols to 32-bit offsets, but on RV64I-based systems they still encode pointers as 64-bit.
In practice, this means that functions compiled with -mcmodel=medany can be
called by functions compiled with -mcmodel=medlow, and vice versa. The
restrictions placed on symbol addressing by both code models will need to be
met in order to allow an executable to be linked, but that constraint is just
generally true. As the code model doesn't affect the layout of structures in
memory or how arguments are passed between functions, it's largely
transparent to programs.
Contrast this to linking code generated for two different ABIs, which is
invalid. Imagine a function that takes a double argument. A function compiled
for lp64d will expect this argument in a floating-point register. When called
by a function compiled for lp64, which places the argument in an X register,
the program won't work correctly.
Code Models and Linker Relaxation
Up until this point we haven't discussed how code models interact with linker relaxation, largely because the answer is now fairly simple: it all just works, assuming you use the RISC-V branches of the various toolchain components as there are a handful of patches that haven't found their way upstream yet.
Linker relaxation is actually an important enough optimization that it affected the RISC-V ISA significantly: linker relaxation allows RISC-V to forgo an addressing mode that would otherwise be required to get reasonable performance on many codebases. On RISC-V targets, the following addressing modes are available:
- Symbols within a 7-bit offset from 0 (or from __global_pointer$): 2 bytes.
- Symbols within a 12-bit offset from 0 (or from __global_pointer$): 4 bytes.
- Symbols within a 17-bit offset from 0: 6 bytes.
- Symbols within a 32-bit offset from 0: 8 bytes. On RV32I this is the entire address space.
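The tiers above can be read as a simple lookup from offset to sequence length. The sketch below is only an illustration of that list: the function names are hypothetical, it treats an n-bit offset as a signed n-bit value (an assumption, since the exact ranges depend on the instruction encodings), and it ignores __global_pointer$ entirely:

```c
#include <stdint.h>

/* Hypothetical helper: does value fit in a signed field of `bits` bits? */
static int fits_signed(int64_t value, int bits) {
    int64_t limit = INT64_C(1) << (bits - 1);
    return value >= -limit && value < limit;
}

/* Bytes of addressing sequence for a symbol at `offset` from 0, following
 * the four tiers in the list. Returns 0 when no tier applies. */
int addressing_bytes(int64_t offset) {
    if (fits_signed(offset, 7))  return 2;
    if (fits_signed(offset, 12)) return 4;
    if (fits_signed(offset, 17)) return 6;
    if (fits_signed(offset, 32)) return 8;
    return 0;
}
```

Under these assumptions a symbol at 0x12038 lands in the 8-byte tier, while one at offset 2047 needs only 4 bytes.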
This can all be achieved with a single code model, and while using a single hardware addressing mode (register+offset) via eight instruction formats (U, I, S, CI, CR, CIW, CL, and CS) without any mode bits. You could view this as a sort of variable-length address encoding that's optional to support in hardware -- for more information see the "Compressed Macro-Op Fusion" paper. As this compression is all implemented transparently by the linker, we only need a single code model. Contrast this behavior with the ARMv8 GCC port, which requires selecting a different code model for each of the address generation sequences it can emit.
Achieving variable-length addressing sequences is usually something reserved for CISC processors, which achieve this by implementing a plethora of addressing modes in hardware and opportunistically shrinking addressing sequences at assembly time when possible. The RISC-V method of using fusible multi-instruction addressing sequences and linker relaxation has the advantages of both allowing simple implementations and resulting in similar code size.