SiFive - September 11, 2017

All Aboard, Part 4: The RISC-V Code Models

The RISC-V ISA was designed to be both simple and modular. In order to achieve these design goals, RISC-V minimizes one of the largest costs in implementing complex ISAs: addressing modes. Addressing modes are expensive both in small designs (due to decode cost) and large designs (due to implicit dependencies). RISC-V only has three addressing modes:

PC-relative, via the auipc, jal and br* instructions.
Register-offset, via the jalr, addi and all memory instructions.
Absolute, via the lui instruction (though arguably this is just x0-offset).

These addressing modes have been carefully selected in order to allow for efficient code generation with a minimum of hardware complexity. We achieve this simplicity by relying on modern toolchains to optimize addressing in software -- this stands in stark contrast to traditional ISAs, which implement a plethora of addressing modes in hardware instead. Studies have shown that the RISC-V approach is sound: we are able to achieve similar code size in benchmarks while having vastly simpler decoding rules and a significant amount of free encoding space.

All these hardware complexity reductions come at the cost of increased software complexity. This blog post introduces another bit of software complexity in RISC-V: the concept of a code model. Just like relocations and relaxations, code models are not specific to RISC-V -- in fact, the RISC-V toolchain has fewer code models than most popular ISAs, largely because we rely on software optimizations instead of wacky addressing modes, which allows our addressing modes to be significantly more flexible.

What is a Code Model

Most programs do not fill the entire address space available to them with symbols (most don't fill it at all, but those that do tend to fill their address space with heap). ISAs tend to take advantage of this locality by implementing shorter addressing modes in hardware and relying on software to provide larger address modes. The code model determines which software addressing mode is used, and, therefore, what constraints are enforced on the linked program. Software addressing modes determine how the programmer sees addresses, as opposed to hardware addressing modes which determine how address bits in instructions are handled.

Code models are necessary due to the split between the compiler and the linker: when generating an unlinked object, the compiler doesn't know the absolute address of any symbol but it still must know what addressing mode to use as some addressing modes may require scratch registers to operate. As the compiler cannot generate actual addressing code, it generates addressing templates (known as relocations) that the linker can then fix up once it knows the actual addresses of each symbol. The code model determines what these addressing templates look like, and thus which relocations are emitted.

This is probably best explained with an example. Imagine the following C code:

long global_symbol[2];

int main() {
  return global_symbol[0] != 0;
}

Even though a single GCC invocation can produce a binary for this simple case, under the covers the GCC driver script is actually running the preprocessor, then the compiler, then the assembler and finally the linker. The --save-temps argument to GCC allows users to see all these intermediate files, and is a useful argument for poking around inside the toolchain.

$ riscv64-unknown-linux-gnu-gcc cmodel.c -o cmodel -O3 --save-temps

Each step in this run of the GCC wrapper script generates a file:

cmodel.i: The preprocessed source, which expands any preprocessor directives (things like #include or #ifdef).
cmodel.s: The output of the actual compiler, which is an assembly file (a text file in the RISC-V assembly format).
cmodel.o: The output of the assembler, which is an unlinked object file (an ELF file, but not an executable ELF).
cmodel: The output of the linker, which is a linked executable (an executable ELF file).

In order to understand why the code model exists, we must first examine this toolchain flow in a bit more detail. Since this is a simple source file with no preprocessor macros, the preprocessor run is pretty boring: all it does is emit some directives to be used if debugging information is later generated:

$ cat cmodel.i
# 1 "cmodel.c"
# 1 "built-in"
# 1 "command-line"
# 31 "command-line"
# 1 "/scratch/palmer/work/upstream/riscv-gnu-toolchain/build/install/sysroot/usr/include/stdc-predef.h" 1 3 4
# 32 "command-line" 2
# 1 "cmodel.c"
long global_symbol;

int main() {
  return global_symbol != 0;
}

The preprocessed output is then fed through the compiler, which generates an assembly file. This file is plain text that contains RISC-V assembly code and therefore is easy to read:

$ cat cmodel.s
main:
  lui   a5,%hi(global_symbol)
  ld    a0,%lo(global_symbol)(a5)
  snez  a0,a0
  ret

The generated assembly contains a pair of instructions to address global_symbol: lui and then ld. This imposes a constraint on the address that global_symbol can take on: it must be addressable by a 32-bit signed absolute constant (not 32-bit offset from some register or the PC, but actually a 32-bit address). Note that the restriction on symbol addresses is not related to the size of a pointer on this architecture: specifically pointers may still be 64 bits here, but all global symbols must be addressable by a 32-bit absolute address.

After the compiler generates assembly, the GCC wrapper script calls the assembler to generate an object file. This file is an ELF binary, which can be read with a variety of tools provided by Binutils. In case we'll use objdump to show the symbol table, disassemble the text section and show the relocations generated by the assembler:

$ riscv64-unknown-linux-gnu-objdump -d -t -r cmodel.o

cmodel.o:     file format elf64-littleriscv

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 cmodel.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .text.startup  0000000000000000 .text.startup
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000000 g     F .text.startup  000000000000000e main
0000000000000010       O *COM*  0000000000000008 global_symbol

Disassembly of section .text.startup:

0000000000000000 main:
   0:   000007b7                lui     a5,0x0
                        0: R_RISCV_HI20 global_symbol
                        0: R_RISCV_RELAX        *ABS*
   4:   0007b503                ld      a0,0(a5) # 0 main
                        4: R_RISCV_LO12_I       global_symbol
                        4: R_RISCV_RELAX        *ABS*
   8:   00a03533                snez    a0,a0
   c:   8082                    ret

At this point we have an object file, but we still don't know the actual addresses of any global symbols. This is where there's a bit of overlap in the roles of each component of the toolchain: it's the assembler's job to convert textual instructions into bits, but in the cases where those bits depend on the address of a global symbol (like the lui in the code above, for example) the assembler can't know what those bits should actually be. In order to allow the linker to fill out these bits in the final executable object file, the assembler generates entries in a relocation table for every bit range the linker is expected to fill out. Relocations define a bit range that the linker is meant to fill out when linking the code together. The specific definition of any relocation type present in the text section is ISA-specific, the RISC-V definitions can be found in our ELF psABI document.

After assembling the program, the GCC wrapper script runs the linker to generate an executable. This is another ELF file, but this time it's a full executable. Since this contains lots of C library code, I'm going to show only the relevant fragments of it here:

$ riscv64-unknown-linux-gnu-objdump -d -t -r cmodel
cmodel:     file format elf64-littleriscv

SYMBOL TABLE:
0000000000012038 g     O .bss    0000000000000010              global_symbol
...

Disassembly of section .text:

0000000000010330 main:
 10330:       67c9                    lui     a5,0x12
 10332:       0387b503                ld      a0,56(a5) # 12038 global_symbol
 10336:       00a03533                snez    a0,a0
 1033a:       8082                    ret

There are a few interesting things to note here:

The symbol table contains symbols with actual, absolute values. This is the whole point of the linker.
The text section contains the correct bits to actually reference the global symbols, as opposed to just a bunch of 0s.
The relocations against global symbols have been removed, as they're no longer necessary. Some relocations may still exist in executables to allow for things like dynamic linking, but in this simple case there are none.

Until now, this example has been using RISC-V's default code model medlow. In order to demonstrate a bit more specifically what a code model is it's probably best to contrast this with our other code model, medany. The difference can be summed up with a single example output:

0000000000000000 main:
   0:   00000797                auipc   a5,0x0
                        0: R_RISCV_PCREL_HI20   global_symbol
                        0: R_RISCV_RELAX        *ABS*
   4:   0007b503                ld      a0,0(a5) # 0 main
                        4: R_RISCV_PCREL_LO12_I .LA0
                        4: R_RISCV_RELAX        *ABS*
   8:   00a03533                snez    a0,a0
   c:   8082                    ret

Specifically, the medany code model generates auipc/ld pairs to refer to global symbols, which allows the code to be linked at any address; while medlow generates lui/ld pairs to refer to global symbols, which restricts the code to be linked around address zero. They both generate 32-bit signed offsets for referring to symbols, so they both restrict the generated code to being linked within a 2GiB window.

What does -mcmodel=medlow mean?

This selects the medium-low code model, which means program and its statically defined symbols must lie within a single 2 GiB address range and must lie between absolute addresses -2 GiB and +2 GiB. Addressing for global symbols uses lui/addi instruction pairs, which emit the R_RISCV_HI20/R_RISCV_LO12_I sequences. Here's an example of some generated code using the medlow code model:

$ cat cmodel.c
long global_symbol[2];

int main() {
  return global_symbol[0] != 0;
}

$ riscv64-unknown-linux-gnu-gcc cmodel.c -o cmodel -O3 --save-temps -mcmodel=medlow

$ cat cmodel.s
main:
        lui     a5,%hi(global_symbol)
        ld      a0,%lo(global_symbol)(a5)
        snez    a0,a0
        ret

$ riscv64-unknown-linux-gnu-objdump -d -r cmodel.o
cmodel.o:     file format elf64-littleriscv

Disassembly of section .text.startup:

0000000000000000 main:
   0:   000007b7                lui     a5,0x0
                        0: R_RISCV_HI20 global_symbol
                        0: R_RISCV_RELAX        *ABS*
   4:   0007b503                ld      a0,0(a5) # 0 main
                        4: R_RISCV_LO12_I       global_symbol
                        4: R_RISCV_RELAX        *ABS*
   8:   00a03533                snez    a0,a0
   c:   8082                    ret

$ riscv64-unknown-linux-gnu-objdump -d -r cmodel
Disassembly of section .text:

0000000000010330 main:
   10330:       67c9                    lui     a5,0x12
   10332:       0387b503                ld      a0,56(a5) # 12038 global_symbol
   10336:       00a03533                snez    a0,a0
   1033a:       8082                    ret

What does -mcmodel=medany mean?

This selects the medium-any code model, which means the program and its statically defined symbols must lie within any single 2 GiB address range. Addressing for global symbols uses lui/addi instruction pairs, which emit the R_RISCV_PCREL_HI20/R_RISCV_PCREL_LO12_I sequences. Here's an example of some generated code using the medany code model (with -mexplicit-relocs, in order to make this match the -mcmodel=medlow example a bit more cleanly):

$ cat cmodel.c
long global_symbol[2];

int main() {
  return global_symbol[0] != 0;
}

$ riscv64-unknown-linux-gnu-gcc cmodel.c -o cmodel -O3 --save-temps -mcmodel=medany -mexplicit-relocs

$ cat cmodel.s
main:
        .LA0: auipc     a5,%pcrel_hi(global_symbol)
        ld      a0,%pcrel_lo(.LA0)(a5)
        snez    a0,a0
        ret

$ riscv64-unknown-linux-gnu-objdump -d -r cmodel.o
cmodel.o:     file format elf64-littleriscv

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 cmodel.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .text.startup  0000000000000000 .text.startup
0000000000000000 l       .text.startup  0000000000000000 .LA0
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000000 g     F .text.startup  000000000000000e main
0000000000000010       O *COM*  0000000000000008 global_symbol

Disassembly of section .text.startup:

0000000000000000 main:
   0:   00000797                auipc   a5,0x0
                        0: R_RISCV_PCREL_HI20   global_symbol
                        0: R_RISCV_RELAX        *ABS*
   4:   0007b503                ld      a0,0(a5) # 0 main
                        4: R_RISCV_PCREL_LO12_I .LA0
                        4: R_RISCV_RELAX        *ABS*
   8:   00a03533                snez    a0,a0
   c:   8082                    ret

$ riscv64-unknown-linux-gnu-objdump -d -r cmodel.o
Disassembly of section .text:

0000000000010330 main:
   10330:       00002797                auipc   a5,0x2
   10334:       d087b503                ld      a0,-760(a5) # 12038 global_symbol
   10338:       00a03533                snez    a0,a0
   1033c:       8082                    ret
        ...

Note that -mcmodel=medany currently defaults to -mno-explicit-relocs, which can have an appreciable performance effect. There's a bit of nuance in that performance effect, so we'll discuss it in a later blog.

The Difference Between a Code Model and an ABI

One commonly misunderstood distinction is the difference between a code model and an ABI. The ABI determines the interface between functions, while the code model determines how code is generated within a function. Specifically: both RISC-V code models limit the code that addresses symbols to 32-bit offsets, but on RV64I-based systems they still encode pointers as 64-bit.

Specifically this means that functions compiled with -mcmodel=medany can be called by functions compiled with -mcmodel=medlow, and vice versa. The restrictions placed on symbol addressing by both of these functions will need to be met in order to allow an executable to be linked, but that constraint is just generally true. As the code model doesn't affect the layout of structures in memory or how arguments are passed between functions it's largely transparent to programs.

Contrast this to linking code generated for two different ABIs, which is invalid. Imagine a function that contains a double argument. A function compiled for lp64d will expect this argument in a register. When called by a function compiled for lp64 that places the argument in an X register the program won't work correctly.

Code Models and Linker Relaxation

Up until this point we haven't discussed how code models interact with linker relaxation, largely because the answer is now fairly simple: it all just works, assuming you use the RISC-V branches of the various toolchain components as there are a handful of patches that haven't found their way upstream yet.

Linker relaxation is actually an important enough optimization that it affected the RISC-V ISA significantly: linker relaxation allows RISC-V to forgo an addressing mode that would otherwise be required to get reasonable performance on many codebases. On RISC-V targets, the following addressing modes are available:

Symbols within a 7-bit offset from 0 (or from __global_pointer$): 2 bytes.
Symbols within a 12-bit offset from 0 (or from __global_pointer$): 4 bytes.
Symbols within a 17-bit offset from 0: 6 bytes.
Symbols within a 32-bit offset from 0: 8 bytes. On RV32I this is the entire address space.

This can all be achieved with a single code model, and while using a single hardware addressing mode (register+offset) via eight instruction formats (U, I, S, CI, CR, CIW, CL, and CS) without any mode bits. You could view this as a sort of variable-length address encoding that's optional to support in hardware -- for more information see the "Compressed Macro-Op Fusion" paper. As this compressing is all implemented transparently by the linker, we only need a single code model. Contrast this behavior with the ARMv8 GCC port, which requires selecting a different code model for each of the address generation sequences it can emit.