SiFive - September 18, 2017
All Aboard, Part 5: Per-march and per-mabi Library Paths on RISC-V Systems
A previous blog described how the -march and -mabi command-line arguments to GCC can be used to control code generation for the sources you compile as a user, but most programs require linking against system libraries in order to function correctly. Since users generally don't want to compile every library along with their program, either because they're too complicated or because they're meant to be shared, a mechanism is needed for linking against the correct set of system libraries to match the ISA of the user's target system and the ABI of the user's generated code.
The mechanism for handling multiple sets of system libraries is known as "multilib". Like most parts of the RISC-V toolchain, the multilib mechanism is shared between all architecture ports, but the details of how it applies are specific to our ISA. As RISC-V is a modular ISA, it was natural to have extensive multilib support from the start. This allows our multilib implementation to be significantly cleaner than that of many other architectures, which is good because the plethora of ISAs and ABIs we have necessitates good multilib support.
The GCC Compiler Wrapper
As discussed in an earlier blog post, the gcc command that users directly interact with is actually just a wrapper that calls each step in the toolchain in order: preprocess, compile, assemble and link your program. gcc isn't actually a script, but instead a small C program that orchestrates the compilation. The architecture-specific hooks for this program are written in a domain-specific language that is specific to GCC's command-line argument handling and describes how the arguments to the gcc wrapper should be transformed as they are passed to the various other tools that are called.
In order to ensure things are sufficiently complicated, there are three different languages used to describe how paths are mangled between the user's invocation of gcc and the invocation of cc1 or collect2 that actually does the work. All of these are specific to GCC command-line argument parsing. The gcc command-line wrapper uses these tools in various combinations to specify the following multilib-related arguments:
- The assembler needs to know the ELF class to generate, either ELF32 or ELF64 depending on the target processor's architecture.
- The linker needs to know the link-time paths that should be searched for libraries.
- The assembler needs to know the ABI, so it can fill out the relevant ELF flags. This lets the linker refuse to link objects of different ABIs, which would be incompatible.
- The linker needs to know the path to the dynamic linker, so it can fill out the ELF interpreter field. The dynamic linker has paths built into it to know where to search for libraries.
- The linker needs to know the C runtime files that should be linked into executables, as well as any additional libraries that should be linked in by default such as libatomic or libgloss.
All these tools are somewhat coupled together, so we'll go over each below and describe which of the above arguments each tool helps specify.
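To make the ELF class and ABI flags mentioned above a bit more concrete, here is a minimal Python sketch that reads both out of an ELF header. The flag values come from the RISC-V ELF psABI; the function names are made up for illustration, and the real linker checks more than this (the RVE bit, for example), so treat it as a rough model rather than a description of what ld actually does.

```python
import struct

# Floating-point ABI field of e_flags, as defined by the RISC-V ELF psABI.
EF_RISCV_FLOAT_ABI_MASK = 0x6
FLOAT_ABI_NAMES = {0x0: "soft", 0x2: "single", 0x4: "double", 0x6: "quad"}


def describe_elf_header(header):
    """Return (XLEN, float ABI name) for the start of a little-endian RISC-V ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = header[4]  # EI_CLASS: 1 = ELF32, 2 = ELF64
    if elf_class == 1:
        xlen, flags_offset = 32, 36  # e_flags offset in an ELF32 header
    elif elf_class == 2:
        xlen, flags_offset = 64, 48  # e_flags offset in an ELF64 header
    else:
        raise ValueError("unknown ELF class")
    (e_flags,) = struct.unpack_from("<I", header, flags_offset)
    return xlen, FLOAT_ABI_NAMES[e_flags & EF_RISCV_FLOAT_ABI_MASK]


def can_link(header_a, header_b):
    """Hypothetical helper: objects are compatible only if class and float ABI agree."""
    return describe_elf_header(header_a) == describe_elf_header(header_b)
```

Passing the first 64 bytes of two object files to can_link gives a rough idea of whether they were built for the same ELF class and floating-point ABI.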
*_SPEC Domain-Specific Language
The lowest-level, and therefore most general, of the three languages used in describing GCC command-line argument handling is the language used in the various *_SPEC macros that targets can define. These macros describe the transformations used to convert the command-line arguments for every tool GCC calls. They're not specific to multilib path handling, but they're used to produce the full set of arguments to the linker, so I felt they were at least worth mentioning. One macro is defined for each of the target programs: for example ASM_SPEC defines how to transform command-line arguments for the assembler, LINK_SPEC for the linker, etc.
The *_SPEC macros control a string-to-string transformation that converts the command-line arguments of the gcc command to those passed to another command. While I recall at some point having seen some documentation on what can go in these macros, the best I can find right now lives in the Controlling the Compilation Driver section of the GCC documentation. Since that doesn't really specify how any of that works, I'll try to describe the bits we actually use here -- most of our port came from reading the code in other ports and from trial and error.
As an example of how one of these *_SPEC lines behaves, let's look at RISC-V's STARTFILE_PREFIX_SPEC macro, which determines where the linker should look for C runtime startup files like crt0.o:
#define XLEN_SPEC \
  "%{march=rv32*:32}" \
  "%{march=rv64*:64}"

#define ABI_SPEC \
  "%{mabi=ilp32:ilp32}" \
  "%{mabi=ilp32f:ilp32f}" \
  "%{mabi=ilp32d:ilp32d}" \
  "%{mabi=lp64:lp64}" \
  "%{mabi=lp64f:lp64f}" \
  "%{mabi=lp64d:lp64d}"

#define STARTFILE_PREFIX_SPEC \
  "/lib" XLEN_SPEC "/" ABI_SPEC "/ " \
  "/usr/lib" XLEN_SPEC "/" ABI_SPEC "/ " \
  "/lib/ " \
  "/usr/lib/ "
This is a pretty standard *_SPEC definition for RISC-V: it consumes the entire set of gcc command-line arguments as a space-separated list, filters that through some pattern matching, performs a substitution and then passes the result as the space-separated argument list to some other command. We only use a handful of patterns:
- "STRING": pass "STRING" directly into the output. Anything not wrapped in %{ and } is passed directly to the output.
- %{argument}: if -argument is in the input as a whole word, then pass -argument to the output.
- %{argument:substitution}: if -argument is in the input as a whole word, then pass the substitution into the output. These recurse, so something like %{arg1:%{arg2:-arg3}} passes -arg3 if both -arg1 and -arg2 are present.
- %{glob:substitution}: if an argument matches -glob, pass substitution to the output. Like above, substitution can be recursive. The best reference I could find for the glob syntax is that it looks like very simple shell globbing. For example, %{march=rv32*:32} will pass 32 if passed any of -march=rv32i, -march=rv32imafdc, or -march=rv32INVALID_ISA_STRING (though of course GCC will catch the last one as part of command-line argument parsing).
- %{!glob:substitution}: like the above, but passes substitution if -glob isn't present.
That's about the extent of what we put in the *_SPEC macros used by the RISC-V port: not all that interesting, just a bit of text pattern matching.
For example, here is how STARTFILE_PREFIX_SPEC expands for a few sets of arguments:
- -march=rv32imafdc -mabi=ilp32d: /lib32/ilp32d/ /usr/lib32/ilp32d/ /lib /usr/lib
- -march=rv32imafdc -mabi=ilp32: /lib32/ilp32/ /usr/lib32/ilp32/ /lib /usr/lib
- -march=rv32i -mabi=ilp32: /lib32/ilp32/ /usr/lib32/ilp32/ /lib /usr/lib
- -march=rv64i -mabi=ilp32: /lib64/ilp32/ /usr/lib64/ilp32/ /lib /usr/lib
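To show that the spec language really is just text pattern matching, here is a small Python sketch that expands the STARTFILE_PREFIX_SPEC definition shown earlier. It is not GCC's implementation: expand_spec is a made-up helper that only understands non-nested %{glob:substitution} and %{!glob:substitution} groups, and it leans on Python's fnmatch as a stand-in for GCC's glob matching.

```python
import fnmatch
import re

# Matches one non-nested %{pattern:substitution} group.
SPEC_GROUP = re.compile(r"%\{([^:{}]+):([^{}]*)\}")

ABIS = ("ilp32", "ilp32f", "ilp32d", "lp64", "lp64f", "lp64d")
XLEN_SPEC = "%{march=rv32*:32}%{march=rv64*:64}"
ABI_SPEC = "".join("%{mabi=" + abi + ":" + abi + "}" for abi in ABIS)
STARTFILE_PREFIX_SPEC = (
    "/lib" + XLEN_SPEC + "/" + ABI_SPEC + "/ "
    "/usr/lib" + XLEN_SPEC + "/" + ABI_SPEC + "/ "
    "/lib/ "
    "/usr/lib/ "
)


def expand_spec(spec, args):
    """Expand non-nested %{glob:substitution} groups against gcc-style arguments."""
    # Specs name options without their leading dash.
    options = [arg[1:] for arg in args if arg.startswith("-")]

    def replace(match):
        pattern, substitution = match.group(1), match.group(2)
        negate = pattern.startswith("!")
        if negate:
            pattern = pattern[1:]
        present = any(fnmatch.fnmatchcase(opt, pattern) for opt in options)
        # Emit the substitution when presence matches what the group asks for.
        return substitution if present != negate else ""

    return SPEC_GROUP.sub(replace, spec)


if __name__ == "__main__":
    print(expand_spec(STARTFILE_PREFIX_SPEC, ["-march=rv64i", "-mabi=ilp32"]))
```

Note that the mismatched rv64i/ilp32 pair expands without complaint: nothing at this level checks that the ISA and ABI make sense together.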
Target Fragments
Since the multilib path descriptions for many targets are too complicated to be described using the spec DSL, GCC contains a second DSL that's used exclusively to specify the library paths in multilib systems. There's a bit more documentation on what this should do in GCC's target fragment section. To sum things up, there are four variables set in this file by the RISC-V port:
- MULTILIB_OPTIONS: Contains the set of command-line arguments that should be considered when expanding multilib paths. Options that are mutually exclusive are separated by slashes, and groups of those that are unrelated are separated by spaces.
- MULTILIB_DIRNAMES: A space-separated list of the path components that correspond to each of the above arguments. Multilib paths will be constructed by joining the paths that correspond to the passed arguments with slashes.
- MULTILIB_REUSE: When two multilib-related arguments are similar enough that we should use the same library paths when linking in both modes, the mappings go in here.
- MULTILIB_REQUIRED: Without this argument, GCC will build libraries that cover the Cartesian product of what's in MULTILIB_OPTIONS. On systems where that's too many libraries, this variable controls the subset that's actually built.
On RISC-V we have way too many ISA/ABI combinations to build every combination and ship it as a library, so we heavily restrict the set that is actually built via the MULTILIB_REQUIRED variable -- without this we'd end up with hundreds of libraries built, the vast majority of which would never be used because they represent systems that don't make a whole lot of sense -- for example, who would build a system with double-precision floating point but no integer multiplier?
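To get a feel for how quickly the Cartesian product grows and how MULTILIB_REQUIRED trims it, here is a rough Python sketch. It assumes the behavior described in GCC's target fragment documentation, where each space-separated group contributes either nothing or exactly one of its slash-separated alternatives. The function names are invented for illustration; this is not what genmultilib actually runs.

```python
from itertools import product


def multilib_combinations(options):
    """Enumerate every option combination described by a MULTILIB_OPTIONS string."""
    # Each space-separated group contributes either nothing ("") or exactly
    # one of its slash-separated alternatives.
    groups = [[""] + group.split("/") for group in options.split()]
    combos = []
    for choice in product(*groups):
        # The all-empty choice yields "", standing for the default multilib.
        combos.append("/".join(opt for opt in choice if opt))
    return combos


def required_only(options, required):
    """Keep only the combinations named in a MULTILIB_REQUIRED string."""
    wanted = set(required.split())
    return [combo for combo in multilib_combinations(options) if combo in wanted]


if __name__ == "__main__":
    opts = "march=rv32i/march=rv32im/march=rv64imac mabi=ilp32/mabi=lp64"
    print(len(multilib_combinations(opts)), "combinations before filtering")
    print(required_only(opts, "march=rv32i/mabi=ilp32 march=rv64imac/mabi=lp64"))
```

Every extra -march or -mabi value multiplies the number of candidate libraries, which is why the filtered list is the only practical thing to build.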
These variables are then provided as arguments to the gcc/genmultilib script, which produces both the tables to decode these arguments that the gcc wrapper uses and the input to various build scripts that instruct GCC to build many copies of each library it installs (for example, libgcc.so).
RISC-V's multilib-generator Script
RISC-V was designed to be a modular ISA. As a result we already have over a hundred ISA and ABI combinations supported by the toolchain, and that number will only ever increase. While we aim to support all these combinations in the toolchain, it would be unreasonable to expect users to build all of these libraries (or even to download all of them as part of a distribution).
To fit this all into GCC's target fragment framework we set MULTILIB_OPTIONS to contain many targets and then set MULTILIB_REQUIRED to the set we actually want to build. We then slightly increase the set of supported ISA/ABI pairs by adding some relevant entries to MULTILIB_REUSE. Since typing all these in by hand is a pain, we instead use a script to generate our target fragment (which in turn is the input to the genmultilib script, which then generates the input to the gcc compiler wrapper, which then generates command-line arguments to collect2 to actually do the linking).
The script is called multilib-generator and is written in Python. It takes a list of arguments on the command line, each made up of dash-separated parts, and produces a target fragment that implements the multilib configuration that those arguments describe. The script isn't really meant to be used by end users so it's not well documented, but if you're trying to produce a toolchain with a different set of multilibs than the default set in GCC then you'll have to deal with it.
Each argument is made up of four dash-separated parts. The first two parts control the multilibs that will actually be built. For example:
# This file was generated by multilib-generator with the command:
# ./multilib-generator ARCH0-ABI0-- ARCH1-ABI1--
MULTILIB_OPTIONS = march=ARCH0/march=ARCH1 mabi=ABI0/mabi=ABI1
MULTILIB_DIRNAMES = ARCH0 \
ARCH1 ABI0 \
ABI1
MULTILIB_REQUIRED = march=ARCH0/mabi=ABI0 \
march=ARCH1/mabi=ABI1
MULTILIB_REUSE =
will generate two multilibs: "-march=ARCH0 -mabi=ABI0" and "-march=ARCH1 -mabi=ABI1". Any other march/mabi pair will result in GCC using the default multilib (the one just installed in "lib"), which will probably cause an error when linking. This "fallback to the default" behavior is something baked into GCC, and while it can be a bit problematic, we don't have the time to fix it right now. If you want to build an extra multilib, you should add an additional argument to multilib-generator that specifies the ISA/ABI pair for that multilib.
The next two parts control MULTILIB_REUSE, which specifies how GCC searches for multilibs that don't exactly match those built by MULTILIB_REQUIRED. Both specify an additional set of comma-separated '-march' arguments that map to the multilib specified by the first two parts.
The third part is the simpler of the two: it's a comma-separated list of additional ISA values that should be mapped to the multilib specified by the first two parts. For example:
# This file was generated by multilib-generator with the command:
# ./multilib-generator ARCH0-ABI0-ARCHa,ARCHb-
MULTILIB_OPTIONS = march=ARCH0/march=ARCHa/march=ARCHb mabi=ABI0
MULTILIB_DIRNAMES = ARCH0 \
ARCHa \
ARCHb ABI0
MULTILIB_REQUIRED = march=ARCH0/mabi=ABI0
MULTILIB_REUSE = march.ARCH0/mabi.ABI0=march.ARCHa/mabi.ABI0 \
march.ARCH0/mabi.ABI0=march.ARCHb/mabi.ABI0
adds two additional ISAs that map to the generated multilib: "-march=ARCH0 -mabi=ABI0" will be used when passed any of "-march=ARCH0 -mabi=ABI0", "-march=ARCHa -mabi=ABI0", or "-march=ARCHb -mabi=ABI0". You can specify these when there is more than one generated multilib; the additional ISAs apply to the multilib that's in the same argument.
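The lookup this implies can be sketched in a few lines of Python. This is only a model of the behavior described here, not GCC's actual matching code: resolve_multilib is a hypothetical helper, and it assumes that each multilib directory is simply named after its -march and -mabi values.

```python
def resolve_multilib(march, mabi, required, reuse):
    """Return the multilib directory used for -march/-mabi, or "." for the default."""
    key = "march=%s/mabi=%s" % (march, mabi)
    if key in required:
        # Exact match: this pair was built as its own multilib.
        return "%s/%s" % (march, mabi)
    # MULTILIB_REUSE entries look like "march.A/mabi.X=march.B/mabi.X": the
    # built multilib on the left also serves the option set on the right.
    dotted = key.replace("=", ".")
    for entry in reuse:
        built, alias = entry.split("=")
        if alias == dotted:
            arch_part, abi_part = built.split("/")
            return "%s/%s" % (arch_part[len("march."):], abi_part[len("mabi."):])
    # Nothing matched, so fall back to the default multilib.
    return "."


if __name__ == "__main__":
    required = ["march=ARCH0/mabi=ABI0"]
    reuse = [
        "march.ARCH0/mabi.ABI0=march.ARCHa/mabi.ABI0",
        "march.ARCH0/mabi.ABI0=march.ARCHb/mabi.ABI0",
    ]
    for arch in ("ARCH0", "ARCHa", "ARCHb", "ARCHz"):
        print(arch, "=>", resolve_multilib(arch, "ABI0", required, reuse))
```

The last case, ARCHz, is the "fallback to the default" behavior: it resolves to the default directory rather than producing an error.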
The fourth part is very similar to the third, but rather than specifying the whole ISA that should be mapped to the specified multilib, it just specifies an additional suffix that should be mapped. For example:
# This file was generated by multilib-generator with the command:
# ./multilib-generator ARCH0-ABI0--c,d
MULTILIB_OPTIONS = march=ARCH0/march=ARCH0c/march=ARCH0d mabi=ABI0
MULTILIB_DIRNAMES = ARCH0 \
ARCH0c \
ARCH0d ABI0
MULTILIB_REQUIRED = march=ARCH0/mabi=ABI0
MULTILIB_REUSE = march.ARCH0/mabi.ABI0=march.ARCH0c/mabi.ABI0 \
march.ARCH0/mabi.ABI0=march.ARCH0d/mabi.ABI0
adds two additional ISAs that map to the generated multilib: "-march=ARCH0 -mabi=ABI0" will be used when passed any of "-march=ARCH0 -mabi=ABI0", "-march=ARCH0c -mabi=ABI0", or "-march=ARCH0d -mabi=ABI0" -- as you can see, largely the same as above.
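Putting the four parts together, the following Python sketch builds the same variables as the fragments shown above. It is a simplified re-implementation written for this explanation, not the real multilib-generator: generate_fragment is a made-up name, the output is a dictionary rather than a formatted file, and suffixes from the fourth part are only appended to the ARCH given in the first part.

```python
def generate_fragment(configs):
    """Build target-fragment variables from ARCH-ABI-EXTRA-EXT argument strings."""
    arches, abis, required, reuse = [], [], [], []
    for config in configs:
        arch, abi, extra, ext = config.split("-")
        # Third part: whole ISA strings. Fourth part: suffixes added to ARCH.
        aliases = [name for name in extra.split(",") if name]
        aliases += [arch + suffix for suffix in ext.split(",") if suffix]
        for name in [arch] + aliases:
            if name not in arches:
                arches.append(name)
        if abi not in abis:
            abis.append(abi)
        required.append("march=%s/mabi=%s" % (arch, abi))
        for alias in aliases:
            reuse.append(
                "march.%s/mabi.%s=march.%s/mabi.%s" % (arch, abi, alias, abi)
            )
    return {
        "MULTILIB_OPTIONS": "/".join("march=" + a for a in arches)
        + " "
        + "/".join("mabi=" + a for a in abis),
        "MULTILIB_DIRNAMES": " ".join(arches + abis),
        "MULTILIB_REQUIRED": " ".join(required),
        "MULTILIB_REUSE": " ".join(reuse),
    }


if __name__ == "__main__":
    for name, value in generate_fragment(["ARCH0-ABI0-ARCHa-c,d"]).items():
        print(name, "=", value)
```

The example at the bottom combines the third and fourth parts in a single argument, which yields one built multilib and three reuse entries.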
Other Multilib-Aware Components
While GCC handles the vast majority of the multilib support, there are a handful of other components of the system that contribute in other ways to our multilib support:
- ld, the linker, refuses to link objects with incompatible ABIs. While this doesn't directly support multilib, it does prevent it from getting screwed up silently.
- ld.so, the dynamic loader, has some multilib paths baked into it so it can search for libraries correctly. We compile one dynamic loader for each multilib and then use GCC to fill out the corresponding ELF interpreter field, so there's not much going on in glibc here.
The Short Way
You might be thinking "that's super complicated, all I really want to do here is just know which library paths are used by my compiler". While you could derive this from looking at the GCC source code, it's simpler to just determine the multilib set experimentally using something like the following script:
#!/bin/bash
for abi in ilp32 ilp32f ilp32d lp64 lp64f lp64d; do
  for isa in rv32e rv32i rv64i; do
    for m in "" m; do
      for a in "" a; do
        for f in "" f fd; do
          for c in "" c; do
            readlink -f $(riscv64-unknown-elf-gcc -march=$isa$m$a$f$c -mabi=$abi -print-search-dirs | grep ^libraries | sed 's/:/ /g') | grep 'riscv64-unknown-elf/lib' | grep -ve 'lib$' | sed 's@^.*/lib/@@' | while read path; do
              echo "riscv64-unknown-elf-gcc -march=$isa$m$a$f$c -mabi=$abi => $path"
            done
          done
        done
      done
    done
  done
done
which produces the entire set of multilibs we support, along with their corresponding arguments:
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 => rv32i/ilp32
riscv64-unknown-elf-gcc -march=rv32ic -mabi=ilp32 => rv32i/ilp32
riscv64-unknown-elf-gcc -march=rv32iac -mabi=ilp32 => rv32iac/ilp32
riscv64-unknown-elf-gcc -march=rv32im -mabi=ilp32 => rv32im/ilp32
riscv64-unknown-elf-gcc -march=rv32imc -mabi=ilp32 => rv32im/ilp32
riscv64-unknown-elf-gcc -march=rv32imac -mabi=ilp32 => rv32imac/ilp32
riscv64-unknown-elf-gcc -march=rv32imafc -mabi=ilp32f => rv32imafc/ilp32f
riscv64-unknown-elf-gcc -march=rv32imafdc -mabi=ilp32f => rv32imafc/ilp32f
riscv64-unknown-elf-gcc -march=rv64imac -mabi=lp64 => rv64imac/lp64
riscv64-unknown-elf-gcc -march=rv64imafdc -mabi=lp64d => rv64imafdc/lp64d
or for the Linux toolchain:
riscv64-unknown-linux-gnu-gcc -march=rv32ima -mabi=ilp32 => lib32/ilp32
riscv64-unknown-linux-gnu-gcc -march=rv32imac -mabi=ilp32 => lib32/ilp32
riscv64-unknown-linux-gnu-gcc -march=rv32imaf -mabi=ilp32 => lib32/ilp32
riscv64-unknown-linux-gnu-gcc -march=rv32imafc -mabi=ilp32 => lib32/ilp32
riscv64-unknown-linux-gnu-gcc -march=rv32imafd -mabi=ilp32 => lib32/ilp32
riscv64-unknown-linux-gnu-gcc -march=rv32imafdc -mabi=ilp32 => lib32/ilp32
riscv64-unknown-linux-gnu-gcc -march=rv32imafd -mabi=ilp32d => lib32/ilp32d
riscv64-unknown-linux-gnu-gcc -march=rv32imafdc -mabi=ilp32d => lib32/ilp32d
riscv64-unknown-linux-gnu-gcc -march=rv64ima -mabi=lp64 => lib64/lp64
riscv64-unknown-linux-gnu-gcc -march=rv64imac -mabi=lp64 => lib64/lp64
riscv64-unknown-linux-gnu-gcc -march=rv64imaf -mabi=lp64 => lib64/lp64
riscv64-unknown-linux-gnu-gcc -march=rv64imafc -mabi=lp64 => lib64/lp64
riscv64-unknown-linux-gnu-gcc -march=rv64imafd -mabi=lp64 => lib64/lp64
riscv64-unknown-linux-gnu-gcc -march=rv64imafdc -mabi=lp64 => lib64/lp64
riscv64-unknown-linux-gnu-gcc -march=rv64imafd -mabi=lp64d => lib64/lp64d
riscv64-unknown-linux-gnu-gcc -march=rv64imafdc -mabi=lp64d => lib64/lp64d
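If you'd rather not lean on a shell pipeline, the same query can be sketched in Python. The parsing below relies on the "libraries: =path:path" line that gcc -print-search-dirs prints; the function names and the sample path in the comment are invented for illustration, and query_compiler obviously needs the toolchain to be installed.

```python
import posixpath
import subprocess


def library_dirs(search_dirs_output):
    """Extract the library search paths from `gcc -print-search-dirs` output."""
    for line in search_dirs_output.splitlines():
        if line.startswith("libraries:"):
            # GCC prefixes the colon-separated path list with "=".
            value = line[len("libraries:"):].strip().lstrip("=")
            # normpath collapses "../.." lexically, e.g. a hypothetical
            # "/opt/riscv/lib/gcc/x/../../lib/" becomes "/opt/riscv/lib/lib".
            return [posixpath.normpath(path) for path in value.split(":") if path]
    return []


def query_compiler(gcc, march, mabi):
    """Run the compiler and return its library search paths."""
    result = subprocess.run(
        [gcc, "-march=" + march, "-mabi=" + mabi, "-print-search-dirs"],
        capture_output=True,
        text=True,
        check=True,
    )
    return library_dirs(result.stdout)


if __name__ == "__main__":
    for path in query_compiler("riscv64-unknown-elf-gcc", "rv32imac", "ilp32"):
        print(path)
```

Unlike readlink -f, normpath does not resolve symlinks, so the two approaches can differ on installations that use them.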
The Rationale Behind Our Multilib Sets
While it may seem like the set of multilibs that are part of our default set is somewhat arbitrary, we actually put quite a lot of thought into each one. Most of the work here went into the embedded set, so let's just go through the list and describe why each one exists:
- rv32i/ilp32: The simplest RISC-V ISA. While we don't expect this to see much commercial use, we expect that it'll get a lot of educational and hobbyist use. Also, it seems a bit odd not to support the base ISA well -- as otherwise what's the point of one :).
- rv32iac/ilp32: Despite there being lots of tricks to produce small multipliers that are arbitrarily slow, some people seem to be allergic to hardware multiplication. This target is there to satisfy those people.
- rv32im/ilp32: This exists largely to support cores retrofitted from other ISAs where simple memory systems preclude the implementation of both the A and C extensions.
- rv32imac/ilp32: We expect this to get lots of use; it's probably what you'd want to build if you're building a standalone microcontroller chip.
- rv32imafc/ilp32f: A 32-bit, floating-point target. The other option here would have been rv32imafdc/ilp32d, but we chose this instead under the assumption that if you could deal with having a 64-bit FPU then you'd probably just want to build a 64-bit core.
- rv64imac/lp64: This will probably be the RISC-V ISA configuration that has the largest number of cores produced for the near future, as there aren't any good options for deeply embedded cores (think power management units, IP control cores, etc) that can talk to SoCs with address spaces larger than 32 bits.
- rv64imafdc/lp64d: The "full featured" embedded core. These probably won't be produced as embedded cores directly, but we think that people will repurpose Linux-class cores as embedded cores as Linux isn't that expensive on RISC-V.
We didn't want the list to become too large, so we decided to limit it to this set. We put less thought into the Linux configurations, as things tend to be a bit more normal in larger systems. Here we just decided to support four library configurations: the Cartesian product of 32/64 bit and soft/hard float.
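The four Linux configurations fall straight out of that product, as this tiny Python sketch shows. The directory names are taken from the Linux toolchain output shown earlier; the variable names are only for illustration.

```python
from itertools import product

# Library directory and base ABI name for each register width.
WIDTHS = {32: ("lib32", "ilp32"), 64: ("lib64", "lp64")}
# Hard float here means the double-precision ABI, hence the "d" suffix.
FLOAT_SUFFIX = {"soft": "", "hard": "d"}

linux_multilibs = [
    "%s/%s%s" % (WIDTHS[xlen][0], WIDTHS[xlen][1], FLOAT_SUFFIX[fp])
    for xlen, fp in product((32, 64), ("soft", "hard"))
]

if __name__ == "__main__":
    for path in linux_multilibs:
        print(path)
```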
Changing the Multilib Sets
While we tried to ensure that a reasonable set of libraries is built as part of the default toolchain build, you might want something slightly different. You have a few options here:
- Build a non-multilib toolchain so everything will have your ISA/ABI combination. This is the easiest option, but if you're shipping something you should at least run the GCC test suite against your combination of choice as targets outside the default multilib set get less testing.
- Petition the GCC developers to add a MULTILIB_REUSE entry that provides a library compiled with a slightly different set of flags for the ISA you're interested in. This is ideal if your desired ISA doesn't get used much in the C library: for example, a good candidate for addition might be to make -march=rv64imafdc -mabi=lp64 match with the rv64imac/lp64 libraries, as newlib doesn't do much floating-point stuff. This is low overhead, so we'll probably accept your suggestion.
- Petition the GCC developers to add a MULTILIB_REQUIRED entry that provides your desired ISA/ABI combination. This is higher overhead than adding to MULTILIB_REUSE, as it results in a higher support burden. If there's commonly used silicon available for an ISA then we'll strongly consider adding it to the default set, as the whole point of multilib is to avoid the need for multiple toolchains.
- Fork the toolchain and change the default multilib set. This isn't a desired option, and we request that, if you do, you pick a different tuple to indicate you have a non-standard build. For example, you might pick riscv64-my_company-elf instead of riscv64-unknown-elf to indicate that "My Company" is providing a non-standard toolchain. As the unknown field isn't really defined, no program should be looking at it so you should be safe. We'd really like to avoid toolchain forks if possible, so please at least contact us to talk first!
I think that's about all there is to the RISC-V multilib implementation, so hopefully there won't be any more coverage on it in this blog series. We'll try to get back to covering slightly more interesting topics next week :).