Moritz Systems have started a new contract with the FreeBSD Foundation
to continue our work on modernizing
the LLDB debugger’s support for FreeBSD. During the previous contract, we
introduced a FreeBSD Remote Process Plugin utilizing the more modern
client-server layout of LLDB.
We have managed to achieve feature parity with the original FreeBSD
plugin on the x86 architecture. However, as of today, other
architectures still use the original. During the next two months,
we are going to work on bringing the remaining previously supported
architectures to the new layout, with a special focus on providing
first-class support for the ARM64 (also known as AArch64) architecture. Afterwards, we are going to
continue improving FreeBSD support in LLDB.
The complete Project Schedule is divided into four milestones, each
taking approximately one month:
- M1 Switch all the non-x86 CPUs to the LLDB FreeBSD
- M2 Iterate over regression tests on ARM64 and fix known bugs,
marking the non-trivial ones for future work. Remove the old
- M3 Implement follow-fork and follow-vfork operations on par with
the GNU GDB support. Cover the functionality with LLDB regression
- M4 Implement SaveCore functionality for FreeBSD and enhance
the regression testing of core files in LLDB. Update the FreeBSD
Cross-compiling LLDB to other FreeBSD architectures
A short introduction to cross-compilation
Cross-compilation is a technique that permits using a compiler running
on one platform to create executables for another platform. It can
be used to build software for another CPU architecture (e.g. ARM64
executables on an x86 system), for another operating system
(e.g. FreeBSD packages from Linux), or both.
The most common use case for cross-compilation is to use a single
development environment to produce executables for multiple target
platforms. Most importantly for our case, it permits building
software much faster than when running a native compiler via an emulator
or on hardware that is much less performant than modern x86 PCs
(e.g. commonly available ARM boards).
An important limitation of cross-compilation is that the resulting
executables cannot be executed on the platform running the compiler.
This means that the tools needed at build time need to be built
separately for the build platform – depending on the build system, this
either needs to be done manually or is done automatically as part of
cross-compilation.
This also means that the build scripts cannot perform tests that
require running the test program.
There are two main prerequisites to cross-compilation:
- A cross-toolchain, i.e. the compiler and link editor capable
of producing executables for the target platform.
- A sysroot, i.e. the system libraries and dependencies compiled
for the target platform.
Preparing the cross-compiler and sysroot
Ordinarily, in order to obtain a cross-toolchain, you need to build
the compiler for a specific target. However, the Clang compiler
that is used by default on FreeBSD has integrated cross-compilation
support. Rather than rebuilding the whole compiler for each target,
it is sufficient to ensure that appropriate target support is enabled
at build time. Therefore, the standard Clang builds on FreeBSD are
sufficient to cross-build for ARM and ARM64.
Cross-compiling a FreeBSD sysroot is very similar to building it
natively. The only difference is the necessity of passing the
TARGET_ARCH variable specifying the target architecture.
For example, to build an arm64 sysroot, we run the following commands:
make -j$(sysctl -n hw.ncpu) buildworld TARGET_ARCH=aarch64
make -j$(sysctl -n hw.ncpu) installworld TARGET_ARCH=aarch64 \
make -j$(sysctl -n hw.ncpu) distribution TARGET_ARCH=aarch64 \
LLVM uses the CMake build system. CMake partially facilitates
cross-compilation itself, while the other part is handled
in LLVM-specific CMake files.
The first step towards cross-compiling CMake-based projects is to
create a toolchain file. This file is used to set some internal CMake
variables that cannot be directly overridden via the command-line.
One toolchain file can be shared between multiple projects, so it is
also a convenient place to set standard cross-related CMake options.
For our purpose, toolchain-arm64.cmake contained the following:
# Since we are using clang as the compiler and it is the default
# on FreeBSD, we do not need to override the compiler. However,
# we do need to pass a correct -target indicating the platform we're
# building for and the path to our sysroot.
"-target aarch64-unknown-freebsd13.0 --sysroot /sysroot/arm64")
"-target aarch64-unknown-freebsd13.0 --sysroot /sysroot/arm64")
# While this may seem redundant, setting it explicitly (even to the same
# value) actually causes CMake to consider itself to be cross-compiling.
# This is important since LLVM relies on CMAKE_CROSSCOMPILING being set.
# Force library search functions to use our sysroot. Make sure it never
# uses programs from the sysroot (since we can't execute them). Headers
# and libraries have to be taken from sysroot, on the other hand.
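As a rough sketch, a toolchain file following the comments above could be generated from the shell like this. The variables are standard CMake ones (CMAKE_C_FLAGS_INIT, CMAKE_CXX_FLAGS_INIT, CMAKE_SYSTEM_NAME, CMAKE_FIND_ROOT_PATH and its MODE variants), but which variable goes with which comment is an assumption here rather than a verbatim copy of the file that was used, and the output path is a hypothetical placeholder.

```shell
# Hypothetical location for the generated toolchain file.
toolchain_file="${TMPDIR:-/tmp}/toolchain-arm64.cmake"

# The quoted 'EOF' keeps the shell from expanding anything in the body.
cat > "${toolchain_file}" <<'EOF'
# Target triple and sysroot for both the C and the C++ compiler.
set(CMAKE_C_FLAGS_INIT
    "-target aarch64-unknown-freebsd13.0 --sysroot /sysroot/arm64")
set(CMAKE_CXX_FLAGS_INIT
    "-target aarch64-unknown-freebsd13.0 --sysroot /sysroot/arm64")

# Assumption: this is the setting that makes CMake treat the build
# as cross-compilation (CMAKE_CROSSCOMPILING).
set(CMAKE_SYSTEM_NAME FreeBSD)

# Assumption: search libraries and headers only in the sysroot,
# and never pick up programs from it.
set(CMAKE_FIND_ROOT_PATH /sysroot/arm64)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
EOF

echo "wrote ${toolchain_file}"
```

Generating the file from a script makes it easy to produce one variant per target by swapping the triple and the sysroot path.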
In addition to that, a few additional options needed to be passed
to CMake. The following snippet explains them, in the Bash array form:
# Path to the source directory
# Use the Ninja generator since it's faster and has cleaner output
# than Makefiles.
# -Os builds reduce space usage while maintaining good performance.
# LLVM uses complex C++ that normally has a tendency towards
# creating huge object files.
# Enable assertions to aid debugging.
# Build LLVM, Clang and LLDB.
# Set the toolchain to build for aarch64 by default.
# Use shared libs to speed up linking and avoid huge interim static
# Use our toolchain file.
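To illustrate how the options described in these comments might fit together, here is a minimal sketch of a configure command. The option names are real CMake/LLVM options, but their exact values, the choice of LLVM_DEFAULT_TARGET_TRIPLE for the "build for aarch64 by default" step, and both paths are assumptions made for illustration. Positional parameters stand in for the Bash array so that the sketch also runs under FreeBSD's /bin/sh.

```shell
# Hypothetical paths; replace with the real source tree and toolchain file.
llvm_src="/path/to/llvm-project/llvm"
toolchain_file="/path/to/toolchain-arm64.cmake"

# In order: source directory, Ninja generator, size-optimized build type,
# assertions, projects to build alongside LLVM, default target triple
# (assumed option), shared libraries, and the toolchain file.
set -- \
    "${llvm_src}" \
    -G Ninja \
    -DCMAKE_BUILD_TYPE=MinSizeRel \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    "-DLLVM_ENABLE_PROJECTS=clang;lldb" \
    -DLLVM_DEFAULT_TARGET_TRIPLE=aarch64-unknown-freebsd13.0 \
    -DBUILD_SHARED_LIBS=ON \
    "-DCMAKE_TOOLCHAIN_FILE=${toolchain_file}"

# Only print the command here; run cmake "$@" from an empty build
# directory to actually configure.
configure_cmd="cmake $*"
echo "${configure_cmd}"
```

Keeping the arguments in a list rather than one long string preserves the quoting of values such as the semicolon-separated project list.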
Once LLVM is configured this way, the regular ninja calls can be used
to build the project. The build system will automatically configure
a NATIVE subdirectory containing utilities that need to be executed
during the build, while the rest of the project will be built for ARM64.
Working on additional architectures
Architecture-specific code in LLDB
While a significant part of the ptrace(2) API used by debuggers,
and therefore of the debugger itself, is architecture-agnostic,
it is practically impossible to debug programs without specific support
for the processor in question. For example, the debugger benefits
from support for disassembling the code, generating function calls,
inspecting registers, etc.
A large part of the architecture support is generic and shared between
different operating systems. Moreover, a part of FreeBSD-specific
architecture support is shared between the legacy and new plugins.
Therefore, in order to extend the new plugin to support additional
architectures we mostly needed to introduce the ‘glue’ binding the new
plugin with the architecture support. However, we also had to implement
support for inspecting and modifying registers (in a more modern form
than used by the legacy plugin), choose opcodes for software
breakpoints and fix existing bugs in platform support.
In the following subsections, we will briefly discuss the architectures
we were working on, and the specifics that we needed to research
in order to proceed.
ptrace(2) register groups
The ptrace(2) API defines three pairs of requests
for getting and setting registers, effectively splitting the registers
visible to userland programs into three groups: General-Purpose
Registers (GPRs), Floating-Point Unit Registers (FPRs) and Debug
Registers.
General-Purpose Registers are the baseline set of the processor’s registers
exposed to userland programs. This group includes generic registers
that can be used to store arbitrary data by the program (usually
integers or memory addresses) and special CPU registers with predefined
meaning. Two common examples of special registers are the Program
Counter that is used to hold the memory address of the code being
executed currently, and the flag register that is used to expose part
of the CPU state and boolean results of executed instructions.
The non-special registers also often have functions predefined
by the platform’s ABI. For example, often one of the registers
is dedicated to be the Stack Pointer. While the program could
technically use it on some CPUs for another purpose, it must preserve the original value
Floating-Point Unit Registers are the registers used to store
and perform computation on floating-point numbers. This class
of registers is only present on processors implementing hardware
Finally, Debug Registers are special CPU registers used to aid
debuggers. Usually, they are used to enable hardware-assisted
breakpoints and watchpoints.
ARM (and AArch64)
ARM is a family of CPU architectures maintained by ARM Ltd., primarily used on embedded platforms.
The processors up to ARMv7 were pure 32-bit. ARMv8 featured an optional
(though present in the vast majority of ARMv8 processors) 64-bit
architecture, often called AArch64 or ARM64.
All 32-bit ARM processors feature 15 general purpose registers,
a program counter (that is the 16th register) and a flag register.
All of them are 32 bits wide.
The 32-bit ARM architecture did not originally feature hardware
floating-point support. There are two extensions remedying this:
VFP (Vector Floating Point) and Neon.
VFP is the earlier extension, existing in a number of versions. For our
purposes, it is sufficient to say that it includes either 16 (in the
-D16 versions of the VFP extension) or 32 (in the -D32
versions) FPU registers, 64-bit wide.
Neon (also called Advanced SIMD) is focused on media and signal
processing. It shares the registers with VFP while introducing
the possibility of using them as 128-bit wide registers (with half
The AArch64 architecture features 32 general purpose registers and
a program counter (that is the 33rd register), all of them 64-bit,
plus a 32-bit flag register. AArch64 also features 32 VFP/NEON-compatible
There is also the more recent SVE (Scalable Vector Extension), which
provides for variable vector widths from 128 bits to 2048 bits. It is
not supported by FreeBSD at the moment.
a number of ARM and ARM64 boards. The FreeBSD wiki provides convenient
instructions on running an AArch64 VM and AArch64 VM images,
as well as instructions on running ARM via QEMU. We were able to successfully
use an AArch64 VM, but we were not able to boot one for ARMv7.
The ptrace(2) API for 32-bit ARM uses the PT_GETREGS and
PT_SETREGS requests to work on general-purpose registers,
and the PT_GETVFPREGS and PT_SETVFPREGS machine-dependent requests
to work on floating-point (VFP/NEON) registers.
PT_GETDBREGS and PT_SETDBREGS are stubs.
One interesting property of ARM is that it defines two Instruction
Set Architectures. The original ARM ISA encodes instructions in 32-bit
words. Newer processors also support Thumb ISA that uses more compact
but less flexible 16-bit encoding. The processor needs to be explicitly
switched between these two encodings. In order to insert a software
breakpoint, the debugger needs to know explicitly whether the code
is encoded using ARM or Thumb ISA, and use an appropriate opcode.
The ptrace(2) API for AArch64 is covered by the standard
PT_GETREGS and PT_SETREGS for the general-purpose registers,
PT_GETFPREGS and PT_SETFPREGS for floating-point
(VFP/NEON) registers and
PT_GETDBREGS and PT_SETDBREGS for debug
registers (limited to hardware breakpoints at the moment).
MIPS is yet another architecture primarily
used for embedded products. MIPS I and II architectures were 32-bit,
MIPS III through V were 64-bit architectures (with 32-bit backwards
compatibility). Modern MIPS architectures are called MIPS32/MIPS64
since the specification permits both pure 32-bit and 64-bit processors.
One of the more curious features of the 64-bit MIPS architecture is
the popularity of the N32 ABI. This ABI combines 64-bit code with 32-bit
pointers. This makes it possible to reduce the program’s memory
footprint at the cost of limiting it to 4 GiB of memory. Given that
embedded platforms often have less memory than that, it can be quite
useful. For comparison, the similar x32 ABI for x86 is barely known.
MIPS features 32 general-purpose registers that are either 32-bit
or 64-bit depending on the architecture. Of these, only 31 are actually
generally usable, while register $0 has a constant value of zero.
Additionally, there are special HI/LO registers used to store
multiplication results, PC (Program Counter), Status Register,
Cause Register, Bad Virtual Address Register and more. MIPS also
features 32 FPU registers, each of them 64 bits wide.
Vector operations are provided by the MSA (MIPS SIMD Architecture)
extension that introduces 32 128-bit vector registers that are shared
with the FPU registers. FreeBSD’s
ptrace(2) API does not support
FreeBSD does not provide prebuilt MIPS images. However,
the QEMU recipes page
provides convenient instructions for building and booting a VM based
on the Malta Development Board.
The General Purpose Registers and FPU Registers are accessed
via the standard ptrace(2) requests. Hardware-assisted
breakpoints, watchpoints and even single-stepping are not available.
MIPS support in LLDB specifically requires software single-stepping
The PowerPC (in later versions called Power) architecture is found in a wide
range of products, from (older) gaming consoles and Macintosh
computers to servers used for HPC (High-Performance Computing).
It includes both 32-bit and 64-bit processors (PPC64).
PowerPC features 32 General Purpose Registers (32-bit for PPC, 64-bit
for PPC64) plus a few special purpose registers:
- the Link Register (LR) providing the branch target address
- the Condition Register (CR) that can store up to 8 results
of comparison/arithmetic operations
- the XER Register (XER) used to indicate overflows and carry conditions
in integer arithmetic
- the Count Register (CTR) that is used as a loop counter
- the Program Counter Register (PC)
PowerPC also features 32 64-bit Floating Point Registers along with
a Floating-point Status and Control Register (FPSCR). Additionally,
the AltiVec extension provides 32 128-bit vector registers, and later
VSX extensions increase their number to 64.
FreeBSD provides install images for PPC and PPC64 hardware. However,
FreeBSD does not work on
and we have not managed to run it on qemu-system-ppc64 either, although
apparently it is supposed to work.
Access to General Purpose Registers and FPU Registers is provided
via the standard ptrace(2) requests. The AltiVec Registers
can be accessed via the machine-specific PT_GETVRREGS and
PT_SETVRREGS requests. The additional VSX Registers are exposed
The following table summarizes General-Purpose and Floating-Point Unit
Registers on the discussed architectures, and compares them to x86.
[Table: Comparison of GPRs (minus special CPU registers) and FPRs on the discussed architectures]
Of the discussed architectures, the vast majority feature 32 General
Purpose Registers (32-bit or 64-bit). The only exceptions to that are
32-bit ARM processors that feature 15 GPRs (the 16th is used as Program
Counter) and… x86 variants, with 8 GPRs in 32-bit processors
and 16 GPRs in amd64.
All architectures except for old ARM processors provide a hardware
Floating Point Unit. Modern x86 systems provide two sets of FPRs:
8 80-bit x87 registers, and 8 to 32 64-bit registers provided by the
SSE/AVX/AVX-512 extensions. On i386, only the first 8 registers are
visible. All other architectures provide 32 64-bit FPRs, except for some
ARM processors with the VFP-D16 variant providing only 16 registers.
All the architectures also feature an extension for vector operations.
The baseline is the 128-bit registers provided by the SSE
extension on x86 (8 on i386, 16 on amd64; they can alternatively be used for
floating-point operations), NEON on ARM (16 128-bit registers
shared with the FPRs), MSA on MIPS (32 128-bit registers
shared with the FPRs) or AltiVec on PPC (32 128-bit registers).
The AVX extension for x86 extends them to 16 256-bit registers,
and AVX-512 extends to 32 512-bit registers (only the first 8 available
in 32-bit mode). The VSX extension on PPC increases the register number
to 64 without extending the size. The SVE extension on AArch64 extends
to 32 registers of implementation-defined size up to 2048 bits.
Changes merged upstream
The goal of the first milestone was to reach feature parity for non-x86
architecture support in the FreeBSD LLDB plugin, that is, to implement
support for ARM, ARM64, MIPS64 and PowerPC targets with
capabilities matching the legacy plugin. Once all the patches
are reviewed and merged, we’re going to remove the legacy plugin along
with obsolete code from LLDB.
Once the legacy plugin is gone, the way will be open for additional
enhancements of the platform support. The potential enhancements include:
support for hardware breakpoints and watchpoints on ARM platforms
(pending kernel support on 32-bit ARM, and
support for Floating-Point Unit Registers on MIPS (blocked
by the necessity of breaking changes to the code shared between both
support for Vector Registers on PowerPC
We are also starting to work on the second milestone, that is, running
the LLDB test suite on ARM64 and fixing test failures. The goal is to
provide first-class support for the ARM64 platform.