Triton v0.8 and ARMv7: A Guideline for Adding New Architectures
As you may have read in our previous blog post,
the release of Triton v0.8 came with a lot of features and improvements.
Support for the ARMv7 architecture is amongst the main contributions of this
new version.
This blog post provides some extra details about how we achieved it.
Furthermore, we would like to describe the process and general guidelines for
adding new architectures to Triton. Contrary to what one might think, the process
is pretty straightforward in terms of integration (the core does not need many
modifications). However, it requires some development effort, which
ultimately depends on the complexity and quirks of the target architecture.
A quick introduction to the ARMv7 architecture
Let’s start with a very brief overview of the architecture. ARMv7 is a RISC
processor, with a Load/Store memory model (which means memory access is
restricted to specific instructions). It has thirteen general-purpose 32-bit
registers (R0 to R12) and three 32-bit registers which have special
uses: SP (Stack Pointer), LR (Link Register), and PC (Program
Counter) (they can also be referred to as R13, R14, and R15,
respectively). Besides, there is a 32-bit Application Program Status Register
(APSR), which holds the flags (N, Z, C and V).
One peculiar aspect of the architecture is that it has two main execution
modes: ARM and Thumb (instructions are encoded for one or the other).
Transitions between these two modes can occur anytime during execution (only
through specific instructions, though). Instructions encoded for ARM mode are
fixed in size, 4 bytes; whereas those encoded for Thumb can be 2 or 4 bytes
long. Another interesting feature is that most instructions are conditional,
that is, they execute (or not) based on the current values of the flags.
Lastly, the memory also offers flexibility as data accesses can be either
little-endian or big-endian (just data, instructions are always
little-endian).
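As a small illustration of the fixed-size, little-endian ARM encoding and of conditional execution, the following Python sketch (not Triton code) decodes a 4-byte ARM-mode instruction word and extracts its condition field (bits 31 to 28) and its S bit (bit 20, which tells data-processing instructions to update the flags). The helper names are made up for this example; the bytes are those of the asrs r0, r1, #2 instruction discussed later in this post.

```python
import struct

# Names of the 16 values of the 4-bit condition field (bits 31..28).
# 0b1111 is reserved for unconditional encodings in ARMv7.
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "(unconditional)"]


def decode_arm_word(raw: bytes) -> int:
    """Interpret 4 bytes as a little-endian ARM-mode instruction word."""
    if len(raw) != 4:
        raise ValueError("ARM-mode instructions are exactly 4 bytes long")
    return struct.unpack("<I", raw)[0]


def condition_of(word: int) -> str:
    """Return the mnemonic of the condition field of an ARM-mode word."""
    return COND_NAMES[(word >> 28) & 0xF]


def updates_flags(word: int) -> bool:
    """S bit (bit 20) of a data-processing instruction."""
    return bool((word >> 20) & 1)


word = decode_arm_word(bytes.fromhex("4101b0e1"))
print(hex(word), condition_of(word), updates_flags(word))
```

Thumb instructions cannot be decoded this way, since they are 2 or 4 bytes long and carry no condition field of their own.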
The ubiquity of ARM processors is one of the main reasons for adding support
for ARMv7 in Triton. ARMv7 is a widely popular architecture, particularly in
embedded devices and mobile phones. We wanted to bring the advantages of
Triton to this architecture (most tools are prepared to work on Intel
x86/x86_64 only). The other reason is to show the flexibility and
extensibility of Triton. ARMv7 poses some challenges in terms of
implementation given its many features and peculiarities (some of them quite
different from the rest of the supported architectures). Therefore, ARMv7
makes a great architecture to add to the list of supported ones.
Now without further ado, let’s describe all the necessary steps to implement a
new architecture in Triton.
Step 1: Describing registers specification and defining enums
The first step consists in describing the register specification of the new
architecture. The description lives in a *.spec* file and is
interpreted as C/C++ macro definitions. Each register is declared with the
following syntax:
REG_SPEC(UPPER_NAME, LOWER_NAME, UPPER_BIT_POS, LOWER_BIT_POS, PARENT_REG, IS_MUTABLE)
UPPER_NAME and LOWER_NAME are the upper-case and lower-case names of the
register (e.g., R1 and r1). UPPER_BIT_POS and LOWER_BIT_POS are the bit
positions of the register in its bitvector. For ARMv7 these fields are mainly
used to define the size of the register: every 32-bit ARMv7 register has a
lower bit position of 0 and an upper bit position of 31. For
other architectures like x86, these fields vary (e.g., the ah register has
an upper bit position of 15 and a lower bit position of 8). The
IS_MUTABLE field defines whether the register is writable (e.g., XZR in
AArch64 is immutable). Below is the spec file we wrote for the ARMv7
architecture:
// Thirteen general-purpose 32-bit registers, R0 to R12
REG_SPEC(R0, r0, triton::bitsize::dword-1, 0, R0, TT_MUTABLE_REG) // r0
REG_SPEC(R1, r1, triton::bitsize::dword-1, 0, R1, TT_MUTABLE_REG) // r1
[...]
REG_SPEC(R12, r12, triton::bitsize::dword-1, 0, R12, TT_MUTABLE_REG) // r12
REG_SPEC(SP, sp, triton::bitsize::dword-1, 0, SP, TT_MUTABLE_REG) // SP
REG_SPEC(R14, r14, triton::bitsize::dword-1, 0, R14, TT_MUTABLE_REG) // LR (r14)
REG_SPEC(PC, pc, triton::bitsize::dword-1, 0, PC, TT_MUTABLE_REG) // PC
REG_SPEC(APSR, apsr, triton::bitsize::dword-1, 0, APSR, TT_MUTABLE_REG) // APSR
REG_SPEC_NO_CAPSTONE(C, c, 0, 0, C, TT_MUTABLE_REG) // C (Carry)
REG_SPEC_NO_CAPSTONE(N, n, 0, 0, N, TT_MUTABLE_REG) // N (Negative)
REG_SPEC_NO_CAPSTONE(V, v, 0, 0, V, TT_MUTABLE_REG) // V (Overflow)
REG_SPEC_NO_CAPSTONE(Z, z, 0, 0, Z, TT_MUTABLE_REG) // Z (Zero)
As you can see, the flags are defined with REG_SPEC_NO_CAPSTONE instead of
REG_SPEC. The reason is that Capstone, the library Triton uses for disassembly,
defines the APSR register, which holds all four flags, as a “single”
register. However, we would like to be able to access each flag independently
of the others. REG_SPEC_NO_CAPSTONE is used for this purpose: it
defines a flag and states that it has no counterpart in Capstone
(the values of the APSR register and of each flag are kept “synchronized”).
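To picture what “synchronized” means here, consider the following Python sketch. It is only a toy model under our own assumptions (the FlagState class and its methods are hypothetical, not Triton's implementation): APSR is stored once and each flag is a one-bit view into it, using the ARMv7 bit positions N=31, Z=30, C=29 and V=28.

```python
# APSR bit positions of the four condition flags (ARMv7 layout).
FLAG_BITS = {"n": 31, "z": 30, "c": 29, "v": 28}


class FlagState:
    """Toy register file: APSR is stored once, flags are views into it."""

    def __init__(self, apsr: int = 0):
        self.apsr = apsr & 0xFFFFFFFF

    def get_flag(self, name: str) -> int:
        """Read a single flag out of the APSR value."""
        return (self.apsr >> FLAG_BITS[name]) & 1

    def set_flag(self, name: str, value: int) -> None:
        """Write a single flag; the APSR value changes accordingly."""
        bit = 1 << FLAG_BITS[name]
        self.apsr = (self.apsr | bit) if value else (self.apsr & ~bit)


state = FlagState()
state.set_flag("z", 1)
print(hex(state.apsr), state.get_flag("z"), state.get_flag("n"))
```

Whichever way the state is written (whole APSR or a single flag), both views stay consistent, which is the property the spec entries rely on.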
Once the register specification is done, we have to define enums for
instructions and registers. As mentioned, Triton uses Capstone to disassemble
opcodes. However, we define our own enums for things such as instruction
mnemonics. Why don’t we use Capstone enums? Our goal is to be as independent
as possible of any external library. For example, if we move to another
disassembler, we don’t want to change the code base of our engines and
semantics. To avoid this scenario, we convert every Capstone enum into
a Triton enum. This is the role of the following functions, which are basically
just switch statements:
- Arm32Specifications::capstoneRegisterToTritonRegister
- Arm32Specifications::capstoneInstructionToTritonInstruction
These functions are primarily used during the disassembly stage (next step).
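The idea behind these translation functions can be sketched in a few lines of Python. Everything in this example is hypothetical: the OwnReg enum stands in for the framework's own register enum, and the numeric disassembler identifiers are invented (real values come from the disassembler's headers).

```python
from enum import Enum, auto


class OwnReg(Enum):
    """Hypothetical stand-in for the framework's own register enum."""
    INVALID = auto()
    R0 = auto()
    R1 = auto()
    SP = auto()
    PC = auto()


# Invented disassembler-side identifiers, for illustration only.
_DISASM_TO_OWN = {
    100: OwnReg.R0,
    101: OwnReg.R1,
    102: OwnReg.SP,
    103: OwnReg.PC,
}


def disasm_register_to_own_register(disasm_id: int) -> OwnReg:
    """Translate a disassembler register id; unknown ids map to INVALID."""
    return _DISASM_TO_OWN.get(disasm_id, OwnReg.INVALID)


print(disasm_register_to_own_register(103))
print(disasm_register_to_own_register(999))
```

With such a layer in place, only the lookup table needs to change if the disassembler is replaced; the rest of the code keeps using its own identifiers.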
Step 2: Creating a CPU interface
The second step consists in implementing what is called the CPU interface.
Basically, all architectures in Triton share the same interface. It provides
access to CPU registers, memory and also useful information such as which
registers are the program counter and the stack pointer. One of the most
important methods of this interface is disassembly which, as its name
clearly states, disassembles instructions provided by the user. The workflow
is the following: the user creates an instruction, sets the opcode and
address, and calls the processing method (here is where all the magic
happens). The code looks like this (using the Python bindings):
from triton import ARCH, Instruction, TritonContext
ctx = TritonContext(ARCH.ARM32)
# Set memory, PC, etc. (pc and stop_address are assumed to be initialised here).
while pc != stop_address:
    # Fetch next opcode.
    opcode = ctx.getConcreteMemoryAreaValue(pc, 4)
    # Create a Triton instruction.
    instruction = Instruction(pc, opcode)
    # Process the instruction (i.e., disassemble it and build its semantics).
    ctx.processing(instruction)
    # Update the program counter.
    pc = ctx.getConcreteRegisterValue(ctx.registers.pc)
In turn, ctx.processing(instruction) calls the aforementioned
disassembly method. It uses Capstone to disassemble the instruction and
then uses the information supplied to fill the rest of the fields of the
instruction (basically, there is a translation from the Capstone
representation of an instruction to the Triton one, as explained in the
previous step).
For most architectures the job would be done by now. However, the ARMv7
architecture presents unique challenges (to be fair to ARM, every architecture
does). The disassembly method has to take into account the current
execution mode, which can be ARM or Thumb. Transitions between these two modes
can occur anywhere in the code (although only through specific instructions,
such as branch and exchange instructions, or some selected instructions [1]
that have PC as their destination register). When a transition occurs, the PC
register is updated (with the address of the next instruction to execute) and
its least significant bit is set to 0 when the target instruction is in ARM
mode or to 1 when it is in Thumb mode. Therefore, dealing with transitions in
Triton is simple: it only consists in checking when the PC register is set (this
is done in just one place) and setting a flag that records the current mode
(the instruction is then disassembled in one mode or the other accordingly).
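The mode-tracking logic described above can be sketched as follows. This is a simplified Python model under our own assumptions (the ModeTracker class and split_branch_target helper are hypothetical names, and alignment corner cases are ignored); it is not the actual Triton code.

```python
def split_branch_target(target: int):
    """Return (instruction address, is_thumb) for an interworking target.

    Bit 0 of the target selects the mode (1 = Thumb, 0 = ARM) and is not
    part of the address of the instruction itself.
    """
    target &= 0xFFFFFFFF
    is_thumb = bool(target & 1)
    address = target & 0xFFFFFFFE
    return address, is_thumb


class ModeTracker:
    """Single place where PC is written, so the mode flag cannot go stale."""

    def __init__(self):
        self.pc = 0
        self.thumb = False

    def write_pc(self, target: int) -> None:
        self.pc, self.thumb = split_branch_target(target)


cpu = ModeTracker()
cpu.write_pc(0x8001)   # e.g. the operand of a BX to Thumb code
print(hex(cpu.pc), cpu.thumb)
cpu.write_pc(0x9000)   # back to ARM code
print(hex(cpu.pc), cpu.thumb)
```

Funnelling every PC write through one function is the design choice that keeps the disassembler's mode flag consistent, whatever instruction caused the write.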
Besides specificities such as the one described above, the implementation of
the CPU interface is quite simple and straightforward. Anyone trying to
implement a new architecture ( 😉 ) can use any of the available ones (x86,
AArch64 and now ARMv7) as a reference.
Step 3: Describing the semantics
Each instruction modifies the state of the registers, memory and flags in a
precise way; we call this its semantics. This step shows how to write the
semantics of an instruction so that every time we emulate one in Triton it does
exactly what it is supposed to do (according to what the ARMv7 manual says).
Similarly to the previous step, there is a semantics interface which we have
to implement when adding a new architecture to Triton. This interface is quite
simple and has one method only, namely: buildSemantics. It is invoked
by the processing method after the disassembly of the instruction has
finished.
The method consists of a big switch statement that processes instructions
according to their mnemonics (for example: ID_INS_ADD, ID_INS_MOV,
which correspond to the ADD and MOV instructions). The handling of
each instruction is done in a separate method. The structure of such a method is
roughly the following:
void Arm32Semantics::<MNEMONIC>_s(triton::arch::Instruction& inst) {
  auto& dst  = inst.operands[0];
  auto& src1 = inst.operands[1];
  auto& src2 = inst.operands[2];
  /* Create symbolic operands */
  auto op1 = this->symbolicEngine->getOperandAst(inst, src1);
  auto op2 = this->symbolicEngine->getOperandAst(inst, src2);
  /* Create the semantics */
  auto node = <build semantics using the AstContext object>;
  /* Create symbolic expression */
  auto expr = this->symbolicEngine->createSymbolicExpression(inst, node, dst, "<MNEMONIC> operation");
  /* Get condition code node (in case it is a conditional instruction) */
  auto cond = node->getChildren()[0];
  /* Spread taint */
  this->spreadTaint(inst, ...);
  /* Update symbolic flags */
  if (inst.isUpdateFlag() == true) {
    /* Update flags according to the result of the instruction. */
  }
  /* Update condition flag */
  if (cond->evaluate() == true) {
    /* In case it is a conditional execution instruction, make the
     * necessary adjustments (for instance, let Triton know the instruction
     * was in fact executed, switch execution modes, etc).
     */
  }
  /* Update the symbolic control flow */
  this->controlFlow_s(inst, ...);
}
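To make the “create the semantics” and “update flags” steps less abstract, here is a Python sketch of what the semantics of a flag-setting addition (ADDS) compute. It works on concrete 32-bit integers instead of symbolic AST nodes, and the adds function is a hypothetical helper written for this post, not part of Triton.

```python
MASK32 = 0xFFFFFFFF


def adds(a: int, b: int):
    """Concrete model of ADDS: return (result, flags) for 32-bit operands."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # unbounded sum, used to detect the carry
    result = full & MASK32       # value written to the destination register
    flags = {
        "n": result >> 31,                          # sign bit of the result
        "z": int(result == 0),                      # result is zero
        "c": int(full > MASK32),                    # unsigned overflow
        "v": ((a ^ result) & (b ^ result)) >> 31,   # signed overflow
    }
    return result, flags


print(adds(0x7FFFFFFF, 1))
print(adds(0xFFFFFFFF, 1))
```

In a symbolic engine the same formulas are expressed as AST nodes (bitvector additions, extractions and comparisons), so that they can later be handed to a solver.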
Each instruction is different and has specific needs (and/or quirks). However,
for most of them, the definition of their semantics looks similar to the
C++ skeleton above. In the case of ARMv7, we had to account for various
aspects that made the implementation complex and, in some cases, even
cumbersome. Firstly, as already mentioned, ARMv7 has two main execution modes:
ARM and Thumb. Instructions are encoded for one or the other. Typically, they
look the same; nonetheless, they have some differences. Perhaps the most
important one is conditional execution. The vast majority of ARMv7
instructions are conditional, that is, they execute (or not) depending on the
current values of the flags. For ARM, this information is encoded within each
instruction (enabled by a suffix, for example: ADDNE r0, r1, r2), whereas
Thumb instructions require an extra instruction (IT [2]) to make them
conditional (for example: IT NE; ADD r0, r1, r2).
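Conditional execution boils down to evaluating a predicate over the N, Z, C and V flags before applying the effects of the instruction. The following Python sketch shows the standard ARM condition codes and a conditional addition; the function names and the dictionary-based register file are assumptions made for this example.

```python
# Standard ARM condition codes, expressed over the N, Z, C and V flags.
CONDITIONS = {
    "EQ": lambda f: f["z"] == 1,
    "NE": lambda f: f["z"] == 0,
    "CS": lambda f: f["c"] == 1,
    "CC": lambda f: f["c"] == 0,
    "MI": lambda f: f["n"] == 1,
    "PL": lambda f: f["n"] == 0,
    "VS": lambda f: f["v"] == 1,
    "VC": lambda f: f["v"] == 0,
    "HI": lambda f: f["c"] == 1 and f["z"] == 0,
    "LS": lambda f: f["c"] == 0 or f["z"] == 1,
    "GE": lambda f: f["n"] == f["v"],
    "LT": lambda f: f["n"] != f["v"],
    "GT": lambda f: f["z"] == 0 and f["n"] == f["v"],
    "LE": lambda f: f["z"] == 1 or f["n"] != f["v"],
    "AL": lambda f: True,
}


def condition_passed(cond: str, flags: dict) -> bool:
    return CONDITIONS[cond](flags)


def add_conditional(cond, regs, flags, dst, src1, src2) -> bool:
    """Model ADD<cond> dst, src1, src2; return True if it was executed."""
    if not condition_passed(cond, flags):
        return False             # the instruction behaves like a no-op
    regs[dst] = (regs[src1] + regs[src2]) & 0xFFFFFFFF
    return True


regs = {"r0": 0, "r1": 5, "r2": 7}
flags = {"n": 0, "z": 1, "c": 0, "v": 0}
print(add_conditional("NE", regs, flags, "r0", "r1", "r2"), regs["r0"])
print(add_conditional("EQ", regs, flags, "r0", "r1", "r2"), regs["r0"])
```

In Thumb mode the predicate is the same; only its source differs, since the condition comes from a preceding IT instruction rather than from the instruction's own encoding.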
There are also more subtle differences which demand extra attention. For
instance, two instructions may look the same while their operands are reported
slightly differently (at least, according to Capstone). This is the case of
ASRS r0, r1, #2 (Arithmetic Shift Right, the S suffix states that the
flags should be updated), where the immediate (i.e., the #2) is
reported differently when encoded in ARM and Thumb (shown as Shift and
as operands[2], respectively). Below you can see the differences:
$ cstool -d arm "\x41\x01\xb0\xe1" 1000
1000 41 01 b0 e1 asrs r0, r1, #2
op_count: 2
operands[0].type: REG = r0
operands[0].access: WRITE
operands[1].type: REG = r1
operands[1].access: READ
Shift: 1 = 2
Update-flags: True
Registers read: r1
Registers modified: r0
Groups: arm
$ cstool -d thumb "\x88\x10" 1000
1000 88 10 asrs r0, r1, #2
op_count: 3
operands[0].type: REG = r0
operands[0].access: WRITE
operands[1].type: REG = r1
operands[1].access: READ
operands[2].type: IMM = 0x2
Update-flags: True
Registers read: r1
Registers modified: r0
Groups: thumb thumb1only
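One way to cope with this difference is to normalise the shift amount before building the semantics. The Python sketch below uses a hypothetical, simplified representation of a disassembled instruction (plain dictionaries modelled on the cstool output above, not the Capstone or Triton data structures) and then applies a 32-bit arithmetic shift right that also yields the carry-out.

```python
# Hypothetical, simplified view of "asrs r0, r1, #2" in each mode.
ARM_ASRS = {"operands": [
    {"type": "REG", "value": "r0"},
    {"type": "REG", "value": "r1", "shift": {"type": "ASR", "value": 2}},
]}
THUMB_ASRS = {"operands": [
    {"type": "REG", "value": "r0"},
    {"type": "REG", "value": "r1"},
    {"type": "IMM", "value": 2},
]}


def asr_shift_amount(insn: dict) -> int:
    """Return the shift amount wherever the disassembler reported it."""
    ops = insn["operands"]
    if len(ops) == 3 and ops[2]["type"] == "IMM":
        return ops[2]["value"]       # Thumb shape: explicit third operand
    shift = ops[1].get("shift")
    if shift is not None:
        return shift["value"]        # ARM shape: shift attached to operand 1
    raise ValueError("no shift amount found")


def asr32_with_carry(value: int, amount: int):
    """Arithmetic shift right of a 32-bit value by 1..32 bits.

    Returns (result, carry_out), where carry_out is the last bit shifted out.
    """
    if not 1 <= amount <= 32:
        raise ValueError("shift amount must be in 1..32")
    value &= 0xFFFFFFFF
    signed = value - (1 << 32) if value & 0x80000000 else value
    result = (signed >> amount) & 0xFFFFFFFF
    carry = (signed >> (amount - 1)) & 1
    return result, carry


for insn in (ARM_ASRS, THUMB_ASRS):
    amount = asr_shift_amount(insn)
    print(amount, asr32_with_carry(0x80000006, amount))
```

Both encodings end up with the same shift amount, so a single semantics routine can serve the ARM and Thumb variants.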
Switching between modes is another matter that required effort. It is possible
to switch modes not only using explicit instructions such as BX (Branch
and eXchange) but also through standard instructions, for instance, arithmetic
and bitwise. In the latter case, the only thing needed is the PC register to
be used as the destination operand. This didn’t pose a difficulty in itself
but took a considerable amount of time when testing, given the many cases to
consider.
Amongst the things that make the implementation of ARMv7 complex, we can
emphasize the many variations a single instruction can have (in terms of the
number and type of operands as well as the condition code).
The current state of the ARMv7 implementation is quite advanced. Nonetheless,
there is still some more work to do. We have implemented the most frequent
instructions and it is possible to emulate full binaries (as we’ll comment in
the next section). Adding support for new instructions is relatively easy now
as the heavy part is already done, and the testing infrastructure is in place.
We’ll be adding more instructions in future releases. We have not yet considered
support for features such as SIMD, floating-point extensions or big-endian
memory access (we’ll consider them as the need arises, though).
Step 4: Testing the semantics
Implementing an instruction set can be tricky and requires a lot of attention.
Reference manuals are not always as clear as one would like. Therefore,
testing is crucial.
Testing involves processing instructions and comparing their outputs (that is,
the values of registers and memory) to those of a well-known implementation of
the architecture under test. As a reference, we rely on Unicorn for emulation
(it is based on QEMU, a widely known, used and tested emulator).
The development process was the following. We started by implementing scripts
to emulate an instruction using Unicorn and using Triton. Then, each time we
implemented a new instruction we emulated it using both scripts and compared
the results. In case there was a difference we investigated it and made the
necessary fixes.
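The comparison step can be sketched as a small differential-testing helper. In this Python example, run_reference and run_candidate are hypothetical callables (they would wrap the Unicorn-based and Triton-based scripts, respectively) that take an opcode and an initial state and return the resulting state as a dictionary; the two toy emulators at the end exist only to exercise the helper.

```python
def diff_states(reference: dict, candidate: dict) -> dict:
    """Return {name: (reference_value, candidate_value)} for every mismatch."""
    mismatches = {}
    for name in sorted(set(reference) | set(candidate)):
        ref, got = reference.get(name), candidate.get(name)
        if ref != got:
            mismatches[name] = (ref, got)
    return mismatches


def check_instruction(opcode: bytes, initial_state: dict,
                      run_reference, run_candidate) -> dict:
    """Run one instruction on both emulators from the same initial state."""
    ref = run_reference(opcode, dict(initial_state))
    got = run_candidate(opcode, dict(initial_state))
    return diff_states(ref, got)


# Toy emulators: both "execute" r0 = r1 + r2, but the second one
# forgets to truncate the result to 32 bits.
def good_emulator(opcode, state):
    state["r0"] = (state["r1"] + state["r2"]) & 0xFFFFFFFF
    return state


def buggy_emulator(opcode, state):
    state["r0"] = state["r1"] + state["r2"]
    return state


initial = {"r0": 0, "r1": 0xFFFFFFFF, "r2": 1}
print(check_instruction(b"\x00", initial, good_emulator, buggy_emulator))
```

An empty result means both implementations agree for that instruction and initial state; any entry points directly at the register or memory cell to investigate.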
Once the development was completed (that is, we implemented all the
instructions we originally planned) we added the aforementioned tests to
Triton’s CI infrastructure. We currently test many variations of the same
instruction, with different condition codes and operands. We test
instructions encoded for ARM and Thumb as well. Additionally, we test
instructions that switch execution modes. As the number of instructions tested
is large, tests are separated by instruction category (data, branch,
load/store), encoding (ARM/Thumb) and mode switching (from ARM to Thumb and
the other way around).
As an extra step, we also test the implementation emulating entire binaries.
In this case, we have chosen a binary sample that computes the sha256 of a
string (which proved to be really useful to find some missing details in
previous tests). This sample was compiled for ARM and Thumb modes with
different optimization flags (-O0, -O1, -O2, -O3, -Os,
and -Oz), providing an extensive range of instructions and variations.
As part of its CI infrastructure, Triton collects coverage information from
its tests (you can take a better look at our Codecov page). This information helped us
guide our testing efforts during the development process. As already
mentioned, ARMv7 instructions have many variations, and it was not always
obvious which ones were missing from the tests.
File organisation
For the ARMv7 architecture, each step is implemented in the following files:
- Step 1: src/libtriton/includes/triton/arm32.spec and src/libtriton/arch/arm/arm32/arm32Specifications.cpp
- Step 2: src/libtriton/arch/arm/arm32/arm32Cpu.cpp
- Step 3: src/libtriton/arch/arm/arm32/arm32Semantics.cpp
- Step 4: src/testers/arm32/
Conclusion
Triton proved to be prepared for the addition of another architecture. ARMv7
posed some challenges, as described throughout this post. However, Triton
handled them nicely (very few changes were needed in its core). The current
implementation is quite advanced and we are going to add support for missing
instructions in future releases.
This blog post, besides describing the experience of providing support for
ARMv7, is meant to be used as a guideline for adding new architectures. As
seen in the first two steps, adding basic support for disassembly is simple
and straightforward. The heavy work resides in step three. However, the task
can be tackled progressively, allowing you to implement only those
instructions you need for your analysis (and to get immediate feedback on
your implementation as well). If you want to bring the benefits of Triton to
another architecture (or if you simply want to deepen your knowledge), now you
know how to proceed!
Acknowledgments
- Thanks to all our Quarkslab colleagues who proofread this article.
- Thanks to Romain for providing testing samples.
References
[1] Check section “Changing between Thumb state and ARM state” of the reference manual (ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition).
[2] Currently, the IT instruction is not supported natively. However, it can be easily handled as shown in this example.