💪 ARM Cortex-A72
This document is not yet written.
Overview
The RaspberryPi 4's primary MCU is the Broadcom BCM2711. The BCM2711 has a ARM Cortex-A72 core and is based on the ARMv8a architecture. The Cortex-A72 is a 64 bit architecture. The 64bit ARM instruction set is referred to in different ways including: aarch64
, a64
, arm64
.
Some obligatory properties:
- The reference manual is 5158 pages. (Yes, five thousand one hundred fifty eight.)
- Bi-endian Data (instructions are always little endian).
- 2 instruction states: Aarch64 and Aarch32
- All instructions are 32 bits.
- Supports 8, 16, 32, 64, 128 data types.
- New Exception Model (from ARMv7) and Privilege Levels
Registers
In 32 bit ARM, "general" purpose registers were referred to with an r
and there were 16 of them. There are now 31 general purpose registers in ARMv8a. There are also a large number of different ways to access the registers with differently sized register aliases.
The standard general purpose 64 bit registers now start with an x
(x0-x30). The 32 bit aliases for these registers start with a w
(w0-w30). In ARMv8a, the stack pointer (SP
) and program counter (PC
) are given their own specific registers (they no longer have a GPR aliases). The CSPR register is now gone. To query the system state you should use NZCV
for condition flags.
Register Conventions
The ARMv8a reference manual does have several stated conventions for how some these registers are used, which should be preserved during a branch, and when are corrupted/clobbered during a branch. In summary:
- X0-X29 - 64 bit general purpose.
- X0-X7 - Parameters/Result
- X8 (XR) - Indirect result register (i.e. used to pass memory pointer allocated by caller.)
- X16/X17 (IP0/IP1) - Used by linkers for branch extensions.
- X18 (PR) - Platform Register
- X29 (FP) - Frame Pointer
- X30 (LR) - Link Register
- PC - 64 bit program counter.
- SP - 64 bit stack pointer.
- NZCV - Condition Flags
From the reference manual:
Other Registers
-
Floating Point support is required in ARMv8a
- V0-V31 - 128 bit FP/vector registers.
- Q0-Q31/D0-D31/S0-S31/H0-H31/B0-B31 - Variant width access of V0-V31 register data, respectively 128, 64, 32, 16, and 8 bits.
- FPCR - Floating Point Control Register
- FPSR - Floating Point Status Register
-
TPIDR_EL0 - Register used for thread local storage point (TLS) in Linux Kernel. An unofficial convention is to use X28 for TLS in user mode since TPIDR_EL0 is not accessible in user mode.
-
Many other system registers. System registers end with
_ELx
wherex
is the privilege level 0-3. -
ZXR => always returns 0 (i.e. /dev/zero)
-
WZR => always ignores input (i.e. /dev/null)
Instructions
- Instructions are word aligned.
- A64 instructions are always little endian.
Encoding
The encoding for A64 instructions is very different than A32. For starters, the cond
bits in front of every instruction are now gone. Typically, conditions are now only used on branching operations. As before, you should be able to use the reference manual (relevant pages for ADD (shifted register)) to determine the encoding of various instructions. The following show the relevant data for ADD x0, x1, x2 ; x0 = x1 + x2
.
-
The encoding pattern:
Click To See Encoding Pattern
-
The assembler syntax:
Click To See Encoding Pattern
-
The instruction syntax symbols:
Click To See Encoding Pattern
-
The instruction behavior:
Click To See Encoding Pattern
Click To See Encoding Answer
| 31 | 30 | 29 | 28:24 | 23:22 | 21 | 20:16 | 15:10 | 9:5 | 4:0 |
| -- | -- | -- | --------- | ----- | -- | --------- | ----------- | --------- | --------- |
| sf | op | S | | shift | | Rm | imm16 | Rn | Rd |
| 1 | 0 | 0 | 0 1 0 1 1 | 0 0 | 0 | 0 0 0 1 0 | 0 0 0 0 0 0 | 0 0 0 0 1 | 0 0 0 0 0 |
10001011000000100000000000100000b = 8B020020h (or "0x20 0x00 0x02 0x8b" in little endian)
Ok, lets check ourselves with an assembler:
echo 'add x0, x1, x2' | aarch64-linux-gnu-as -o /tmp/a.out ; aarch64-linux-gnu-objdump -d
/tmp/a.out
The result is:
Disassembly of section .text:
0000000000000000 <.text>:
0: 8b020020 add x0, x1, x2
Branching
When branching to an address, its expressed as a relative offset within a 128 MiB offset from the current position. The actual value given to the branch instruction is 4 byte aligned (instruction aligned), so in reality its within a 26-bit instruction offset. This is valuable because sometimes you want to update an address in a running debugger or JTAG session from its current address.
The encoding for an unconditional branch in A64:
| 31 | 30:26 | 25:0 |
| -- | --------- | -------------- |
| op | 0 0 1 0 1 | imm26 (signed) |
If op=1, X30=PC+4 (i.e. CALL)
Signed 26 bit immediate = (Absolute_Destination_Addr - Absolute_Branch_Addr - 8) >> 2
Load and Store
Loads and stores operate the same way in A64 as they do in A32 except for one notable exception: There is no PUSH
or POP
in A64. Instead, use of load multiple (LDM), store multiple (STM), load pair (LDP), store pair (STP) are used in nearly every case that PUSH
or POP
was previously used.
Exception Model
Aarch32/ARMv5/ARMv6/ARMv7 used an older exception model where the privilege level was dependent on the exception being run, but effectively 2 privilege levels (i.e. privileged and not-privileged). The new exception model includes 4 different privilege levels. As before, there is not-privileged (EL0), but now the privileged is broken into 3 new levels (EL1, EL2, EL3). The higher the number the more privilege the level has. But what are these intended for? Virtualization.
In the above diagram, you'll notice that:
- EL0 - Intended for user mode (unprivileged).
- EL1 - Intended for end user OSes.
- EL2 - (Optional) Intended for hypervisors.
- EL3 - (Optional) Intended for secure and trusted application monitoring.
Exception Vector Table (EVT)
In previous architectures, the exception/interrupt vector table was an array of 4 byte immediate branches that each represented an execution mode, including the reset vector. In the ARMv8a model, there is:
- An exception vector table for each privilege level.
- Only 4 exception types (SError, IRQ, FIQ, Synchronous)
- Each exception type entry is 128 bytes, allowing up to 32 instructions. (This is in contrast to the 4 bytes for 32 bit ARM).
- The reset vector is no longer in the EVT. The reset vector is defined by hardware signals or implementation defined by the device vendor. You can read the reset vector from
RVBAR_EL3
.
Each exception vector table includes 4 different ways to be called depending on intended privilege level transition required. Within each of these are the 4 different exception types: SError, FIQ, IRQ, and Synchronous:
-
SError - Something unexpected happened.
-
IRQ - Normal interrupt handler.
-
FIQ - High priority interrupt handler.
-
Synchronous - Intended interrupt (e.g. system call).
EVTs are initialized during boot up by the reset vector. Their locations are stored in VBAR_ELx
where x is the privilege level. Each EVT must be 2KiB aligned.
In a very simple example:
-
boot code that initializes the system so that there is just a kernel (EL1) and user space (EL0).
-
When user code makes a system call, the synchronous exception is called (
VBAR_EL1
+0x400
):- Saves condition state into SPSR
- Sets the link register ELR_EL1 to the instruction following the system call.
PSTATE.{D, A, I, F}
is set to 1- Synchronous exceptions set ESR_EL1 that indicates the reason for the exception.
- The code in the EVT is called.
-
When the kernel is complete, it returns execution to using ELR_EL1.
Click to see example Exception Vector Table code
// Typical exception vector table code.
.balign 0x800
Vector_table_el3:
curr_el_sp0_sync: // The exception handler for a synchronous
// exception from the current EL using SP0.
.balign 0x80
curr_el_sp0_irq: // The exception handler for an IRQ exception
// from the current EL using SP0.
.balign 0x80
curr_el_sp0_fiq: // The exception handler for an FIQ exception
// from the current EL using SP0.
.balign 0x80
curr_el_sp0_serror: // The exception handler for a System Error
// exception from the current EL using SP0.
.balign 0x80
curr_el_spx_sync: // The exception handler for a synchrous
// exception from the current EL using the
// current SP.
.balign 0x80
curr_el_spx_irq: // The exception handler for an IRQ exception from
// the current EL using the current SP.
.balign 0x80
curr_el_spx_fiq: // The exception handler for an FIQ from
// the current EL using the current SP.
.balign 0x80
curr_el_spx_serror: // The exception handler for a System Error
// exception from the current EL using the
// current SP.
.balign 0x80
lower_el_aarch64_sync: // The exception handler for a synchronous
// exception from a lower EL (AArch64).
.balign 0x80
lower_el_aarch64_irq: // The exception handler for an IRQ from a lower EL
// (AArch64).
.balign 0x80
lower_el_aarch64_fiq: // The exception handler for an FIQ from a lower EL
// (AArch64).
.balign 0x80
lower_el_aarch64_serror: // The exception handler for a System Error
// exception from a lower EL(AArch64).
.balign 0x80
lower_el_aarch32_sync: // The exception handler for a synchronous
// exception from a lower EL(AArch32).
.balign 0x80
lower_el_aarch32_irq: // The exception handler for an IRQ exception
// from a lower EL (AArch32).
.balign 0x80
lower_el_aarch32_fiq: // The exception handler for an FIQ exception from
// a lower EL (AArch32).
.balign 0x80
lower_el_aarch32_serror: // The exception handler for a System Error
// exception from a lower EL(AArch32).