ARM Overview

From OSDev.wiki

Revision as of 21:02, 19 March 2012

This page is a stub.
You can help the wiki by accurately adding more contents to it.

ARM is a 32-bit RISC computer architecture that is often found in small, portable electronics (e.g. handheld video game systems and calculators).

Overview

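ARM processors operate in one of several modes: User, FIQ (Fast Interrupt), IRQ (Interrupt), Supervisor, Abort, Undefined, and System. Every mode except System has its own banked stack pointer, so each of them can have its own stack; System mode uses the same registers as User mode.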

While the ARM manuals distinguish between these modes, the distinction is based on their purposes rather than on privilege levels. There are essentially only two privilege levels in ARM, which the manuals call 'Privileged modes' and 'Non-privileged modes'.

Of the modes listed above, 'User' is the only non-privileged mode, while the others are collectively the privileged modes for the architecture.
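The current mode is recorded in the low five bits of the CPSR. As a minimal sketch, assuming the ARMv4/v5 mode encodings and host-side C rather than code running on the target, the privileged/non-privileged split can be expressed as follows. The enum and function names are hypothetical, and later architectures add modes (such as Monitor and Hyp) that this sketch does not know about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPSR mode field values (bits [4:0]) for the ARMv4/v5 modes.
   The identifiers are hypothetical; the numbers are the architectural encodings. */
enum arm_mode {
    MODE_USR = 0x10, /* User (the only non-privileged mode) */
    MODE_FIQ = 0x11, /* Fast Interrupt */
    MODE_IRQ = 0x12, /* Interrupt */
    MODE_SVC = 0x13, /* Supervisor */
    MODE_ABT = 0x17, /* Abort */
    MODE_UND = 0x1B, /* Undefined */
    MODE_SYS = 0x1F  /* System */
};

/* Returns a name for the mode field of `mode` (only bits [4:0] are looked at),
   or NULL if it is not one of the modes above. */
const char *arm_mode_name(uint32_t mode)
{
    switch (mode & 0x1Fu) {
    case MODE_USR: return "User";
    case MODE_FIQ: return "FIQ";
    case MODE_IRQ: return "IRQ";
    case MODE_SVC: return "Supervisor";
    case MODE_ABT: return "Abort";
    case MODE_UND: return "Undefined";
    case MODE_SYS: return "System";
    default:       return NULL;
    }
}

/* Privileged means: a valid mode other than User. */
bool arm_mode_is_privileged(uint32_t mode)
{
    return arm_mode_name(mode) != NULL && (mode & 0x1Fu) != MODE_USR;
}
```

Because only bits [4:0] are examined, a whole CPSR value can be passed in directly.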

Why are there so many modes? And what's the difference between the 'Interrupt' mode and the 'Fast Interrupt' mode? Will I have to write a special sort of abstraction which splits my kernel's interrupt handling away from the rest of its operation? No.

These modes are simply the state the processor is placed in when a particular type of exception or IRQ occurs.

Exceptions, IRQs and Software Interrupts on ARMv4 and up

The ARMv4+ architectures define seven vectors, compared with the x86's grand total of 256. Note also that the kernel cannot choose which vector system calls enter through: ARM provides a single 'SWI' (software interrupt) instruction and no 'INT N' instruction, so every system call arrives on the same, fixed vector.

This is where the privileged modes of the processor come in. When the SWI instruction is executed, the ARM processor jumps into the kernel-provided vector table, at the fixed entry that the hardware reserves for SWI. The kernel is expected to install its system call entry code on this vector, since there is no other logical choice.
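The SWI instruction does carry a 24-bit comment field that the processor itself ignores, so a kernel that wants the syscall number encoded in the instruction can read it back. This is a hedged sketch for ARM state only (the Thumb SWI is 16 bits wide with an 8-bit field); the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `insn` is an A32 SWI: bits [27:24] are 0b1111 and the
   condition field is not 0b1111 (that space means something else). */
bool a32_is_swi(uint32_t insn)
{
    return (insn >> 28) != 0xFu && (insn & 0x0F000000u) == 0x0F000000u;
}

/* The 24-bit comment field, bits [23:0]. The processor ignores it,
   so its meaning is entirely up to the kernel. */
uint32_t a32_swi_comment(uint32_t insn)
{
    return insn & 0x00FFFFFFu;
}
```

In the handler, LR (the Supervisor-mode copy) points just past the SWI, so the instruction is at LR - 4, e.g. `a32_swi_comment(*(const uint32_t *)(lr - 4))` with a hypothetical `lr` variable. Many kernels, Linux's EABI among them, skip this load and pass the syscall number in a register instead.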

However, where the x86 reads the privilege settings to apply from each vector's entry (the code segment selector stored in each IDT entry, which refers to a GDT descriptor), ARM uses modes instead: an SWI instruction always switches the CPU into 'Supervisor' mode.

Remember from now on that 'Supervisor' mode is the standard mode the kernel is expected to run in. 'System' mode is not entered through any exception vector: none of the defined exceptions switch to it, and privileged code has to enter it explicitly by writing the mode bits of the CPSR. Because of this, and the terse way the manual describes it, it can look a little like the PC's System Management Mode (entered via an SMI), but it is really just a privileged mode that uses the same registers as User mode.

From 'Supervisor' mode, the kernel processes the SWI and then returns to user space.

Abort mode is entered when the processor encounters the equivalent of an x86 page fault. The processor switches to a privileged mode, specifically 'Abort' mode, one of those listed above. Most ARM kernels simply note that the processor entered the kernel in Abort mode, then switch to 'Supervisor' mode and service the exception there.

IRQ (Interrupt) mode and FIQ (Fast Interrupt) mode are almost the same, except that FIQ mode has its own set of banked registers, R8 to R14. These keep the same names in assembly (r8 is still written as r8), but in FIQ mode they refer to different physical registers: if User mode left data in r8, you should not expect to find that data under r8 in FIQ mode. This lets an FIQ handler use those registers freely, so it can run without having to save much context.

As on x86, interrupt-generating hardware is configured by the kernel. Of the seven vectors, two are dedicated to hardware interrupts (the IRQ and FIQ vectors), so the kernel assigns each device to one of these two interrupt lines. When a device configured for the IRQ line signals an interrupt, the processor enters privileged mode, in the IRQ mode state.

For an interrupt received on the FIQ vector, the processor would enter privileged mode, but in the FIQ mode state.

'Undefined' mode is entered when the CPU encounters an undefined instruction. The ARMv4 manual strongly implies that this mode is meant to let the kernel emulate instructions on behalf of the user-mode process (for example, instructions for a coprocessor that is not present) and then return.
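The vectors discussed above sit at fixed offsets from the vector table base, and each one enters a specific mode. A small lookup table summarises this; it is a sketch with hypothetical names, using the ARMv4/v5 CPSR mode values. The slot at offset 0x14 is unused, which is why there are seven vectors in eight slots.

```c
#include <stddef.h>
#include <stdint.h>

struct arm_vector {
    uint32_t offset;   /* byte offset from the vector table base */
    const char *name;  /* exception taken through this slot */
    uint32_t mode;     /* CPSR mode entered (0 for the unused slot) */
};

static const struct arm_vector arm_vectors[] = {
    { 0x00, "Reset",          0x13 }, /* Supervisor */
    { 0x04, "Undefined",      0x1B }, /* Undefined */
    { 0x08, "SWI",            0x13 }, /* Supervisor */
    { 0x0C, "Prefetch Abort", 0x17 }, /* Abort */
    { 0x10, "Data Abort",     0x17 }, /* Abort */
    { 0x14, "Reserved",       0    }, /* unused */
    { 0x18, "IRQ",            0x12 }, /* IRQ */
    { 0x1C, "FIQ",            0x11 }, /* FIQ */
};

/* Mode entered through the vector at `offset`; 0 if unused or out of range. */
uint32_t arm_vector_mode(uint32_t offset)
{
    for (size_t i = 0; i < sizeof arm_vectors / sizeof arm_vectors[0]; i++)
        if (arm_vectors[i].offset == offset)
            return arm_vectors[i].mode;
    return 0;
}
```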

Registers

ARM processors have a fairly large number of registers. The ARM7, for example, has 37 registers: 31 general-purpose 32-bit registers and 6 status registers. Not all of them are visible at once; some are banked and only accessible from certain modes.

Unlike on the x86, important operating registers are exposed as ordinary general-purpose registers. For example, r15 is the program counter ('pc'), r14 the link register ('lr'), and r13 the stack pointer ('sp').

Along with the general-purpose registers, there is also the CPSR, or 'Current Program Status Register'. This register keeps track of the current operating mode, whether interrupts are enabled, the condition flags, and so on. The operating system can read and write it with the MRS and MSR instructions.
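On the target, `mrs r0, cpsr` copies the CPSR into a general-purpose register. Once the value is available, its main fields can be picked apart in plain C. This is a sketch assuming the ARMv4T/v5 bit layout (later revisions add further bits that it ignores), with hypothetical struct and function names.

```c
#include <stdbool.h>
#include <stdint.h>

struct cpsr_fields {
    bool n, z, c, v;    /* condition flags, bits 31..28 */
    bool irq_disabled;  /* I, bit 7: IRQs masked when set */
    bool fiq_disabled;  /* F, bit 6: FIQs masked when set */
    bool thumb;         /* T, bit 5: executing Thumb code */
    uint32_t mode;      /* M[4:0]: current processor mode */
};

struct cpsr_fields cpsr_decode(uint32_t cpsr)
{
    struct cpsr_fields f = {
        .n = (cpsr >> 31) & 1u,
        .z = (cpsr >> 30) & 1u,
        .c = (cpsr >> 29) & 1u,
        .v = (cpsr >> 28) & 1u,
        .irq_disabled = (cpsr >> 7) & 1u,
        .fiq_disabled = (cpsr >> 6) & 1u,
        .thumb = (cpsr >> 5) & 1u,
        .mode = cpsr & 0x1Fu,
    };
    return f;
}
```

For example, 0x600000D3 decodes as Z and C set, IRQs and FIQs masked, ARM state, Supervisor mode (0x13).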

Mode           R0 R1 R2 R3 R4 R5 R6 R7 R8    R9    R10    R11    R12    R13 (SP)  R14 (LR)  R15 (PC)
User32/System  R0 R1 R2 R3 R4 R5 R6 R7 R8    R9    R10    R11    R12    R13       R14       R15
FIQ32          R0 R1 R2 R3 R4 R5 R6 R7 R8FIQ R9FIQ R10FIQ R11FIQ R12FIQ R13FIQ    R14FIQ    R15
Supervisor32   R0 R1 R2 R3 R4 R5 R6 R7 R8    R9    R10    R11    R12    R13SVC    R14SVC    R15
Abort32        R0 R1 R2 R3 R4 R5 R6 R7 R8    R9    R10    R11    R12    R13ABT    R14ABT    R15
IRQ32          R0 R1 R2 R3 R4 R5 R6 R7 R8    R9    R10    R11    R12    R13IRQ    R14IRQ    R15
Undefined32    R0 R1 R2 R3 R4 R5 R6 R7 R8    R9    R10    R11    R12    R13UNDEF  R14UNDEF  R15

R15 (PC) is not banked: every mode uses the same program counter.

Each mode shares some registers with other modes. ARM instructions normally specify a register with 4 bits, which can represent 16 registers, and as the table shows, exactly 16 registers can be referenced by instructions in each mode. Their mnemonic names are listed across the top as R0 through R15; R13, R14 and R15 also have the aliases SP, LR and PC (shown in parentheses). Writing R15 is therefore the same as writing PC, but keep in mind that the aliases only matter to the assembler or compiler: in most instructions the processor only sees the 4-bit register number. That number is obtained by simply removing the R prefix, so R5 is encoded as 0101, R6 as 0110 and R7 as 0111. The suffixes IRQ, UNDEF, ABT, SVC and FIQ are only a notation for this table and are not valid in any assembler or compiler. For example, R13ABT, R13IRQ, R13UNDEF, R13SVC and R13FIQ all indicate that the processor internally maps R13 (SP) to a different physical register in its register file depending on the mode, yet all of them are encoded as 0xD in hexadecimal (1101 in binary).
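The banking rules in the table can be modelled in a few lines. This is a hypothetical helper, not anything from the ARM documentation: it takes a CPSR mode value and a 4-bit register number and reports which physical copy of the register an instruction would reach.

```c
#include <stdint.h>

/* Which physical copy of a register an instruction reaches. BANK_SHARED
   is the User/System copy that is used whenever a mode has no copy of its own. */
enum reg_bank { BANK_SHARED, BANK_FIQ, BANK_IRQ, BANK_SVC, BANK_ABT, BANK_UND };

/* `mode` is the CPSR mode field, `reg` the 4-bit register number taken
   from the instruction (13 = SP, 14 = LR, 15 = PC). */
enum reg_bank register_bank(uint32_t mode, unsigned reg)
{
    enum reg_bank bank;
    unsigned first_banked = 13;           /* most modes bank only R13 and R14 */

    switch (mode & 0x1Fu) {
    case 0x11: bank = BANK_FIQ; first_banked = 8; break; /* FIQ banks R8-R14 */
    case 0x12: bank = BANK_IRQ; break;
    case 0x13: bank = BANK_SVC; break;
    case 0x17: bank = BANK_ABT; break;
    case 0x1B: bank = BANK_UND; break;
    default:   return BANK_SHARED;        /* User and System */
    }

    reg &= 0xFu;                          /* instructions encode 4 bits */
    if (reg < first_banked || reg == 15)  /* R15 (PC) is never banked */
        return BANK_SHARED;
    return bank;
}
```

For instance, register 13 in Supervisor mode is the table's R13SVC, while register 8 in IRQ mode is still the shared R8.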


Instructions

The instruction set of the ARM line of processors is 'reduced' because ARM is a RISC (reduced instruction set computer) architecture; the opposite is CISC (complex instruction set computer). The history and the differences between the two are discussed in depth in many books and websites. In short, each ARM instruction does less than a typical x86/x64 instruction.

In ARM (A32) state, every instruction is exactly 32 bits long, no shorter and no longer, whereas x86/x64 instructions vary in length. This is also part of the RISC versus CISC distinction. As a result, ARM-state machine code is usually larger in bytes than equivalent x86/x64 code. Many ARM processors also support a second instruction set, Thumb (T32): a compressed encoding built from 16-bit instructions (Thumb-2 adds some 32-bit ones) that is more limited than A32 but reduces the code's memory footprint. A special branch instruction (BX) switches between the two instruction sets on the fly during execution.

The execution time of ARM instructions also tends to vary less than on CISC processors. Cycle counts do differ between instructions, but they are mostly predictable, which makes real-time programming easier because one can estimate more precisely how long a series of instructions will take to execute.
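The switch between the two instruction sets is driven by bit 0 of the BX operand: if it is set, the processor continues in Thumb state, otherwise in ARM state, and the bit is not part of the branch address. A minimal model of that rule, with hypothetical names; it ignores BLX and the alignment requirements for ARM-state targets.

```c
#include <stdbool.h>
#include <stdint.h>

struct bx_result {
    uint32_t address; /* where execution continues */
    bool thumb;       /* true: Thumb state, false: ARM state */
};

/* Models how BX interprets the value in its register operand. */
struct bx_result bx_decode(uint32_t operand)
{
    struct bx_result r = {
        .address = operand & ~1u,     /* bit 0 is not part of the address */
        .thumb = (operand & 1u) != 0, /* bit 0 selects the instruction set */
    };
    return r;
}
```

Toolchains set bit 0 in the address of Thumb functions, which is why a function pointer to Thumb code is normally odd.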

ARM cannot load an arbitrary large immediate value into a register with a single instruction. An immediate value is a value encoded literally in the instruction. x86/x64 processors, for example, can load a 32-bit immediate (also called a constant) into any register, but the ARM's A32 and Thumb instruction sets cannot. In A32, a data-processing immediate is an 8-bit value rotated right by an even number of bits, so only some 32-bit constants fit. For the others, one must either build the value with a series of instructions or store it in memory outside the instruction stream (a 'literal pool') and load it from there. Storing the value outside the instruction stream achieves much the same as encoding it inside, but the two differ in several ways (the load costs an extra memory access, for one), and this is worth keeping in mind on ARM. Any compiler or assembler can take care of this for you: if you compile C/C++ code it is transparent, and GNU as, for instance, accepts the pseudo-instruction ldr r1, =value. It is still good to understand, especially for system software development.
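Which constants fit into a single A32 data-processing instruction follows from the encoding: a 12-bit field holding an 8-bit value and a 4-bit rotation, where the value is rotated right by twice the rotation. The sketch below (hypothetical function names, host-side C) searches the 16 possible rotations. When several encodings exist it returns the one with the smallest rotation, which is not necessarily what a given assembler emits.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is taken modulo 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n == 0 ? x : (x << n) | (x >> (32u - n));
}

/* Returns the 12-bit immediate field (rot << 8 | imm8) that encodes
   `value`, or -1 if `value` is not a valid A32 rotated immediate. */
int32_t a32_imm_encode(uint32_t value)
{
    for (uint32_t rot = 0; rot < 16; rot++) {
        /* Undo the rotate-right by 2*rot with a rotate-left. */
        uint32_t imm8 = rol32(value, 2u * rot);
        if (imm8 <= 0xFFu)
            return (int32_t)((rot << 8) | imm8);
    }
    return -1;
}

/* Expands a 12-bit immediate field back into the 32-bit value. */
uint32_t a32_imm_decode(uint32_t field)
{
    uint32_t rot = (field >> 8) & 0xFu;
    return rol32(field & 0xFFu, 32u - 2u * rot); /* rotate right by 2*rot */
}
```

For example, `a32_imm_encode(0x200000)` yields 0x602 (0x02 rotated right by 12), while 0x101 spans nine bits and cannot be encoded.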


@ Code example of loading the value 0x200000 into the register R1 for
@ setting the UPBASE register of the PrimeCell (PL110) display on QEMU.
@ Assumes R0 already holds the display controller's base address.
	mov	r1, #0x20               @ load the immediate value 0x20
	lsl	r1, r1, #16             @ shift left 16 bits to create 0x200000
	str	r1, [r0, #0x10]         @ store R1 at address R0 + 0x10 (UPBASE)
@ 0x200000 also happens to fit a single rotated immediate, so
@ "mov r1, #0x200000" would work as well; the two-step form shows
@ the general technique.
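When a constant does not fit, one general technique is to split it into pieces that each do fit and build it with one MOV followed by ORRs. This greedy sketch (hypothetical name) takes 8-bit chunks starting at even bit positions, so every chunk is a valid rotated immediate and at most four are needed. It is not necessarily the sequence a compiler would pick, since compilers also consider MVN, MOVW/MOVT on ARMv6T2 and later, and literal pools.

```c
#include <stdint.h>

/* Splits `value` into at most four chunks that each fit an A32 rotated
   immediate and that OR together to give `value`. Returns the number of
   chunks written to `chunks` (0 for value == 0). */
int arm_split_constant(uint32_t value, uint32_t chunks[4])
{
    int n = 0;
    while (value != 0) {
        unsigned pos = 0;
        while (((value >> pos) & 1u) == 0) /* find the lowest set bit */
            pos++;
        pos &= ~1u;                        /* rotations are by even amounts */
        /* Take 8 bits starting at pos; bits shifted past 31 are dropped. */
        uint32_t chunk = value & ((uint32_t)0xFF << pos);
        chunks[n++] = chunk;
        value &= ~chunk;                   /* clear what this chunk covers */
    }
    return n;
}
```

For 0x12345678 it produces four chunks, 0x278, 0x5400, 0x2340000 and 0x10000000, which correspond to a `mov r1, #0x278` followed by three `orr` instructions.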


Almost all A32 instructions support conditional execution, which is encoded directly in the instruction. On an x86/x64 architecture, when a series of instructions needs to be executed conditionally, you will usually find a branch instruction that creates a completely separate code path for the processor to follow if the branch is taken. ARM supports branching too, but because almost every instruction carries a condition field, a short conditional sequence can stay in the normal execution path: the processor loads those instructions into the pipeline as usual and simply does not execute the ones whose condition fails. This avoids a taken branch, which may force the processor to flush and refill its pipeline. The condition field is 4 bits wide, giving 16 condition codes; one is reserved, leaving 15 usable conditions.

The condition codes are:

Mnemonic  Binary  Flags tested                    Meaning
EQ        0000    Z set                           equal
NE        0001    Z clear                         not equal
HS (CS)   0010    C set                           unsigned higher or same
LO (CC)   0011    C clear                         unsigned lower
MI        0100    N set                           negative
PL        0101    N clear                         non-negative
VS        0110    V set                           overflow
VC        0111    V clear                         no overflow
HI        1000    C set and Z clear               unsigned higher
LS        1001    C clear or Z set                unsigned lower or same
GE        1010    N equals V (both set or clear)  signed greater than or equal
LT        1011    N not equal to V                signed less than
GT        1100    Z clear and N equals V          signed greater than
LE        1101    Z set or N not equal to V       signed less than or equal
AL        1110    none                            always (no condition really)
NV        1111    -                               reserved condition
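The condition table translates directly into a predicate. As a sketch with a hypothetical name, this evaluates a 4-bit condition field against the N, Z, C and V flags of a CPSR value. It treats 1111 as 'never', which is a simplification: on ARMv5 and later that encoding is used for special unconditional instructions instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an instruction with condition field `cond` would execute,
   given the N, Z, C, V flags in bits 31..28 of `cpsr`. */
bool arm_condition_passed(unsigned cond, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1u, z = (cpsr >> 30) & 1u;
    bool c = (cpsr >> 29) & 1u, v = (cpsr >> 28) & 1u;

    switch (cond & 0xFu) {
    case 0x0: return z;             /* EQ */
    case 0x1: return !z;            /* NE */
    case 0x2: return c;             /* HS/CS */
    case 0x3: return !c;            /* LO/CC */
    case 0x4: return n;             /* MI */
    case 0x5: return !n;            /* PL */
    case 0x6: return v;             /* VS */
    case 0x7: return !v;            /* VC */
    case 0x8: return c && !z;       /* HI */
    case 0x9: return !c || z;       /* LS */
    case 0xA: return n == v;        /* GE */
    case 0xB: return n != v;        /* LT */
    case 0xC: return !z && n == v;  /* GT */
    case 0xD: return z || n != v;   /* LE */
    case 0xE: return true;          /* AL */
    default:  return false;         /* NV: reserved, treated as "never" here */
    }
}
```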

Memory

Many ARM processors come equipped with MMUs, and have full memory protection schemes for their 4GB address space, including a TLB.

The paging scheme used by ARM processors is similar to that of the x86. Two levels of paging structures are used, with the second being optional, and mapping sizes can range from 1 KB to 1 MB.

However, ARM distinguishes between a set of 'page' sizes and a set of 'section' sizes. There are essentially two page sizes: 4 KB pages and 64 KB pages.

On ARMv4 and ARMv5, these may be divided into what the manual calls 'subpages': 1 KB subpages for a 4 KB page and 16 KB subpages for a 64 KB page. In both cases a subpage is a quarter of the size of the page.
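With 4 KB pages and a coarse second-level table (the ARMv5/ARMv6 short-descriptor format is assumed here), a virtual address splits into a 12-bit first-level index, an 8-bit second-level index and a 12-bit page offset. A sketch with hypothetical names:

```c
#include <stdint.h>

/* Fields of a virtual address mapped through a first-level table and a
   coarse second-level table to a 4 KB page. */
struct va_split {
    uint32_t l1_index; /* VA[31:20]: one of 4096 entries, each covering 1 MB */
    uint32_t l2_index; /* VA[19:12]: one of 256 entries, each covering 4 KB */
    uint32_t offset;   /* VA[11:0]: byte offset within the 4 KB page */
};

struct va_split split_small_page_va(uint32_t va)
{
    struct va_split s = {
        .l1_index = va >> 20,
        .l2_index = (va >> 12) & 0xFFu,
        .offset = va & 0xFFFu,
    };
    return s;
}
```

For 0x12345678 this gives first-level index 0x123, second-level index 0x45 and offset 0x678. 64 KB pages and the ARMv4/v5 fine tables use different splits that this sketch does not cover.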

Sections are mapped directly by a single first-level table entry: 1 MB sections, plus 16 MB supersections on many ARMv6 and later cores.
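A 1 MB section is mapped by one first-level descriptor. The sketch below assumes the ARMv5 short-descriptor layout and uses hypothetical names. It sets only the section base, the access permission (AP) bits, the domain and the type bits, and leaves the cache/buffer bits and bit 4 clear, so check your core's manual: some ARMv5 cores expect bit 4 to be set, and on ARMv6 and later that bit is the execute-never flag.

```c
#include <stdint.h>

/* Builds a first-level descriptor mapping the 1 MB section that
   contains physical address `pa`. */
uint32_t section_descriptor(uint32_t pa, uint32_t ap, uint32_t domain)
{
    return (pa & 0xFFF00000u)       /* section base address, bits [31:20] */
         | ((ap & 0x3u) << 10)      /* access permissions, bits [11:10] */
         | ((domain & 0xFu) << 5)   /* domain number, bits [8:5] */
         | 0x2u;                    /* bits [1:0] = 0b10: section entry */
}
```

An identity map of the whole 4 GB space is then a 4096-entry, 16 KB-aligned table in which entry i is `section_descriptor(i << 20, ap, domain)`.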

The ARM manuals state clearly, several times, that subpages are not supported from ARMv6 onwards. That is, the 1 KB and 16 KB subpage sizes are not available on ARMv6 and above, so for maximum portability you may decide to stick to 4 KB and 64 KB pages.

Or you can have two sets of abstractions for your ARM Memory Manager port: One MM for ARMv4 and v5, and one for ARMv6 and up. It's all up to you.

Exceptions

Note: the ARM documentation uses the terms 'interrupt' and 'exception' more or less interchangeably.

For exceptions, ARM uses a table similar to the IVT of the real-mode x86. The table consists of a number of 32-bit entries, and each entry is an instruction (ARM instructions are 4 bytes long) that jumps to the appropriate handler.

Take note of that, and of the design impact it imposes: on x86, the hardware vector table holds the addresses of handler routines; on ARM, it holds actual instructions, each of which must fit into 4 bytes. This is not a big deal, since every ARM instruction is 4 bytes anyway (assuming ARM state, not Thumb or Jazelle). The usual approach is to emulate the x86 behaviour by placing a jump instruction in each vector table entry, so that when the processor indexes into the table it effectively jumps to the handler address encoded in that instruction. This keeps handler entry consistent and eases portability.
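Each vector entry is usually a B (branch) instruction. Its target is stored as a signed 24-bit word offset relative to the entry's address plus 8 (the ARM-state PC read-ahead), which limits its reach to about ±32 MB; handlers further away are commonly reached with an `ldr pc, [pc, #offset]` entry that loads the address from a nearby table instead. A hedged sketch of the B encoding, with a hypothetical function name:

```c
#include <stdint.h>

/* Encodes an unconditional "B target" placed at address `from` (ARM state).
   Returns 0, which is never a valid B encoding, if `target` is misaligned
   or out of range. */
uint32_t a32_encode_branch(uint32_t from, uint32_t target)
{
    /* In ARM state the PC reads as the instruction's address + 8. */
    int64_t offset = (int64_t)target - ((int64_t)from + 8);

    if (offset % 4 != 0)
        return 0;                      /* branch targets are word aligned */
    if (offset < -(INT64_C(1) << 25) || offset >= (INT64_C(1) << 25))
        return 0;                      /* beyond the +/-32 MB reach */

    /* cond = 1110 (AL), bits [27:24] = 1010 (B), signed 24-bit word offset */
    return 0xEA000000u | ((uint32_t)(offset / 4) & 0x00FFFFFFu);
}
```

For example, a branch in the SWI slot at 0x08 to a handler at 0x8000 encodes as 0xEA001FFC. The vector table itself normally lives at 0x00000000, or at 0xFFFF0000 when high vectors are enabled.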

Interrupt controllers are also used to 'vector' interrupts. Two such devices are the Generic Interrupt Controller (GIC) and the Vectored Interrupt Controller (VIC).

Emulators

Links