ARM Overview

ARM is a family of RISC instruction set architectures developed by a single company, ARM Holdings.
{{Stub}}
 
Because ARM is a family of architectures rather than a single architecture, it can be found in a wide range of electronic devices: from simple embedded systems built around ARM MCUs, through smartphones, tablets and MP3 players, to low-power servers.
When you look at a device with an ARM processing unit (the term ''processing unit'' is more accurate, because ARM can be found in microcontrollers as well as in microprocessors), two things matter: the architecture and the core.
 
So far (Q3 2014) there have been 8 ARM architectures released or announced (some of them with extended versions): 7 of them are 32-bit, and the last one is 64-bit but user-space compatible with the 32-bit instruction set (making it possible to run 32-bit user processes, yet not 32-bit operating systems without virtualization). Broadly speaking, each new architecture version has added to all cores (with exceptions) some features that had already been tried in some cores of the previous architecture. Not all features of a previous architecture are necessarily available in the next one, and not all new versions of technologies added in previous architectures are compatible with the old ones. The simplest explanation is that designing processors isn't like writing software: a program that handles both an old and a new file format decides which one it is dealing with and then executes only one code path, but backward compatibility of (for example, instruction) formats may mean more transistors, and more transistors usually result in more heat.
 
Note that you can't buy an "ARM processor" the way you would buy a processor from Intel or AMD. ARM Holdings is a company that designs an architecture and writes an '''Architecture Reference Manual''', then designs a core and writes a '''Technical Reference Manual'''. Finally, it licenses the core design to chip-making companies and releases the manuals to the public. From there, a chip-making company designs a processing unit, such as a microcontroller (MCU), a System on a Chip (SoC) or an FPGA, and either manufactures the silicon itself or orders thousands of chips from a semiconductor manufacturer.
The processors you'll see in tablets or smartphones are SoCs, which act as a sort of motherboard with a processor on a single chip. They contain logic for driving peripherals (Ethernet, USB, SD/MMC cards, SPI, I2C, audio), and they may contain a GPU for graphics coprocessing or an FPGA for custom logic.
 
Many uses mean either a few multifunctional (and therefore complex and power-hungry) devices, or many simple devices suited to just one kind of task. ARM processors, as RISC devices, chose simplicity over complexity, and as a result a great many cores with different instruction sets are in use. To have just ''one assembler to rule them all'', ARM defined the Unified Assembly Language (UAL), which can be translated for '''any''' of the ARM cores.
 
In the latest versions, ARM cores are divided into three main lines:
* Cortex-M cores, used for really small devices, usually with on-chip memory and simpler operations
* Cortex-R cores, used for real-time devices
* Cortex-A cores, used for applications in multifunctional devices like smartphones, TVs or maybe computers.
Apple machines use custom ARM cores, and so do some Nvidia boards.
 
==Overview==
 
The CPU switches to 'Undefined' mode when it encounters an undefined instruction. Based on the tone of the ARMv4 manual, which all but states it outright, this exception mode was really meant for the kernel to emulate instructions on behalf of the user-mode process and then return.
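As a rough illustration of that idea, the sketch below emulates the ARMv7 UDIV instruction on a core that does not implement it, so that it traps as undefined. It assumes a hypothetical assembly stub that saves r0-r12 into `struct undef_frame`, fetches the faulting word from LR_und - 4 (in ARM state, LR_und holds the address of the undefined instruction plus 4), calls `emulate_udiv()`, and returns with `MOVS PC, LR` to resume after the trapped instruction. The structure and function names are illustrative, and the condition field is ignored for brevity. Note that the `/` in the handler itself compiles to a libgcc helper on such a core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register frame built by the assembly entry stub. */
struct undef_frame {
    uint32_t r[13];   /* saved r0-r12 of the interrupted context */
    uint32_t lr_und;  /* address of the undefined instruction + 4 (ARM state) */
};

/*
 * Emulate "UDIV Rd, Rn, Rm" (A1 encoding: cond 0111 0011 Rd 1111 Rm 0001 Rn).
 * Returns true if the instruction was emulated; the stub then returns to
 * frame->lr_und, i.e. the instruction after the trapped one.
 * This sketch ignores the condition field and SP/LR/PC operands.
 */
bool emulate_udiv(struct undef_frame *frame, uint32_t insn)
{
    if ((insn & 0x0FF0F0F0u) != 0x0730F010u)
        return false;                 /* not UDIV: let the kernel report it */

    unsigned rd = (insn >> 16) & 0xFu;
    unsigned rm = (insn >> 8) & 0xFu;
    unsigned rn = insn & 0xFu;
    if (rd > 12 || rm > 12 || rn > 12)
        return false;                 /* r13-r15 are banked/special; not handled */

    uint32_t divisor = frame->r[rm];
    /* With divide-by-zero trapping disabled, UDIV yields 0 for x / 0. */
    frame->r[rd] = (divisor == 0) ? 0 : frame->r[rn] / divisor;
    return true;
}
```

For example, the word 0xE730F211 encodes `UDIV r0, r1, r2`.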
 
=== Note regarding the 'Spectre' exploit ===
 
ARM has recently added a new CPU instruction to its specification that mitigates cache speculation side-channel exploits. You can read more about this design patch and the recommended actions regarding Spectre [https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf?revision=8b5a5f33-c686-4b00-8186-187dd2910355 here].
 
{| class="wikitable"
! Page
! Description
|-
| http://www.embedded.com/design/prototyping-and-development/4006695/How-to-use-ARM-s-data-abort-exception || Good article on the data and instruction ABORT exceptions
|}
 
==Registers==
Unlike on x86, important operating registers are directly visible as general-purpose registers. For example, r15 is 'pc' (the 'program counter') and r13 is 'sp' (the 'stack pointer').
 
Along with the general-purpose registers, there is also the CPSR, or 'Current Program Status Register'. This register keeps track of the current operating mode, whether interrupts are enabled, and so on. The operating system can read and write this register using the MRS/MSR instructions. ([http://www.arm.com/support/faqdev/1472.html See Here]) There are several other system registers which can be used, for example, to [[Detecting_Raspberry_Pi_Board|detect the ARM board]].
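Once a CPSR value has been read with MRS (or saved in an exception frame), decoding it is plain bit manipulation. The sketch below uses the ARMv4-ARMv7 (AArch32) bit positions and mode numbers; the function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR bit fields (ARMv4-ARMv7, AArch32). */
#define CPSR_N (1u << 31)  /* negative flag */
#define CPSR_Z (1u << 30)  /* zero flag */
#define CPSR_C (1u << 29)  /* carry flag */
#define CPSR_V (1u << 28)  /* overflow flag */
#define CPSR_I (1u << 7)   /* IRQs masked when set */
#define CPSR_F (1u << 6)   /* FIQs masked when set */
#define CPSR_T (1u << 5)   /* Thumb state */
#define CPSR_MODE_MASK 0x1Fu

/* Short name of the processor mode held in bits 4-0. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x17: return "abt";
    case 0x1B: return "und";
    case 0x1F: return "sys";
    default:   return "unknown";  /* e.g. Monitor/Hyp on later cores */
    }
}

bool cpsr_irq_enabled(uint32_t cpsr) { return (cpsr & CPSR_I) == 0; }
bool cpsr_fiq_enabled(uint32_t cpsr) { return (cpsr & CPSR_F) == 0; }
```

A value of 0xD3, for example, means Supervisor mode with both IRQ and FIQ masked.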
 
{| class="wikitable"
|}
 
Each mode shares some registers with other modes. Registers are normally specified in ARM instructions using 4 bits, which can represent 16 registers. As you can see, there are exactly 16 registers that you can reference using instructions in each mode. The mnemonic names are specified
across the top header as R0 through R15. Aliases for R13, R14, and R15 are
specified in parentheses: SP, LR, and PC. So using the mnemonic R15
is the same as using the mnemonic PC, but keep in mind that this is only relevant to the assembler/compiler. The ARM processor only understands the value given using 4 bits in most instructions. The exact value represented
by these 4 bits for each mnemonic can be obtained by simply removing the prefix letter R, so R5 is represented by the 4 bits 0101, R6 by 0110, and R7 by 0111. The suffixes ''IRQ'', ''UNDEF'', ''ABT'', ''SVC'', and ''FIQ'' are simply for display, are relevant only to this table, and aren't considered valid by any assembler/compiler. For example, ''R13ABT'', ''R13IRQ'', ''R13UNDEF'', ''R13SVC'', and ''R13FIQ'' all simply show that
internally the processor maps R13 (SP) to a different location in its register file. All of them are represented by the value 0xD in hexadecimal, or 1101 in binary.
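The "drop the R" rule can be written down as the small lookup an assembler performs. This is a hypothetical helper, not taken from any real assembler; it accepts either case and maps the SP/LR/PC aliases to 13/14/15, while table-only names such as R13IRQ are rejected because they have no encoding of their own.

```c
#include <ctype.h>
#include <string.h>

/*
 * Map a register mnemonic to the 4-bit number used in the instruction
 * encoding: "R0".."R15" (either case) or the aliases SP, LR and PC.
 * Returns -1 for anything else, including display-only names like R13IRQ.
 */
int arm_reg_number(const char *name)
{
    char up[4];
    size_t len = strlen(name);
    if (len == 0 || len >= sizeof up)
        return -1;
    for (size_t i = 0; i <= len; i++)           /* copies the '\0' too */
        up[i] = (char)toupper((unsigned char)name[i]);

    if (strcmp(up, "SP") == 0) return 13;
    if (strcmp(up, "LR") == 0) return 14;
    if (strcmp(up, "PC") == 0) return 15;

    /* "Rn": the number after the prefix letter is the encoding itself. */
    if (up[0] != 'R' || len < 2)
        return -1;
    int n = 0;
    for (size_t i = 1; i < len; i++) {
        if (!isdigit((unsigned char)up[i]))
            return -1;
        n = n * 10 + (up[i] - '0');
    }
    return n <= 15 ? n : -1;
}
```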
 
=== Calling Convention Cheat Sheets ===
|}
 
<small id="Note1">Note 1: The stack is 8-byte aligned at all times outside the prologue/epilogue of a non-leaf function. Leaf functions can have 4-byte aligned stacks. 8-byte alignment is required for ldrd/strd to work on the stack, which mostly only comes into play when using varargs with 64-bit integers. In interrupt and exception handlers, care must be taken to align the stack to 8 bytes before calling C code.</small>
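The arithmetic behind that last point is small: round the stack pointer down to an 8-byte boundary and remember how far it moved, so the original value can be restored on exit (in assembly this is typically a BIC of the low bits plus saving the difference). A minimal C model with hypothetical names:

```c
#include <stdint.h>

/* Round a stack pointer down to an 8-byte boundary. The stack grows
   down, so rounding down never overlaps data that was already pushed. */
uint32_t stack_align8(uint32_t sp)
{
    return sp & ~(uint32_t)7;
}

/* Bytes the exception stub must add back on exit
   (0 or 4 for a stack that is at least 4-byte aligned). */
uint32_t stack_align8_adjust(uint32_t sp)
{
    return sp - stack_align8(sp);
}
```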
 
For details, see the [http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf EABI specs].
|}
 
Loading large immediate values into registers can be interestingly different from x86/x64. An immediate value is a value that is literally encoded into the instruction. For example, x86/x64-compatible processors support loading a 32-bit immediate (also called a constant) into an arbitrary register. The ARM's A32 and Thumb32 instruction sets do not. Instead, one must use a series of instructions to achieve the same effect, or store the value in memory outside of the instruction stream. In essence, one may consider storing the value outside the instruction stream equivalent to storing it inside, but the two do differ in many ways, and this is something to keep in mind about the ARM. That said, any compiler or assembler can, and may, take care of this problem for you. If you are compiling C/C++ code, for example, these problems will be transparent to you, but as I said, this is good information to know and understand, especially for system software development.
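To see which constants fit in a single A32 data-processing instruction, recall that its immediate field is an 8-bit value rotated right by an even amount (0, 2, ..., 30). The sketch below (function names are hypothetical) checks that rule; anything it rejects needs a multi-instruction sequence (for example MOVW/MOVT on ARMv7) or a literal-pool load like the one GCC emits.

```c
#include <stdint.h>

static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/*
 * Returns the 12-bit "rotate:imm8" operand field if 'value' can be
 * encoded as an A32 immediate, or -1 if it cannot.
 */
int32_t a32_encode_imm(uint32_t value)
{
    for (uint32_t rot = 0; rot < 16; rot++) {
        /* value == ROR(imm8, 2*rot)  <=>  imm8 == ROL(value, 2*rot) */
        uint32_t imm8 = rol32(value, 2u * rot);
        if (imm8 <= 0xFFu)
            return (int32_t)((rot << 8) | imm8);
    }
    return -1;
}
```

So 0xFF000000 is encodable (0xFF rotated right by 8), while 0x12345678 is not, which is why it ends up in memory next to the code.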
 
Below is an example of machine code produced by GCC to load a register with a 32-bit value. Note that the immediate value is technically outside of the instruction stream; on x86/x64 the complete
value would have been encoded into the instruction. Also note that every instruction has the same length. The instruction doing the work is LDR, which uses the PC register and an immediate offset to place the value 0x12345678 into register R3.
<pre>
 
 
Almost all instructions support conditional execution, which is encoded directly into the instruction. On an x86/x64 architecture, when a series of instructions needs to be conditionally executed, you will likely find a branch instruction that creates a completely separate code path for the processor to follow if the branch is taken. The ARM supports branching, but almost all individual instructions also support conditional execution, so instead of potentially flushing and refilling the pipeline because a branch occurred, the processor loads instructions into the pipeline that carry a special condition field. There are a maximum of 16 condition codes, with one reserved, leaving 15 usable conditions. So instead of branching just to execute a few instructions, those few instructions can remain in the execution path and are either executed or skipped according to their condition codes.
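The check the processor applies to every instruction can be modelled in C, which is handy for an emulator or for an undefined-instruction handler that must honour the condition field. This is a sketch of the standard A32 condition table (condition in instruction bits 31-28, flags N, Z, C, V in CPSR bits 31-28); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Would an A32 instruction execute, given the current CPSR flags? */
bool arm_cond_passed(uint32_t insn, uint32_t cpsr)
{
    bool n = (cpsr >> 31) & 1u, z = (cpsr >> 30) & 1u;
    bool c = (cpsr >> 29) & 1u, v = (cpsr >> 28) & 1u;

    switch (insn >> 28) {
    case 0x0: return z;              /* EQ */
    case 0x1: return !z;             /* NE */
    case 0x2: return c;              /* CS/HS */
    case 0x3: return !c;             /* CC/LO */
    case 0x4: return n;              /* MI */
    case 0x5: return !n;             /* PL */
    case 0x6: return v;              /* VS */
    case 0x7: return !v;             /* VC */
    case 0x8: return c && !z;        /* HI */
    case 0x9: return !c || z;        /* LS */
    case 0xA: return n == v;         /* GE */
    case 0xB: return n != v;         /* LT */
    case 0xC: return !z && n == v;   /* GT */
    case 0xD: return z || n != v;    /* LE */
    case 0xE: return true;           /* AL */
    default:  return false;          /* 0xF: reserved (NV); on ARMv5 and later
                                        it selects a separate encoding space */
    }
}
```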
 
Here are the condition codes:
{| class="wikitable"
|-
! Mnemonic
! Binary
! Description
 
===== Memory Detection =====
Memory detection is very different if you are coming from an x86/x64 background. ARM cores are used in many embedded applications, so the system board the core resides on does not need to be overly complicated in order to be compatible with other boards, because almost any production board with an ARM core on it was likely custom designed for that specific purpose. It is quite possible to use some generic board, but many embedded applications may not even have an operating system.
 
Therefore, memory detection mechanisms may be non-existent. Instead, your operating system may have a value encoded into it at compile time for a specific system (board) known to have a certain amount of memory, or you may have a board-specific driver that your operating system loads (or has compiled in), which your kernel can query for the amount of installed memory. There are many other ways; these are just two ideas. The important part is to understand that, unlike x86/x64-compatible systems, you will likely find no mechanism to ''properly'' poll the amount of memory.
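The compile-time approach can be as simple as a per-board table of RAM regions built into the kernel, with a query function standing in for the "driver". A minimal sketch; the structure, function and especially the addresses and sizes are placeholders, not a real board.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous RAM region on a board. */
struct mem_region {
    uint32_t base;
    uint32_t size;
};

/* Hypothetical board description compiled into the kernel.
   The values are placeholders, not taken from any real board. */
static const struct mem_region example_board_ram[] = {
    { 0x00000000u, 0x08000000u },  /* 128 MiB */
    { 0x20000000u, 0x00100000u },  /*   1 MiB */
};

/* What the kernel calls instead of probing; 64-bit to avoid overflow. */
uint64_t board_ram_total(const struct mem_region *regions, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += regions[i].size;
    return total;
}
```

Supporting another board then means selecting a different table at build time (or handing the kernel a table from the boot loader) rather than changing the memory manager.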
 
It may, however, be possible to probe memory and recover using processor exceptions. This still may not tell you whether a region of memory is flash, memory-mapped I/O, RAM, or ROM, depending on how the system board was designed. I suspect it is quite possible for some ROM to be external to the core and let writes silently fail; coupled with the possibility that a region of memory needs a special unlock sequence before it can be written, this can turn your memory auto-detection code into a potential corner case.
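A probe usually boils down to a write/read-back test per word, as sketched below with a hypothetical function name. It assumes the data cache is off (or the region is mapped uncached), since otherwise the cache can hand back the written value even when nothing is there, and it assumes the address is not device memory, because writing to memory-mapped I/O can have side effects. Catching the data abort raised by an unmapped address is left to the abort handler and is not shown. A ROM that silently ignores writes fails the read-back, which is the desired result.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Write-and-read-back test of one word. Returns true if the location
 * behaved like RAM for two complementary patterns, then restores the
 * original contents.
 */
bool probe_ram_word(volatile uint32_t *p)
{
    uint32_t saved = *p;
    bool ok = true;

    *p = 0xA5A5A5A5u;
    if (*p != 0xA5A5A5A5u)
        ok = false;
    *p = 0x5A5A5A5Au;
    if (*p != 0x5A5A5A5Au)
        ok = false;

    *p = saved;   /* harmless if the location ignores writes */
    return ok;
}
```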
''Note: For some reason, the ARM people use the terms 'interrupt' and 'exception' as if they were the same.''
 
For exceptions, ARM uses a table similar to the IVT of real-mode x86. The table consists of a number of 32-bit entries. Each entry is an instruction (ARM instructions are 4 bytes in length) that jumps to the appropriate handler.
 
Take note of this and understand the design impact it imposes: on x86, the hardware vector table holds the addresses of handler routines; on ARM, the hardware vector table holds actual instructions. These instructions must fit into 4 bytes. This is not actually a big deal, since all ARM instructions (assuming ARM mode and not Thumb or Jazelle) are 4 bytes anyway. The general idea is to emulate the behaviour of something like the x86 by simply placing a jump instruction in each vector table entry, so that upon indexing into the table, the ARM processor acts as if it had jumped to the address encoded in the jump instruction. From there, consistency is obtained and portability is eased.
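Because each entry is an instruction, installing a handler means writing an encoded branch into the slot. The sketch below (hypothetical function name) builds the A32 `B target` word for a given vector address: the offset is relative to the vector address + 8, since the PC reads two instructions ahead in ARM state, is counted in words, and must fit a signed 24-bit field, i.e. about +/-32 MiB. Handlers outside that range are usually reached with an `LDR PC, [PC, #offset]` entry and a table of addresses instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build "B target" (condition AL) for the vector slot at 'vector'. */
bool arm_encode_branch(uint32_t vector, uint32_t target, uint32_t *insn)
{
    /* Byte offset from the PC value seen by the branch, computed
       modulo 2^32 like the processor's 32-bit address arithmetic. */
    uint32_t diff = target - (vector + 8u);

    if (diff & 3u)
        return false;                          /* target must be word aligned */
    if (diff > 0x01FFFFFCu && diff < 0xFE000000u)
        return false;                          /* outside +/-32 MiB */

    *insn = 0xEA000000u | ((diff >> 2) & 0x00FFFFFFu);
    return true;
}
```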
 
Various devices are also used to ''vector'' interrupts; two such devices are the Generic Interrupt Controller and the Vectored Interrupt Controller.
 
==Coding Gotchas==
===== Heap Pointers Need To Be Aligned =====
I cannot say at this time how many processors support unaligned memory access natively, but as far as I know, unaligned memory access is more expensive than aligned access. Many structures with fields larger than one byte are allocated on the heap, so you can see how big a performance hit you could take. Not to mention the bug that would be introduced when running on a processor that does not even handle unaligned memory access gracefully, by which I mean, at the very least, raising an exception of some sort so the code crashes and reveals the problem.
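The simplest way to keep heap pointers aligned is to round every allocation's start up inside the allocator itself. Below is a minimal bump-allocator sketch; it assumes 8-byte alignment is sufficient for everything you allocate (AAPCS aligns 64-bit types such as `double` and `long long` to 8 bytes), and the names and heap size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_ALIGN 8u   /* assumption: largest alignment any object needs */

static _Alignas(HEAP_ALIGN) unsigned char heap_area[4096];
static size_t heap_used;

/* Round n up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) & ~(a - 1);
}

/* Minimal bump allocator: every returned pointer is HEAP_ALIGN-aligned. */
void *bump_alloc(size_t size)
{
    size_t start = align_up(heap_used, HEAP_ALIGN);
    if (size > sizeof heap_area - start)
        return NULL;                    /* out of heap */
    heap_used = start + size;
    return &heap_area[start];
}
```

A 3-byte allocation followed by a 5-byte one therefore leaves a 5-byte gap, trading a little memory for aligned accesses.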
===== Missing Division Functions (__aeabi_uidivmod, __aeabi_idiv)/No Division Support =====
This is caused by using GCC without linking against ''libgcc''. You ''NEED TO'' link with ''libgcc'' when using GCC. For information about why you should link it, and about ''libgcc'' itself, read [[Libgcc]] and [[GCC_Cross-Compiler]].

You must also make sure you use a ''libgcc'' that has been compiled for your target machine/architecture/platform. See [[Libgcc]] for more information. You can produce the correct library for GCC by following [[GCC_Cross-Compiler]].
 
====== In-Depth Explanation And Discussion (Optional) ======
This issue is actually complicated, because different CPUs may not support hardware division at all, or may support it only through floating-point operations, only in Thumb mode, only in ARM (native) mode, or in some combination. Also, depending on whether you care more about code size or performance, you may find different methods faster or more compact in certain situations. ''libgcc'' handles all of these situations and provides the needed symbols; however, here are various sources of information that can give you a more in-depth understanding:
<pre>
The following links talk about different methods in handling this problem:
http://forum.osdev.org/viewtopic.php?f=1&t=23857&p=212244
http://stackoverflow.com/questions/8348030/how-does-one-do-integer-signed-or-unsigned-division-on-arm
Also, some extra information that may be useful:
http://www.linkedin.com/groups/ARM-cores-hardware-division-85447.S.242517259
A discussion about this section, and also at the end an example of libgcc's version of the divide function:
http://forum.osdev.org/viewtopic.php?f=8&t=27767
The source for libgcc and source of the division emulation function:
https://github.com/mirrors/gcc/blob/master/libgcc/udivmodsi4.c
https://github.com/mirrors/gcc/blob/master/libgcc/
</pre>
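To get a feel for what such a helper has to do, here is the textbook shift-and-subtract (restoring) division in C. It is not a copy of libgcc's code, and `udivmod32` is a hypothetical name; in a real build, link libgcc and let it provide `__aeabi_uidivmod` and friends. The sketch uses only shifts, compares and subtraction, so it does not itself need a division helper.

```c
#include <stdint.h>

/*
 * Unsigned 32-bit division by repeated shift-and-subtract.
 * Returns the quotient and stores the remainder in *rem (if non-NULL).
 * Division by zero returns 0 in this sketch; the ARM run-time ABI
 * helpers instead call __aeabi_idiv0.
 */
uint32_t udivmod32(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t q = 0, r = 0;

    if (den == 0) {
        if (rem)
            *rem = 0;
        return 0;
    }
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((num >> i) & 1u);   /* bring down the next bit */
        if (r >= den) {
            r -= den;
            q |= 1u << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

It needs 32 iterations regardless of the operands, which is one reason optimized library versions (or a hardware divider, when present) are preferable.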
 
===== Unaligned Memory Access And Byte Order =====
{| border="1px"
 
{| class="wikitable"
! colspan="8" | Little Endian Word Size (32-bit)
|-
| ED || CB || A9 || 87 || 78 || 9A || BC || DE
 
{| class="wikitable"
! colspan="8" | Little Endian Half-Word Size (16-bit)
|-
| ED || CB || A9 || 87 || 78 || 9A || BC || DE
 
{| class="wikitable"
! colspan="8" | Byte Size (8-bit)
|-
| ED || CB || A9 || 87 || 78 || 9A || BC || DE
|}
 
''From reading the data sheet, it appears that in big-endian mode the half-word access would be reversed, making it more natural, as you would expect on the x86/x64 architecture. But since I cannot actually test it at the moment, I am hoping I got it right.''
 
''The word access with an offset of ''10b (0x2)'' may be defined, but I am not sure because the data sheet does not really state it. However, it may employ some of the same mechanisms used for loading half-words. (Someone should come through and correct this if it is wrong.)''
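For reference, ARMv4-era cores such as the ARM7TDMI document a specific result for a misaligned word LDR in little-endian mode: the aligned word containing the address is loaded and rotated right by 8 * (address & 3) bits, so the addressed byte ends up in bits 0-7. The sketch below models that on the host, with memory as a byte array; the names are hypothetical, and ARMv6 and later cores behave differently when their unaligned-access support is enabled.

```c
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Little-endian read of the aligned word that contains 'addr'. */
static uint32_t read_aligned_le(const uint8_t *mem, uint32_t addr)
{
    uint32_t a = addr & ~3u;
    return (uint32_t)mem[a] | (uint32_t)mem[a + 1] << 8 |
           (uint32_t)mem[a + 2] << 16 | (uint32_t)mem[a + 3] << 24;
}

/* Modelled result of "LDR Rd, [addr]" on an ARMv4 core, any alignment. */
uint32_t armv4_ldr(const uint8_t *mem, uint32_t addr)
{
    return ror32(read_aligned_le(mem, addr), 8u * (addr & 3u));
}
```

With the bytes ED CB A9 87 from the tables above at address 0, a word load from offset 10b yields 0xCBED87A9 under this model, not the bytes at addresses 2-5.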
 
The reason memory access has to be aligned is that unaligned access requires additional access cycles, due to the way memory modules work (see the DDR datasheet below), and would increase the complexity of load/store operations (and of the processor core).
 
You could simulate an unaligned memory access on the ARM7TDMI-S, but you would have to make separate loads from two memory locations. A compiler could probably emit code to do this automatically, but the checks for whether an access is unaligned would slow down ''all'' memory accesses. The ''ARM7TDMI-S Data Sheet'' provides some code for such "checked" memory access on page ''4-35'', for when you do not know whether the address will be aligned or not.
<pre>
#define ARM4_XRQ_RESET 0x00
#define ARM4_XRQ_UNDEF 0x01
#define ARM4_XRQ_SWINT 0x02
#define ARM4_XRQ_ABRTP 0x03
#define ARM4_XRQ_ABRTD 0x04
#define ARM4_XRQ_RESV1 0x05
#define ARM4_XRQ_IRQ 0x06
#define ARM4_XRQ_FIQ 0x07
 
/*
</pre>
 
''It can be wise to populate all entries of the table. If the table is filled with zeros, each vector will hold the instruction "andeq r0, r0, r0", which does nothing. This means that if the CPU jumps to an unpopulated vector, it will effectively execute a NOP and move on to the next vector, which can cause a confusing bug in your code. At the very least, point any unused vectors to a dummy function that notifies you that an unhandled exception has occurred!''
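One common way to follow that advice is to fill all eight slots (reset, undefined, SWI, prefetch abort, data abort, reserved, IRQ, FIQ) with `ldr pc, [pc, #24]` (0xE59FF018) and place a table of handler addresses right after them: slot i sits at base + 4*i, the PC reads as base + 4*i + 8, and adding 24 gives base + 32 + 4*i, the matching address entry. The sketch below builds that image in a hypothetical `struct vector_page`, defaulting every slot to an "unhandled" routine; the names are illustrative, and on the target you would store `(uint32_t)&handler` and copy the page to the vector base.

```c
#include <stdint.h>

#define ARM_VECTOR_COUNT 8u
#define LDR_PC_PC_24 0xE59FF018u  /* ldr pc, [pc, #24] */

/* Eight instructions followed immediately by eight handler addresses. */
struct vector_page {
    uint32_t insn[ARM_VECTOR_COUNT];
    uint32_t handlers[ARM_VECTOR_COUNT];
};

/* Populate every slot so that no vector is left as zeros. */
void vector_page_init(struct vector_page *vp, uint32_t unhandled)
{
    for (unsigned i = 0; i < ARM_VECTOR_COUNT; i++) {
        vp->insn[i] = LDR_PC_PC_24;
        vp->handlers[i] = unhandled;
    }
}

/* Install a real handler for one vector index (0-7); ignore bad indices. */
void vector_page_set(struct vector_page *vp, unsigned index, uint32_t handler)
{
    if (index < ARM_VECTOR_COUNT)
        vp->handlers[index] = handler;
}
```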

===== Specifying CPSR using AS (Binutils/GCC) Syntax =====
You have to use ''cpsr'', not ''%%cpsr'' or any other form.
 
}
 
/* Bits 7 (I) and 6 (F) must be cleared to enable IRQ and FIQ respectively. */
void arm4_xrqenable_fiq()
{
arm4_cpsrset(arm4_cpsrget() & ~(1 << 6));
}

void arm4_xrqenable_irq()
{
arm4_cpsrset(arm4_cpsrget() & ~(1 << 7));
}
</pre>
 
===== GCC (Chars Not Signed) =====
In some cases, GCC targeting the ARM architecture may not handle signed values as you might expect.
5c: e0823003 add r3, r2, r3
</pre>
The x86/x64 target architecture causes GCC to treat ''char'' as ''signed char'' (the behavior most people expect), but when targeting
ARM you may be surprised to find that ''char'' is treated as ''unsigned char''. You must specify ''signed''
before ''char'', and then the compiler will generate code that performs the addition correctly.
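The difference is easy to demonstrate portably by spelling out the signedness, as in the sketch below (function names are hypothetical). GCC also offers `-fsigned-char` and `-funsigned-char` to change the default for plain `char`, but writing `signed char` or `unsigned char` explicitly keeps the code correct regardless of target or flags. The conversion of 0xFF to `signed char` relies on two's-complement wrapping, which is what GCC does.

```c
#include <limits.h>

/* 1 if plain char is signed for this compiler/target, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The same byte added with explicit signedness gives different results. */
int add_signed_byte(int base, unsigned char raw)
{
    return base + (signed char)raw;   /* 0xFF becomes -1 */
}

int add_unsigned_byte(int base, unsigned char raw)
{
    return base + raw;                /* 0xFF stays 255 */
}
```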
! Brief Description
|-
| [[ARM_Beagleboard|BeagleBoard]]
| Tutorial on bare-metal [OS] development on the Texas Instruments BeagleBoard. Written specifically for the BeagleBoard-xM Rev C.
|-
| [[ARM_Integrator-CP_Bare_Bones|Integrator Barebones]]
|-
| [[PL050_PS/2_Controller|PL050 PS/2 Controller]]
| Information about interfacing a PS/2 device, specifically the mouse, starting with the PL050. See the links at the bottom if the PL050 is not of interest.
|-
| [[ARM_Integrator-CP_IRQTimerAndPIC|IRQ, Timer, And PIC]]
| This demonstrates using exceptions (specifically the IRQ exception), the timer, and the PIC. It also provides a decent base for hacking with the ARM and/or QEMU.
|-
| [[ARM_Integrator-CP_ITPTMME_Main|ELK Pages (Thin ARM)]]
| The experimental learning kernel pages aim to take someone gradually through the process of building a functional kernel, possibly using (in later parts of the series) experimental designs and implementations that differ from standard, conventional designs in certain areas.
|-
| [[ARM_RaspberryPi]]
| Description and details on the commonly used Raspberry Pi boards
|-
| [[User:Pancakes/ARM_QEMU_REALVIEW-PB-A|QEMU realview-pb-a board]]
| This page gives you some information about programming for the realview-pb-a board under QEMU. Actual hardware ''could'' be different. It also contains a link to the datasheet (which might be hard to find).
|}
 
==Highly Useful External Resources==
*[http://infocenter.arm.com/help/index.jsp ARM Infocenter]
*[http://www.arm.com/documentation/Software_Development_Tools/ More ARM Documentation]
*[http://www.coranac.com/tonc/text/asm.htm Whirlwind Tour of ARM Assembly]
*[http://re-eject.gbadev.org/files/GasARMRef.pdf GAS ARM Reference]
*[http://re-eject.gbadev.org/files/armref.pdf ARM Instruction Reference]
*[http://www.arm.com/miscPDFs/9658.pdf ARM Assembly Language Programming]
*[http://infocenter.arm.com/help/topic/com.arm.doc.dui0159b/DUI0159B_integratorcp_1_0_ug.pdf Integrator/CP Reference Manual]
*[http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745 Amazon: ARM System Developer's Guide: Designing and Optimizing System Software]
*[http://www.embedded-bits.co.uk/2011/mmucode/ Turning on an ARM MMU and Living to tell the tale: The code]
 
[[Category:ARM]]
[[Category:Instruction Set Architecture]]
[[de:ARM]]