ARMv7 Common Memory System Architecture

From OSDev.wiki
Revision as of 05:36, 17 January 2017 by osdev>Gravaera

Introduction

This article provides a concise summary of the ARMv7 Common Memory System Architecture (CMSA), which is ARM's rather convoluted name for its caching and branch prediction architecture. ARM defines three main "Memory System Architectures", which together form its memory access scheme:

  • CMSA: Common Memory System Architecture: Caches and Branch Prediction. CMSA is common to all profiles, with adjustments as needed for the particular implementation and intended use of the processor.
  • VMSA: Virtual Memory System Architecture: TLBs and virtual memory, used as a memory protection, isolation and linearization scheme. VMSA is generally implemented by ARM A-profile processors.
  • PMSA: Protected Memory System Architecture: a simplified memory protection scheme, known in general CPU design parlance as an MPU (Memory Protection Unit) and often contrasted with an MMU (Memory Management Unit). An MPU is generally preferred in very resource-constrained or performance-critical microcontrollers, where the memory and speed overheads of an MMU are hard to justify. PMSA is generally implemented by ARM R-profile processors.

Any particular ARM implementation may have one or more of these "Memory Systems" (CMSA, VMSA and PMSA).

This article is not current with the ARM version 8 CMSA.

Components of the CMSA

Caches

Cache policies

ARMv7 CMSA-compliant processors define buffering and caching attributes which, when combined on a processor that is not using LPAE, yield the following cache policies:

  • Non-cacheable
  • Write-through cacheable
  • Write-back, write-allocate cacheable
  • Write-back, no-write-allocate cacheable (writes that miss bypass the cache, often called write-around)
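To make the four policies concrete, the following is a minimal sketch of how they could be selected in a short-descriptor (non-LPAE) second-level small-page entry, assuming TEX remap is disabled (SCTLR.TRE = 0) and that inner and outer cacheability are set to the same policy. The enum, macro and function names are hypothetical and chosen only for this illustration. Check the bit positions and encodings against the ARM Architecture Reference Manual before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* The four cache policies listed above (names are illustrative). */
enum cache_policy {
    POLICY_NON_CACHEABLE,
    POLICY_WRITE_THROUGH,     /* write-through, no write-allocate */
    POLICY_WRITE_BACK_WA,     /* write-back, write-allocate */
    POLICY_WRITE_BACK_NO_WA   /* write-back, no write-allocate */
};

/* Assumed bit positions in a short-descriptor small-page entry:
 * B is bit 2, C is bit 3, TEX[2:0] is bits [8:6]. */
#define PTE_B       (1u << 2)
#define PTE_C       (1u << 3)
#define PTE_TEX(x)  (((uint32_t)(x) & 0x7u) << 6)

/* Return the TEX/C/B bits for a Normal memory page with the given
 * policy. Assumes TEX remap is off. */
uint32_t pte_cache_bits(enum cache_policy policy)
{
    switch (policy) {
    case POLICY_NON_CACHEABLE:    /* TEX=001, C=0, B=0 */
        return PTE_TEX(1);
    case POLICY_WRITE_THROUGH:    /* TEX=000, C=1, B=0 */
        return PTE_TEX(0) | PTE_C;
    case POLICY_WRITE_BACK_NO_WA: /* TEX=000, C=1, B=1 */
        return PTE_TEX(0) | PTE_C | PTE_B;
    case POLICY_WRITE_BACK_WA:    /* TEX=001, C=1, B=1 */
        return PTE_TEX(1) | PTE_C | PTE_B;
    }
    assert(!"unknown cache policy");
    return 0;
}
```

A page table builder would OR the returned bits into the rest of the descriptor (physical address, access permissions and so on), which this sketch deliberately leaves out.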

An ARMv7 processor with the LPAE extensions enabled allows for more cache policies, as well as a cache-allocation hint.

Cache allocation hints

ARM processors' memory attributes include an allocation hint. This lets software tell the processor whether or not it should allocate cache lines for accesses to a given memory area. The ARM specification does not require implementations to respect these hints. The hints are:

  • Allocate
  • Do not allocate
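As a rough illustration of where these hints live when LPAE is in use, the sketch below builds one 8-bit memory attribute of the kind held in the MAIR registers. It assumes the long-descriptor encoding in which the upper nibble describes outer cacheability, the lower nibble describes inner cacheability, and the two low bits of each nibble are the read-allocate and write-allocate hints. All names are hypothetical, and only the non-transient write-through and write-back encodings are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nibble encodings for Normal memory (non-transient):
 * 0b10RW = write-through, 0b11RW = write-back,
 * where R = read-allocate hint and W = write-allocate hint. */
#define MAIR_WRITE_THROUGH  0x8u
#define MAIR_WRITE_BACK     0xCu
#define MAIR_NON_CACHEABLE  0x4u   /* 0b0100, carries no hints */

#define MAIR_READ_ALLOC     0x2u   /* R bit: allocate on read */
#define MAIR_WRITE_ALLOC    0x1u   /* W bit: allocate on write */

/* Combine a cache type with allocation hints into one 4-bit field. */
uint8_t mair_nibble(unsigned type, unsigned hints)
{
    assert(type == MAIR_WRITE_THROUGH || type == MAIR_WRITE_BACK);
    assert((hints & ~(MAIR_READ_ALLOC | MAIR_WRITE_ALLOC)) == 0);
    return (uint8_t)(type | hints);
}

/* Pack outer and inner fields into one 8-bit attribute. */
uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
    assert(outer <= 0xFu && inner <= 0xFu);
    return (uint8_t)((outer << 4) | inner);
}
```

For example, write-back with both hints set on the inner and outer levels packs to 0xFF under these assumptions. Since implementations are free to ignore the hints, software should treat them as a performance suggestion only.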

Cache topology

In ARM processors, the instruction caches and the data/unified caches can be enabled and disabled separately. The data/unified cache(s) are enabled as one group, and the instruction cache(s) are enabled as their own group.

ARMv7 provides a way for software to reason about caches in a uniform manner across implementations. Prior to ARMv7, ARM architecturally specified only one level of cache; support for, and management of, all other levels of cache was IMPLEMENTATION DEFINED. ARMv7 introduces the concepts of Level of Unification and Level of Coherency so that software can interact robustly with diverse cache hierarchies.
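The two enable groups correspond to two bits in the System Control Register (SCTLR): the C bit (bit 2) for the data/unified caches and the I bit (bit 12) for the instruction caches. The sketch below only computes the new register value, so that it stays portable. The function name is hypothetical, and the actual register read and write (an MRC/MCR to CP15 c1, c0, 0) is left as a comment because it is privileged and ARM-specific.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_C  (1u << 2)    /* data and unified caches enable */
#define SCTLR_I  (1u << 12)   /* instruction caches enable */
#define SCTLR_CACHE_BITS  (SCTLR_C | SCTLR_I)

/* Return a copy of 'sctlr' with the requested cache groups enabled
 * and disabled. Only the C and I bits may be named, and a bit may
 * not be both enabled and disabled in the same call. */
uint32_t sctlr_update_caches(uint32_t sctlr, uint32_t enable,
                             uint32_t disable)
{
    assert((enable & disable) == 0);
    assert(((enable | disable) & ~SCTLR_CACHE_BITS) == 0);
    return (sctlr & ~disable) | enable;
}

/* On real hardware the value would be read and written roughly as:
 *   mrc p15, 0, <Rt>, c1, c0, 0   ; read SCTLR
 *   mcr p15, 0, <Rt>, c1, c0, 0   ; write SCTLR
 * followed by an ISB. That part is not shown here. */
```

Because the groups are independent, a kernel can, for example, turn on the instruction caches early in boot with sctlr_update_caches(value, SCTLR_I, 0) and leave the data/unified caches off until they have been invalidated.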

Level of Unification

Level of Coherency

Enumerating processor-controlled caches

The heading says "processor-controlled" deliberately: on any given ARM platform, not every cache is necessarily under the control of the processor. For those caches that are under the processor's control, ARMv7 specifies how software enumerates and interacts with them, as described here.
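A minimal sketch of the enumeration, assuming the ARMv7 layouts of the Cache Level ID Register (CLIDR) and the Cache Size ID Register (CCSIDR): CLIDR holds a 3-bit cache type per level plus the LoUIS, LoC and LoUU fields, and CCSIDR describes the geometry of whichever cache is currently selected through CSSELR. The function and enum names are hypothetical. The raw register values are passed in as parameters, so reading them from CP15 is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Cache type values used in the CLIDR Ctype<n> fields. */
enum clidr_ctype {
    CTYPE_NONE     = 0,  /* no cache at this level */
    CTYPE_I_ONLY   = 1,  /* instruction cache only */
    CTYPE_D_ONLY   = 2,  /* data cache only */
    CTYPE_SEPARATE = 3,  /* separate instruction and data caches */
    CTYPE_UNIFIED  = 4   /* unified cache */
};

/* Cache type at 'level' (1..7); Ctype1 occupies bits [2:0]. */
unsigned clidr_ctype_at(uint32_t clidr, unsigned level)
{
    assert(level >= 1 && level <= 7);
    return (clidr >> (3u * (level - 1u))) & 0x7u;
}

unsigned clidr_louis(uint32_t clidr) { return (clidr >> 21) & 0x7u; }
unsigned clidr_loc(uint32_t clidr)   { return (clidr >> 24) & 0x7u; }
unsigned clidr_louu(uint32_t clidr)  { return (clidr >> 27) & 0x7u; }

/* Number of cache levels present: stop at the first empty level. */
unsigned clidr_cache_levels(uint32_t clidr)
{
    unsigned level;
    for (level = 1; level <= 7; level++) {
        if (clidr_ctype_at(clidr, level) == CTYPE_NONE)
            break;
    }
    return level - 1;
}

/* CCSIDR geometry: LineSize is log2(words per line) - 2, and the
 * associativity and set counts are stored minus one. */
uint32_t ccsidr_line_bytes(uint32_t ccsidr)
{
    return 1u << ((ccsidr & 0x7u) + 4u);
}

uint32_t ccsidr_ways(uint32_t ccsidr)
{
    return ((ccsidr >> 3) & 0x3FFu) + 1u;
}

uint32_t ccsidr_sets(uint32_t ccsidr)
{
    return ((ccsidr >> 13) & 0x7FFFu) + 1u;
}

uint32_t ccsidr_size_bytes(uint32_t ccsidr)
{
    return ccsidr_line_bytes(ccsidr) * ccsidr_ways(ccsidr)
           * ccsidr_sets(ccsidr);
}
```

With a made-up CLIDR value of 0x0A000023, this reports separate level 1 caches, a unified level 2 cache, a Level of Coherency of 2 and a uniprocessor Level of Unification of 1. A made-up CCSIDR value of 0x001FE019 decodes to 32-byte lines, 4 ways and 256 sets, which is a 32 KiB cache.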

Cache behaviour

Cache state on #RESET

Branch predictors

The architecturally visible parts of ARM's branch prediction architecture are the Branch Target Buffer (BTB), the pipeline, and the Instruction Prefetch Buffer (IPB). The following sections explain why software may need to care about these components.

The BTB

The IPB

Translation Regimes

In ARM lingo, the current translation regime of the processor refers to whether the processor is performing single-stage or 2-stage MMU translation. In other words, it refers to whether or not the processor is using a two-dimensional address space lookup, from Guest virtual addresses to Host physical addresses. In x86 parlance this is called "Extended Page Tables" (Intel) or "Nested Page Tables" (AMD).

2-stage translation schemes are used by Hypervisors to support multiple Guest Operating Systems.
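The difference between the two regimes can be sketched with a toy model. It does not use the real ARM descriptor formats: each "table" is a flat array indexed by 4 KiB page number, and every name here is hypothetical. Stage 1 maps a Guest virtual address to what ARM calls an intermediate physical address (IPA), and stage 2, owned by the Hypervisor, maps that IPA to a Host physical address. Passing no stage 2 table models the single-stage regime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12u
#define PAGE_MASK     ((1u << PAGE_SHIFT) - 1u)
#define TOY_PAGES     4u            /* toy address space: 4 pages */
#define INVALID_PAGE  UINT32_MAX    /* marks an unmapped entry */

/* Toy translation table: output page number per input page number. */
struct toy_table {
    uint32_t page[TOY_PAGES];
};

/* One stage of lookup. Returns 0 on success, -1 on a fault. */
int toy_lookup(const struct toy_table *table, uint32_t in, uint32_t *out)
{
    uint32_t index = in >> PAGE_SHIFT;

    if (index >= TOY_PAGES || table->page[index] == INVALID_PAGE)
        return -1;
    *out = (table->page[index] << PAGE_SHIFT) | (in & PAGE_MASK);
    return 0;
}

/* Translate 'va' to a physical address. With stage2 == NULL only
 * stage 1 applies; otherwise the stage 1 output is treated as an IPA
 * and translated again by stage 2. */
int toy_translate(const struct toy_table *stage1,
                  const struct toy_table *stage2,
                  uint32_t va, uint32_t *pa)
{
    uint32_t ipa;

    assert(stage1 != NULL && pa != NULL);
    if (toy_lookup(stage1, va, &ipa) != 0)
        return -1;              /* stage 1 fault: Guest's problem */
    if (stage2 == NULL) {
        *pa = ipa;              /* single-stage regime */
        return 0;
    }
    return toy_lookup(stage2, ipa, pa);  /* stage 2 fault goes to the Hypervisor */
}
```

The toy model ignores one real-world detail: in an actual 2-stage regime, the memory accesses made while walking the stage 1 tables are themselves subject to stage 2 translation.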