Detecting Memory (x86)

From OSDev.wiki
Revision as of 20:24, 17 May 2008 by osdev>Bewing (Rewrite of "How do I detect memory" article)

One of the most vital pieces of information that an OS needs in order to initialize itself is a map of the available RAM on a machine. Fundamentally, the best way (and really the only way) an OS can get that information is by using the BIOS. There may be rare machines where you have no other choice but to try to detect memory yourself -- however, doing so is unwise in any other situation.

It is perfectly reasonable to say to yourself, "How does the BIOS detect RAM? I'll just do it that way." Unfortunately, the answer is disappointing:
Most BIOSes can't use any RAM until they detect the type of RAM installed, then detect the size of each memory module, then configure the chipset to use the detected RAM. All of this depends on chipset specific methods, and is usually documented in the datasheets for the memory controller (northbridge). The RAM is unusable for running programs during this process. The BIOS initially is running from ROM, so it can play the necessary games with the RAM chips. But it is completely impossible to do this from inside any other program.

It is also reasonable to wish to reclaim the memory from 0xA0000 to 0xFFFFF and make your RAM contiguous. Again the answer is disappointing:
Forget about it. It is likely that some of it is being used regularly by SMM or ACPI. Some of it you will probably need again, even after the machine is booted. Reclaiming pieces of it would require a significant amount of motherboard- or chipset-specific control. It is possible to write a "chipset driver" that may allow you to reclaim a little of it. However, it is almost certainly impossible to reclaim it all. The minuscule results will not be worth the effort, in general.

It might be important to note that all PCs require a memory hole just below 4GB for additional memory mapped hardware (including the actual BIOS ROM). So, for a machine with >3G of RAM, the motherboard / chipset / BIOS may map some of that RAM (that would overlap mapped hardware) above 4G -- use PAE or Long Mode to access it.

See Memory Map (x86) for a generic layout of memory.


Detecting Low Memory

"Low memory" is the available RAM below 1 MB, and usually below 640 KB. There are two BIOS functions to get the size of it.

INT 0x12: The INT 0x12 call will return AX = total number of KB (or an error). The AX value measures from 0 up to the bottom of the EBDA (of course, you probably shouldn't use the first 0x500 bytes of the space either -- i.e. the IVT and BDA).

Usage:

	XOR AX, AX
	INT 0x12		; request low memory size
	JC SHORT .ERR
	TEST AX, AX		; size = 0 is an error
	JE SHORT .ERR
	CMP AH, 0x86		; unsupported function
	JE SHORT .ERR
	CMP AH, 0x80		; invalid command
	JE SHORT .ERR
	; AX should contain a valid size in K.
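
The value in AX can be turned into a usable address range with very little arithmetic. The following is a minimal C sketch of that step, not code from any particular kernel: the struct and function names are hypothetical, the 640 KB upper bound is an assumption used as a sanity check, and assert() stands in for whatever error handling your environment offers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the part of conventional memory an OS may use. */
struct low_mem_range {
    uint32_t base;  /* first usable byte */
    uint32_t end;   /* one past the last usable byte (bottom of the EBDA) */
};

/* ax_kb is the value returned in AX by INT 0x12. */
struct low_mem_range low_mem_from_int12(uint16_t ax_kb)
{
    struct low_mem_range r;
    assert(ax_kb <= 640);                 /* assumption: never more than 640 KB */
    r.base = 0x500;                       /* skip the IVT and the BDA */
    r.end  = (uint32_t)ax_kb * 1024;      /* AX counts KB starting at address 0 */
    if (r.end < r.base)
        r.end = r.base;                   /* degenerate value: report an empty range */
    return r;
}
```

For example, a BIOS that returns AX = 639 gives a range from 0x500 up to 0x9FC00, which matches a 1 KB EBDA sitting just below 640 KB.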

Alternately, you can just use INT 0x15, EAX = 0xE820 (see below).


Detecting Upper Memory

BIOS Function: INT 0x15, EAX = 0xE820

By far the best way to detect the memory of a PC is by using the INT 0x15, EAX = 0xE820 command. This function is available on all PCs built since 2002, and on most existing PCs before then. It is the only BIOS function that can detect memory areas above 4G. It is meant to be the ultimate memory detection BIOS function.

In reality, this function returns an unsorted list that may contain unused entries and (in rare/dodgy cases) may return overlapping areas. Each list entry is stored in memory at ES:DI, and DI is not incremented for you. The format of an entry is 2 qwords and a dword in the 20 byte version, plus one additional dword in the 24 byte ACPI 3.0 version (but nobody has ever seen a 24 byte one). It is probably best to always store the list entries as 24 byte quantities -- to preserve qword alignments, if nothing else. (Make sure to set that last dword to 1 before each call, to make your map compatible with ACPI.)

  • First qword = Base address
  • Second qword = Length of "region" (if this value is 0, ignore the entry)
  • Next dword = Region "type"
    • Type 1: Usable (normal) RAM
    • Type 2: Reserved - unusable
    • Type 3: ACPI reclaimable memory
    • Type 4: ACPI NVS memory
    • Type 5: Area containing bad memory
  • Next dword = ACPI 3.0 Extended Attributes bitfield (if 24 bytes are returned, instead of 20)
    • Bit 0 of the Extended Attributes indicates if the entire entry should be ignored (if the bit is clear). This is going to be a huge compatibility problem because most current OSs won't read this bit and won't ignore the entry.
    • Bit 1 of the Extended Attributes indicates if the entry is non-volatile (if the bit is set) or not. The standard states that "Memory reported as non-volatile may require characterization to determine its suitability for use as conventional RAM."
    • The remaining 30 bits of the Extended Attributes are currently undefined.
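
The entry format above maps directly onto a C structure. The sketch below is one way to declare it; the struct, field and function names are hypothetical (they are not taken from any specification header), and the helper only encodes the rules listed above for zero-length entries and bit 0 of the Extended Attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the layout follows the field list above. */
struct e820_entry {
    uint64_t base;      /* first qword: base address */
    uint64_t length;    /* second qword: length of the region in bytes */
    uint32_t type;      /* region type (1 = usable RAM, 2 = reserved, ...) */
    uint32_t ext_attr;  /* ACPI 3.0 Extended Attributes (24 byte entries only) */
};

/* Compile-time checks that the compiler laid the struct out as expected. */
_Static_assert(sizeof(struct e820_entry) == 24, "entry must be 24 bytes");
_Static_assert(offsetof(struct e820_entry, type) == 16, "type at offset 16");
_Static_assert(offsetof(struct e820_entry, ext_attr) == 20, "attributes at offset 20");

/* bytes_returned is the count the BIOS reported for this entry (20 or 24).
 * Returns 1 if the entry describes RAM that can be used right away. */
int e820_entry_is_usable(const struct e820_entry *e, unsigned bytes_returned)
{
    assert(e != NULL);
    if (e->length == 0)
        return 0;                   /* zero-length entries are ignored */
    if (bytes_returned > 20 && !(e->ext_attr & 1))
        return 0;                   /* bit 0 clear: ignore the whole entry */
    return e->type == 1;
}
```

Storing every entry in this 24 byte form keeps both qwords aligned whether the BIOS returned 20 or 24 bytes.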

Basic Usage:
For the first call to the function, point ES:DI at the destination buffer for the list. Clear EBX. Set EDX to the magic number 0x534D4150 ("SMAP" in ASCII). Set EAX to 0xE820 (note that the upper word of EAX should be set to 0). Set ECX to 24. Do an INT 0x15.

If the first call to the function is successful, EAX will be set to 0x534D4150, and the Carry flag will be clear. EBX will be set to some non-zero value, which must be preserved for the next call to the function. CL will contain the number of bytes actually stored at ES:DI (probably 20).

For the subsequent calls to the function: increment DI by your list entry size, reset EAX to 0xE820, and ECX to 24. When you reach the end of the list, EBX may reset to 0. If you call the function again with EBX = 0, the list will start over. If EBX does not reset to 0, the function will return with Carry set when you try to access the entry after the last valid entry.

(See the x86 code examples below for an ASM example, implementing the algorithm.)

Notes:

  • After getting the list, it may be desirable to: sort the list, combine adjacent ranges of the same type, change any overlapping areas to the most restrictive type, and change any unrecognised "type" values to type 2.
  • Type 3 "ACPI reclaimable" memory regions may be used like (and combined with) normal "available RAM" areas as long as you're finished using the ACPI tables that are stored there (i.e. it can be "reclaimed").
  • Types 2, 4, 5 (reserved, ACPI non-volatile, bad) mark areas that should be avoided when you are allocating physical memory.
  • Treat unlisted regions as Type 2 -- reserved.
  • Your code must be able to handle areas that don't start or end on any sort of "page boundary".
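
Some of the clean-up steps in the notes above can be sketched in a few lines of C. This is a host-side illustration under stated assumptions, not a complete solution: the struct and function names are hypothetical, qsort() stands in for whatever sort routine your kernel has, a 4 KiB page size is assumed, and merging adjacent ranges and resolving overlaps are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct mem_region {        /* hypothetical copy of one memory map entry */
    uint64_t base;
    uint64_t length;
    uint32_t type;
};

static int cmp_base(const void *a, const void *b)
{
    const struct mem_region *x = a, *y = b;
    return (x->base > y->base) - (x->base < y->base);
}

/* Turns unrecognised types into type 2 (reserved) and sorts by base address. */
void normalize_map(struct mem_region *map, size_t count)
{
    if (count == 0)
        return;
    assert(map != NULL);
    for (size_t i = 0; i < count; i++)
        if (map[i].type < 1 || map[i].type > 5)
            map[i].type = 2;
    qsort(map, count, sizeof map[0], cmp_base);
}

/* Shrinks a region to whole pages: base rounded up, end rounded down.
 * Stores the aligned base in *first_page and returns the page count.
 * Assumes base + length does not overflow 64 bits. */
uint64_t usable_pages(const struct mem_region *r, uint64_t *first_page)
{
    const uint64_t page = 4096;
    uint64_t start = (r->base + page - 1) & ~(page - 1);
    uint64_t end   = (r->base + r->length) & ~(page - 1);
    *first_page = start;
    return end > start ? (end - start) / page : 0;
}
```

Rounding inward means a partial page at either edge of a usable region is simply not handed to the allocator, which is the safe choice when the neighbouring region is reserved.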

Typical Output by a call to INT 15h, EAX=E820 in Bochs:

  Base Address       | Length             | Type
  0x0000000000000000 | 0x000000000009FC00 | Free Memory (1)
  0x000000000009FC00 | 0x0000000000000400 | Reserved Memory (2)
  0x00000000000E8000 | 0x0000000000018000 | Reserved Memory (2)
  0x0000000000100000 | 0x0000000001F00000 | Free Memory (1)
  0x00000000FFFC0000 | 0x0000000000040000 | Reserved Memory (2)
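
As a worked example, the usable RAM in the Bochs listing above can be totalled by adding up the lengths of the type 1 rows. The C sketch below does exactly that; the struct and function names are hypothetical, and the table is copied from the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct map_row { uint64_t base, length; uint32_t type; };

/* The five rows of the Bochs listing above. */
const struct map_row bochs_map[] = {
    { 0x0000000000000000ULL, 0x000000000009FC00ULL, 1 },
    { 0x000000000009FC00ULL, 0x0000000000000400ULL, 2 },
    { 0x00000000000E8000ULL, 0x0000000000018000ULL, 2 },
    { 0x0000000000100000ULL, 0x0000000001F00000ULL, 1 },
    { 0x00000000FFFC0000ULL, 0x0000000000040000ULL, 2 },
};

/* Adds up the lengths of all type 1 (usable RAM) rows. */
uint64_t total_usable_bytes(const struct map_row *map, size_t count)
{
    uint64_t total = 0;
    assert(map != NULL);
    for (size_t i = 0; i < count; i++)
        if (map[i].type == 1)
            total += map[i].length;
    return total;
}
```

For this table the total is 0x1F9FC00 bytes (33160192), i.e. 639 KB of low memory plus 31 MB above 1M. Note that the range from 0xA0000 to 0xE7FFF is not listed at all and is therefore treated as reserved.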


Other Methods

PnP

It is possible to get a fairly good memory map using Plug 'n Play (PnP) calls. {Need description and code.}


SMBios

SMBIOS is designed to allow "administrators" to assess hardware upgrade options or maintain a catalogue of what hardware a company currently has in use (i.e. it provides information for use by humans, rather than for use by software). It may not give reliable results on many computers -- see: [1].

SMBIOS will try to tell you the number of memory sticks installed and their size in MB. SMBIOS can be called from protected mode. However, some manufacturers don't make their systems fully compliant. e.g. HP Itanium goes by the DIG64 specification so their SMBIOS doesn't return all the required device types.

Detecting memory with these functions completely ignores the concept of memory holes / memory mapped devices / reserved areas.


BIOS Function: INT 0x15, AX = 0xE881

Note: This function is identical to the E801 function, except that it uses extended registers (EAX/EBX/ECX/EDX). It only reports contiguous (usable) RAM. It cannot detect any more memory than the E801 function, but there is a chance that this function may succeed on BIOSes where E801 fails.

BIOS Function: INT 0x15, AX = 0xE801

This function has been around since about 1994, so all systems from after then up to now should have this function. It is built to handle the 15M memory hole, but stops at the next hole / memory mapped device / reserved area above that. That is, it is only designed to handle contiguous memory above 16M.

Typical Output:
AX = CX = extended memory between 1M and 16M, in K (max 3C00h = 15MB)
BX = DX = extended memory above 16M, in 64K blocks

There are some BIOSes that always return with AX = BX = 0. Use the CX/DX pair in that case. Some other BIOSes will return CX = DX = 0. Linux initializes the CX/DX pair to 0 before the INT opcode, and then uses CX/DX, unless they are still 0 (when it will use AX/BX). In any case, it is best to do a sanity check on the values in the registers that you use before you trust the results. (GRUB just trusts AX/BX -- this is not good.)
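
The register selection and sanity check described above can be sketched in C as follows. This is an illustration rather than the code of Linux or GRUB: the struct and function names are hypothetical, it assumes CX and DX were zeroed before the INT, it falls back to AX/BX only when both CX and DX are still 0, and the only sanity check applied is the 3C00h limit on the first value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register values left by INT 0x15, AX = 0xE801 (hypothetical container). */
struct e801_regs { uint16_t ax, bx, cx, dx; };

/* Converts the E801 result to byte counts.
 * Returns 0 on success, -1 if the values do not look trustworthy. */
int e801_to_bytes(struct e801_regs r, uint64_t *below_16m, uint64_t *above_16m)
{
    uint16_t kb = r.cx, blocks = r.dx;
    assert(below_16m != NULL && above_16m != NULL);
    if (kb == 0 && blocks == 0) {       /* CX/DX still 0: use the AX/BX pair */
        kb = r.ax;
        blocks = r.bx;
    }
    if (kb == 0 && blocks == 0)
        return -1;                      /* both pairs are empty */
    if (kb > 0x3C00)
        return -1;                      /* more than 15 MB between 1M and 16M */
    *below_16m = (uint64_t)kb * 1024;       /* value is in K */
    *above_16m = (uint64_t)blocks * 65536;  /* value is in 64K blocks */
    return 0;
}
```

With CX = 3C00h and DX = 0100h this yields 15 MB between 1M and 16M and 16 MB above 16M, i.e. a machine whose contiguous RAM ends at 32M.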

Linux Usage:

	XOR CX, CX
	XOR DX, DX
	MOV AX, 0xE801
	INT 0x15		; request upper memory size
	JC SHORT .ERR
	JCXZ .USEAX
	CMP AH, 0x86		; unsupported function
	JE SHORT .ERR
	CMP AH, 0x80		; invalid command
	JE SHORT .ERR
	JMP SHORT .USECX
.USEAX:
	MOV CX, AX
	MOV DX, BX
.USECX:
	; CX = number of contiguous K, 1M to 16M
	; DX = contiguous 64K pages above 16M


BIOS Function: INT 0x15, AH = 0x88

Note: This function may limit itself to reporting 15M (for Windows legacy reasons) even if BIOS detects more memory than that. It may also report up to 64M. It only reports contiguous (usable) RAM.

Usage:

	MOV AH, 0x88
	INT 0x15		; request upper memory size
	JC SHORT .ERR
	TEST AX, AX		; size = 0 is an error
	JE SHORT .ERR
	CMP AH, 0x86		; unsupported function
	JE SHORT .ERR
	CMP AH, 0x80		; invalid command
	JE SHORT .ERR
	; AX = number of contiguous K above 1M


CMOS

The CMOS memory size information may ignore the standard memory hole at 15M. If you use the CMOS size, you may want to simply assume that this memory hole exists. Of course, it also has no information about any other reserved regions.

Usage:

	unsigned short total;
	unsigned char lowmem, highmem;

	outportb(0x70, 0x30);		// select CMOS register 0x30 (low byte)
	lowmem = inportb(0x71);
	outportb(0x70, 0x31);		// select CMOS register 0x31 (high byte)
	highmem = inportb(0x71);

	// 'total' = contiguous KB above 1 MB, max. approx. 64 MB
	total = lowmem | (highmem << 8);

	return total;


Manual Probing

WE DISCOURAGE YOU FROM DIRECTLY PROBING MEMORY

Use BIOS to get a memory map, or use GRUB (which calls BIOS for you).

When perfectly implemented, directly probing memory may allow you to detect extended memory even on systems where the BIOS fails to provide the appropriate support (or without even worrying about whether your BIOS can do it or not). The algorithm may or may not take into account potential holes in system memory or previously detected memory mapped devices, such as frame buffering SVGA cards, etc.

However, the BIOS knows things about your motherboard and PCI devices that you do not. Probing memory-mapped PCI devices may have *unpredictable results* and may theoretically *damage your system*, so once again we discourage its use.

Note: You will never get an error from trying to read/write memory that does not exist -- this is important to understand: you will not get valid results, but you won't get an error, either.

Theoretical obstacles to probing
  • There can be a memory mapped device from 15 MB to 16 MB (typically "VESA local bus" video cards, or older ISA cards that aren't limited to just video).
  • There can also be an (extremely rare) "memory hole" at 0x00080000 for some sort of compatibility with ancient cards.
  • On modern systems there can also be faulty RAM that INT 0x15, eax=0xE820 says not to use, that can be anywhere (except in the first 1 MB, probably).
  • There can also be large arbitrary memory holes. (e.g. a NUMA system with RAM up to 0x1FFFFFFF, a hole from 0x20000000 to 0x3FFFFFFF, then more RAM from 0x40000000 to 0x5FFFFFFF.)
  • It's theoretically possible for the memory controller to truncate physical addresses, so that accessing the address 0x80000123 on a motherboard that only supports 2 GB of RAM will wrap around and access RAM at 0x00000123 because the highest addressing bit/s aren't connected. In this case (for a sequential search) you won't notice the problem if the motherboard has the maximum amount of RAM installed.
  • There are memory mapped devices (PCI video cards, HPET, PCI express configuration space, APICs, etc) with addresses that must be avoided.
  • There are also (typically older) motherboards where you can write a value to "nothing" and read the same value back due to bus capacitance; there are motherboards where you write a value to cache and read the same value back from cache even though no RAM is at that address.
  • There are (older, mostly 80386) motherboards that remap the RAM underneath the option ROMs and BIOS to the end of RAM. (e.g. with 4 MB of RAM installed you get RAM from 0x00000000 to 0x000A0000 and more RAM from 0x00100000 to 0x00460000, which causes problems if you test each MB of RAM because you get the wrong answer -- either under-counting RAM up to 0x00400000, or over-counting RAM up to 0x00500000).
  • There can be important data structures left in RAM by the BIOS (e.g. the ACPI tables) that you'd trash.
  • If you write the code properly (i.e. to avoid as many of the problems as you can), then it is insanely slow.
  • Lastly, testing for RAM (if it actually works) will only tell you where RAM is - it doesn't give you a complete map of the physical address space. You won't know where you can safely put memory mapped PCI devices because you won't know which areas are reserved for chipset things (e.g. SMM, ROMs), etc.
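
The address truncation problem from the list above is easy to reproduce in a simulation. The C sketch below runs on a normal host and never touches real hardware: it models a hypothetical machine with 2 MB of RAM whose upper address lines are not connected, and runs a naive "write a pattern, read it back" probe against it. All names and sizes are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define REAL_RAM_BYTES (2u * 1024 * 1024)   /* simulated: 2 MB of actual RAM */

static uint32_t ram[REAL_RAM_BYTES / 4];

/* Simulated bus: address bits above the installed RAM are ignored,
 * so every physical address aliases into the 2 MB that exist. */
static uint32_t *bus(uint32_t phys)
{
    return &ram[(phys & (REAL_RAM_BYTES - 1)) / 4];
}

/* Naive probe: tests one dword at every 1 MB boundary below limit_mb
 * and returns the number of megabytes it believes are present. */
unsigned naive_probe_mb(unsigned limit_mb)
{
    unsigned mb;
    assert(limit_mb >= 1 && limit_mb <= 4095);
    for (mb = 1; mb < limit_mb; mb++) {
        uint32_t *p = bus((uint32_t)mb << 20);
        uint32_t saved = *p;
        int ok;
        *p = 0x55AA55AA;
        ok = (*p == 0x55AA55AA);
        *p = saved;                 /* restore whatever was there */
        if (!ok)
            break;
    }
    return mb;
}
```

Because every test address lands on real RAM, the probe never sees a failure: naive_probe_mb(64) reports 64 MB on a simulated machine that only has 2 MB.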

In contrast to this, using the BIOS functions isn't too hard, is much more reliable, gives complete information, and is extremely fast in comparison.


Memory Map Via GRUB

GRUB, or any bootloader implementing The Multiboot Specification, provides a convenient way of detecting the amount of RAM your machine has. Rather than re-invent the wheel, you can ride on the hard work that others have done by utilizing the multiboot_info structure. When GRUB runs, it loads this structure into memory and leaves the address of this structure in the EBX register.

The methods it uses are:

  • Try BIOS Int 0x15, eax = 0xE820
  • If that didn't work, try BIOS Int 0x15, ax = 0xE801 and BIOS Int 0x12
  • If that didn't work, try BIOS Int 0x15, ah = 0x88 and BIOS Int 0x12

However, it does not take into account any bugs that are known to affect some BIOSes (see entries in RBIL). It does not check "E801" and/or "88" returning with carry set.


To utilize the information that GRUB passes to you, first include the file multiboot.h in your kernel's main file. Then, make sure that when you load your _main function from your assembly loader, you push EBX onto the stack. The Bare bones tutorials have already done this for you.

The key for memory detection lies in the multiboot_info struct. To get access to it, you've already pushed it onto the stack...define your start function as such:

_main (multiboot_info_t* mbd, unsigned int magic) {...}

To determine the contiguous memory size, you may simply check mbd->flags to verify bit 0 is set, and then you can safely refer to mbd->mem_lower for conventional memory (e.g. physical addresses ranging between 0 and 640KB) and mbd->mem_upper for high memory (e.g. from 1MB). Both are given in "real" kilobytes, i.e. blocks of 1024 bytes each.
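
A minimal C sketch of that check is shown here. It declares only the first three fields of the Multiboot information structure under a hypothetical struct name (a real kernel would use multiboot_info_t from multiboot.h instead), and the function name is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the first three fields of the Multiboot information structure. */
struct mb_info_head {
    uint32_t flags;
    uint32_t mem_lower;   /* KB of conventional memory (valid if bit 0 set) */
    uint32_t mem_upper;   /* KB of memory starting at 1 MB (valid if bit 0 set) */
};

/* Returns 0 and fills the outputs if bit 0 of flags is set, else -1.
 * *lower_bytes: size of conventional memory in bytes.
 * *upper_end:   address one past the end of contiguous RAM above 1 MB. */
int mb_basic_memory(const struct mb_info_head *mbd,
                    uint64_t *lower_bytes, uint64_t *upper_end)
{
    assert(mbd != NULL && lower_bytes != NULL && upper_end != NULL);
    if (!(mbd->flags & 0x1))
        return -1;                              /* mem_* fields are not valid */
    *lower_bytes = (uint64_t)mbd->mem_lower * 1024;
    *upper_end   = 0x100000ULL + (uint64_t)mbd->mem_upper * 1024;
    return 0;
}
```

For instance, mem_lower = 639 and mem_upper = 31744 describe 639 KB of conventional memory and contiguous RAM that ends at 32M.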

To get a complete memory map, check bit 6 of mbd->flags and use mbd->mmap_addr to access the BIOS-provided memory map. Quoting specifications,

If bit 6 in the flags word is set, then the mmap_* fields are valid, and indicate the address and length of a buffer containing a memory map of the machine provided by the BIOS. mmap_addr is the address, and mmap_length is the total size of the buffer. The buffer consists of one or more of the following size/structure pairs (size is really used for skipping to the next pair):
  Offset | Field
  -4     | size
  0      | base_addr_low
  4      | base_addr_high
  8      | length_low
  12     | length_high
  16     | type
  • "size" is the size of the associated structure in bytes, which can be greater than the minimum of 20 bytes. base_addr_low is the lower 32 bits of the starting address, and base_addr_high is the upper 32 bits, for a total of a 64-bit starting address. length_low is the lower 32 bits of the size of the memory region in bytes, and length_high is the upper 32 bits, for a total of a 64-bit length. type is the variety of address range represented, where a value of 1 indicates available RAM, and all other values currently indicated a reserved area.
  • GRUB simply uses INT 15h, EAX=E820 to get the detailed memory map, and does not verify the "sanity" of the map. It also will not sort the entries, retrieve any available ACPI 3.0 extended DWORD (with the "ignore this entry" bit), or clean up the table in any other way.
  • One of the problems you have to deal with is that grub can theoretically place its multiboot information, and all the tables it references (elf sections, mmap and modules) anywhere in memory, according to the multiboot specification. In reality, in current grub legacy, they are allocated as parts of the grub program itself, below 1MB, but that is not guaranteed to remain the same. For that reason, you should try to protect these tables before you start using a certain bit of memory. (You might scan the tables to make sure their addresses are all below 1M.)


Memory Detection in Emulators

When you tell an emulator how much memory you want emulated, the concept is a little "fuzzy" because of the emulated missing bits of RAM below 1M. If you tell an emulator to emulate 32M, does that mean your address space definitely goes from 0 to 32M -1, with missing bits? Not necessarily. The emulator might assume that you mean 32M of contiguous memory above 1M, so it might end at 33M -1. Or it might assume that you mean 32M of total usable RAM, going from 0 to 32M + 384K -1. So don't be surprised if you see a "detected memory size" that does not exactly match your expectations.


x86 Examples

Manual Probing in C

  • Note the interrupt disable and the cache invalidation using inline Assembly to keep memory consistent.
  • Also note that using Manual Probing is strongly discouraged in general.
 /*
  * void count_memory (void)
  *
  * probes memory above 1mb
  *
  * last mod : 05sep98 - stuart george
  *            08dec98 - ""     ""
  *            21feb99 - removed dummy calls
  *
  */
void count_memory(void)
{
	register ULONG *mem;
	ULONG	mem_count, a;
	USHORT	memkb;
	UCHAR	irq1, irq2;
	ULONG	cr0;

	/* save IRQ's */
	irq1=inb(0x21);
	irq2=inb(0xA1);

	/* kill all irq's */
	outb(0x21, 0xFF);
	outb(0xA1, 0xFF);

	mem_count=0;
	memkb=0;

	// store a copy of CR0
	__asm__ __volatile__("movl %%cr0, %%eax" : "=a"(cr0));

	// write-back and invalidate the cache
	__asm__ __volatile__ ("wbinvd");

	// set PE/CD/NW in cr0 (the other bits are kept)
	// cache disable(486+), no-writeback(486+), 32bit mode(386+)
	__asm__ __volatile__("movl %%eax, %%cr0" ::
		"a" (cr0 | 0x00000001 | 0x40000000 | 0x20000000));

	do {
		memkb++;
		mem_count += 1024*1024;
		mem= (ULONG*) mem_count;

		a= *mem;
		*mem= 0x55AA55AA;

		// the empty asm calls tell gcc not to rely on what's in its registers
		// as saved variables (this avoids GCC optimisations)
		asm("":::"memory");
		if (*mem!=0x55AA55AA) mem_count=0;
		else {
			*mem=0xAA55AA55;
			asm("":::"memory");
			if(*mem!=0xAA55AA55)
				mem_count=0;
		}

		asm("":::"memory");
		*mem=a;

	} while (memkb<4096 && mem_count!=0);

	// restore the original CR0
	__asm__ __volatile__("movl %%eax, %%cr0" :: "a" (cr0));

	mem_end = memkb<<20;
	mem = (ULONG*) 0x413;
	bse_end= (*mem & 0xFFFF) <<6;

	outb(0x21, irq1);
	outb(0xA1, irq2);
}


Getting a GRUB Memory Map

Declare the appropriate structure, get the pointer to the first instance, grab whatever address and length information you want, and finally skip to the next memory map instance by adding size+4 to the pointer, tacking on the 4 to account for GRUB treating base_addr_low as offset 0 in the structure. You must also use mmap_length to make sure you don't overshoot the entire buffer.

typedef struct multiboot_memory_map {
	unsigned int size;
	unsigned int base_addr_low,base_addr_high;
// You can also use: unsigned long long int base_addr; if supported.
	unsigned int length_low,length_high;
// You can also use: unsigned long long int length; if supported.
	unsigned int type;
} multiboot_memory_map_t;

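int main(multiboot_info_t* mbt, unsigned int magic) {
	...
	// mmap_addr is a 32-bit physical address, so it must be cast to a pointer
	multiboot_memory_map_t* mmap = (multiboot_memory_map_t*) mbt->mmap_addr;
	while((unsigned int)mmap < mbt->mmap_addr + mbt->mmap_length) {
		...
		// 'size' does not count the size field itself, hence the extra 4 bytes
		mmap = (multiboot_memory_map_t*) ( (unsigned int)mmap + mmap->size + sizeof(mmap->size) );
	}
	...
}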
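
The same walk can be written so that it runs on any host, which is handy for testing the parsing logic outside a kernel. The sketch below is such a variant under stated assumptions: the function name is hypothetical, the buffer is passed as a byte pointer plus mmap_length instead of being read from the multiboot_info structure, fields are copied out with memcpy() to avoid alignment problems, and a little-endian machine is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sums the lengths of all type 1 entries in a Multiboot memory map buffer. */
uint64_t mb_mmap_usable_bytes(const uint8_t *buf, uint32_t mmap_length)
{
    uint64_t total = 0;
    uint32_t off = 0;
    assert(buf != NULL);
    /* each record: 4 byte size field followed by at least 20 bytes of data */
    while (mmap_length >= 24 && off <= mmap_length - 24) {
        uint32_t size, type;
        uint64_t length;
        memcpy(&size,   buf + off,      4);   /* offset -4: size */
        memcpy(&length, buf + off + 12, 8);   /* offset  8: length_low/high */
        memcpy(&type,   buf + off + 20, 4);   /* offset 16: type */
        if (size < 20 || size > mmap_length - off - 4)
            break;                            /* malformed record: stop */
        if (type == 1)
            total += length;
        off += size + 4;                      /* size excludes the size field */
    }
    return total;
}
```

Stopping on a record whose size field is smaller than 20 or runs past mmap_length keeps a corrupted table from sending the loop outside the buffer.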


Getting an E820 Memory Map

; use the INT 0x15, eax= 0xE820 BIOS function to get a memory map
; inputs: es:di -> destination buffer for 24 byte entries
; outputs: bp = entry count, trashes all registers except esi
do_e820:
	xor ebx, ebx		; ebx must be 0 to start
	xor bp, bp		; keep an entry count in bp
	mov edx, 0x0534D4150	; Place "SMAP" into edx
	mov eax, 0xe820
	mov [es:di + 20], dword 1	; force a valid ACPI 3.X entry
	mov ecx, 24		; ask for 24 bytes
	int 0x15
	jc short .failed	; carry set on first call means "unsupported function"
	mov edx, 0x0534D4150	; some BIOSes reportedly trash edx: restore "SMAP"
	cmp eax, edx		; on success, eax must have been reset to "SMAP"
	jne short .failed
	test ebx, ebx		; ebx = 0 implies list is only 1 entry long (worthless)
	je short .failed
	jmp short .jmpin
.e820lp:
	mov eax, 0xe820		; eax, ecx get trashed on every int 0x15 call
	mov [es:di + 20], dword 1	; force a valid ACPI 3.X entry
	mov ecx, 24		; ask for 24 bytes again
	int 0x15
	jc short .e820f		; carry set means "end of list already reached"
	mov edx, 0x0534D4150	; restore edx in case the BIOS trashed it
.jmpin:
	jcxz .skipent		; skip any 0 length entries
	cmp cl, 20		; got a 24 byte ACPI 3.X response?
	jbe short .notext
	test byte [es:di + 20], 1	; if so: is the "ignore this data" bit clear?
	je short .skipent
.notext:
	mov ecx, [es:di + 8]	; get lower dword of memory region length
	test ecx, ecx		; is the qword == 0?
	jne short .goodent
	mov ecx, [es:di + 12]	; get upper dword of memory region length
	jecxz .skipent		; if length qword is 0, skip entry
.goodent:
	inc bp			; got a good entry: ++count, move to next storage spot
	add di, 24
.skipent:
	test ebx, ebx		; if ebx resets to 0, list is complete
	jne short .e820lp
.e820f:
	mov [mmap_ent], bp	; store the entry count
	clc			; success: carry is still set if we arrived here via "jc"
	ret
.failed:
	stc			; "function unsupported" error exit
	ret

See Also

Threads