User:Pancakes/ARM Integrator-CP IRQTimerPICTasksAndMM

From OSDev.wiki

Revision as of 05:35, 25 March 2014

IRQ, Timer, PIC, Tasks, And MM On The Integrator-CP

Hopefully you are coming from some of the earlier examples. If not, here are the links:

Page                          Description
IRQ, Timer, And PIC           This demonstration uses only the IRQ, timer, and PIC.
IRQ, Timer, PIC, And Tasks    This shows how to switch between tasks using the timer, and builds on the previous page.

I am going to extend the latter demonstration of task switching. First, though, we need to add some basic memory management, and then we will progress to virtual memory.

Adding A Heap Implementation

I am going to use a fairly simple heap implementation, Bitmap Heap Implementation Enhanced. I recommend reading about it, because we will use all of its features to build a memory management system, and we will need it to implement virtual memory. Even if you use your own implementation, you might find it helpful to have a general idea of what the code does. It might even be self-explanatory for some readers.

To compile and link, I recommend placing the structures and prototypes in a header called kheap_bm.h and the implementation in kheap_bm.c. Then, when compiling the kernel code, just include the header and link with the kheap_bm.o you produce, and everything should work.

Adding The Basic Memory Initialization

We are going to base everything on one of the previous demonstrations linked at the top of this page. To make this page easier to read, I am only going to duplicate the code needed for the demonstration. The first modification is to the KSTATE structure.

typedef struct _KSTATE {
	KTHREAD			threads[0x10];
	uint8			threadndx;	
	uint8			iswitch;
	KHEAPBM			hphy;		/* physical page heap */
	KHEAPBM			hchk;		/* data chunk heap */
} KSTATE;

I have added two new fields, hphy and hchk. hphy will serve as our physical memory manager, working with 4KB pages, while hchk will handle smaller allocations. If you are using the older demonstration, you may be missing the KSTATE structure completely.

Let's add some defines:

/* the initial kernel stack and the exception stack */
#define KSTACKSTART 0x2000		/* descending */
#define KSTACKEXC   0x3000		/* descending */
/* somewhere to place the kernel state structure */
#define KSTATEADDR	0x3000
/* start of the RAM handed to the page heap, and the total RAM size in bytes */
#define KRAMADDR	0x4000
#define KRAMSIZE	(1024 * 1024 * 8)
/* the size of a physical memory page */
#define KPHYPAGESIZE		4096
/* the block size of the chunk heap (kmalloc, kfree) */
#define KCHKHEAPBSIZE		16
/* minimum amount kmalloc adds to the chunk heap at once (assumed value; the text does not give one) */
#define KCHKMINBLOCKSZ		(KPHYPAGESIZE * 16)

You may already have some of these defines; just replace them and add any new ones. One important one is KSTATEADDR, which defines where our kernel state resides.

Now, let us add some code to the main function of our play kernel.

void start() {
	.........
	KSTATE		*ks;
	uint8		*bm;

	ks = (KSTATE*)KSTATEADDR;

	/* initialize the physical page heap and the chunk heap */
	k_heapBMInit(&ks->hphy);
	k_heapBMInit(&ks->hchk);
	/* get a bit of memory to start with for small chunks, skipping the exception table */
	k_heapBMAddBlock(&ks->hchk, 4 * 7, KRAMADDR - (4 * 7), KCHKHEAPBSIZE);

	/* remove some memory from our chk heap */

	/* state structure */
	k_heapBMSet(&ks->hchk, KSTATEADDR, sizeof(KSTATE), 5);
	/* stacks (can free KSTACKSTART later) */
	k_heapBMSet(&ks->hchk, KSTACKSTART - 0x1000, 0x1000, 6);
	k_heapBMSet(&ks->hchk, KSTACKEXC - 0x1000, 0x1000, 7);

	/* add block but place header in chunk heap to keep alignment */
	bm = (uint8*)k_heapBMAlloc(&ks->hchk, k_heapBMGetBMSize(KRAMSIZE - KRAMADDR, KPHYPAGESIZE));
	k_heapBMAddBlockEx(&ks->hphy, KRAMADDR, KRAMSIZE - KRAMADDR, KPHYPAGESIZE, (KHEAPBLOCKBM*)k_heapBMAlloc(&ks->hchk, sizeof(KHEAPBLOCKBM)), bm, 0);

	/* remove kernel image region */
	k_heapBMSet(&ks->hphy, 0x10000, 0x4000, 8);

	..............

I prefer to have the exception handlers set up before this code. I do not show that above, but you can simply move their initialization code above this code block. The first objective is to create a small allocation heap, hchk, to handle small allocations, and then to place the remaining RAM as 4KB pages into our second heap, hphy. The phy part stands for physical, and the chk part stands for chunk. The hchk heap uses 16-byte blocks and hphy uses 4KB blocks.

Notice that I first call:

	k_heapBMAddBlock(&ks->hchk, 4 * 7, KRAMADDR - (4 * 7), KCHKHEAPBSIZE);

This adds a region of memory that starts right after the exception table (4 * 7 bytes) and extends up to KRAMADDR, which is where I define the remaining RAM to begin. RAM actually starts at 0x0, so do not let the name mislead you; I just had no better name for it.

Next:

	/* state structure */
	k_heapBMSet(&ks->hchk, KSTATEADDR, sizeof(KSTATE), 5);
	/* stacks (can free KSTACKSTART later) */
	k_heapBMSet(&ks->hchk, KSTACKSTART - 0x1000, 0x1000, 6);
	k_heapBMSet(&ks->hchk, KSTACKEXC - 0x1000, 0x1000, 7);

Here we mark some regions as used in the bitmap: the kernel state structure, the kernel's initial stack, and the exception stack. My exceptions are non-reentrant, so I can get away with a single exception stack for now. Notice that the stacks grow downward, which is why I subtract 0x1000: the defines give the top of each stack, which keeps the code that loads the sp register simple. I consider each stack to be 4KB, but you could use a smaller size if you like. Also, if you have read the Bitmap Heap Implementation Enhanced page, you will know that I am tagging each region with a different identifier in the bitmap (5, 6, and 7).
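To double-check that these reservations fit together, here is a minimal host-side sketch (not kernel code) that models each reserved region as a half-open range, using the defines from this page. KSTACKSIZE, KVECTORSEND, krange and klayout_ok are hypothetical names of mine, and the size of KSTATE is passed in as a parameter because it depends on your KTHREAD definition.

```c
#include <stdint.h>

/* Values copied from the defines earlier on this page. */
#define KSTACKSTART 0x2000   /* descending */
#define KSTACKEXC   0x3000   /* descending */
#define KSTATEADDR  0x3000
#define KRAMADDR    0x4000
#define KSTACKSIZE  0x1000   /* hypothetical name: each stack is treated as 4KB */
#define KVECTORSEND (4 * 7)  /* hypothetical name: first byte handed to hchk */

/* A half-open address range [start, end). */
typedef struct {
	uint32_t start;
	uint32_t end;
} krange;

static int krange_overlap(krange a, krange b) {
	return a.start < b.end && b.start < a.end;
}

static int krange_inside(krange inner, krange outer) {
	return inner.start >= outer.start && inner.end <= outer.end;
}

/*
	Returns 1 if the two stacks and a KSTATE of kstate_size bytes are
	pairwise disjoint and all lie inside the block added to hchk, else 0.
*/
static int klayout_ok(uint32_t kstate_size) {
	krange hchk  = { KVECTORSEND, KRAMADDR };
	krange stack = { KSTACKSTART - KSTACKSIZE, KSTACKSTART };
	krange exc   = { KSTACKEXC - KSTACKSIZE, KSTACKEXC };
	krange state = { KSTATEADDR, KSTATEADDR + kstate_size };

	if (!krange_inside(stack, hchk) || !krange_inside(exc, hchk) ||
	    !krange_inside(state, hchk))
		return 0;
	return !krange_overlap(stack, exc) && !krange_overlap(stack, state) &&
	       !krange_overlap(exc, state);
}
```

With these defines, klayout_ok returns 1 for a KSTATE of up to 0x1000 bytes and 0 for anything larger, so KSTATE has to stay under 4KB or it will run past KRAMADDR into the page heap.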

Finally:

	/* add block but place header in chunk heap to keep alignment */
	bm = (uint8*)k_heapBMAlloc(&ks->hchk, k_heapBMGetBMSize(KRAMSIZE - KRAMADDR, KPHYPAGESIZE));
	k_heapBMAddBlockEx(&ks->hphy, KRAMADDR, KRAMSIZE - KRAMADDR, KPHYPAGESIZE, (KHEAPBLOCKBM*)k_heapBMAlloc(&ks->hchk, sizeof(KHEAPBLOCKBM)), bm, 0);
	
	/* remove kernel image region */
	k_heapBMSet(&ks->hphy, 0x10000, 0x4000, 8);

First, I add a region to the hphy heap that starts where the hchk region ends (KRAMADDR) and extends to the end of RAM. One very important difference is that I allocate the block header and bitmap outside of that region, from hchk, by using k_heapBMAddBlockEx. The Ex variant gives more control over where these structures are placed, which keeps the page region itself aligned.

Then I remove the region of memory where the kernel image is loaded, which starts at 0x10000 (64KB). I simply assume the kernel is less than 16KB (0x4000) in size. A more elegant method could be used, but to keep things simple I am opting for the easier, broken method! So beware: if your kernel image grows larger than 16KB, the end of it can be handed out as free pages and overwritten, and you will experience some weird bugs.
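To see which pages that k_heapBMSet call covers, here is a small host-side sketch of the page arithmetic. It assumes hphy numbers its 4KB pages starting from KRAMADDR; the helper names are hypothetical.

```c
#include <stdint.h>

/* Values copied from the defines earlier on this page. */
#define KRAMADDR     0x4000
#define KRAMSIZE     (1024 * 1024 * 8)
#define KPHYPAGESIZE 4096

/* Number of 4KB pages managed by hphy: the range [KRAMADDR, KRAMSIZE). */
static uint32_t hphy_page_count(void) {
	return (KRAMSIZE - KRAMADDR) / KPHYPAGESIZE;
}

/* Index of the hphy page that contains physical address addr. */
static uint32_t hphy_page_index(uint32_t addr) {
	return (addr - KRAMADDR) / KPHYPAGESIZE;
}

/* Number of pages touched by the range [addr, addr + len), len > 0. */
static uint32_t hphy_pages_touched(uint32_t addr, uint32_t len) {
	return hphy_page_index(addr + len - 1) - hphy_page_index(addr) + 1;
}
```

hphy manages 2044 pages, and the 16KB kernel image at 0x10000 covers pages 12 through 15. An image just one byte over 16KB already touches a fifth page, which is exactly the case the warning above is about. Since the bitmap stores an identifier (such as 5, 6 or 7) per block, its size is presumably on the order of one byte per page, roughly 2KB taken from hchk, although the exact figure is whatever k_heapBMGetBMSize returns.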

If your linker script defines symbols for the start and end of the kernel image (which it should if you have followed along with the previous pages), you could do something like:

        k_heapBMSet(&ks->hphy, (uintptr)&_BOI, (uintptr)&_EOI - (uintptr)&_BOI, 8);

This accounts for the actual size of the kernel image and is probably the better solution. Either way, we mark that region of memory as used in the physical page bitmap (hphy). The hchk region does not need this, because it stops at KRAMADDR (0x4000, defined above), well below the kernel image.
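_BOI and _EOI are unlikely to fall exactly on page boundaries. If you want to be sure the partially used first and last pages are also covered, one option is to round the range outward yourself before calling k_heapBMSet. This is a minimal sketch with a hypothetical helper, kimage_page_span, written with stdint.h types so it can be tried on a host.

```c
#include <stdint.h>

#define KPHYPAGESIZE 4096   /* must be a power of two for the masks below */

/*
	Hypothetical helper: widen [boi, eoi) to whole pages, rounding the
	start down and the end up, so partially used pages are included.
*/
static void kimage_page_span(uint32_t boi, uint32_t eoi,
                             uint32_t *start, uint32_t *len) {
	uint32_t mask = KPHYPAGESIZE - 1;
	uint32_t end = (eoi + mask) & ~mask;   /* round end up */

	*start = boi & ~mask;                  /* round start down */
	*len = end - *start;
}
```

In the kernel you would pass (uintptr)&_BOI and (uintptr)&_EOI and hand the resulting start and length to k_heapBMSet. For example, an image spanning 0x10000 to 0x12345 becomes a start of 0x10000 with a length of 0x3000.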

kmalloc and kfree

At this point you can implement kmalloc and kfree if you like:

void kfree(void *ptr) {
	KSTATE			*ks;
	
	ks = (KSTATE*)KSTATEADDR;

	k_heapBMFree(&ks->hchk, ptr);
}

void* kmalloc(uint32 size) {
	void			*ptr;
	KSTATE			*ks;
	uint32			_size;
	
	ks = (KSTATE*)KSTATEADDR;
	
	/* first attempt: satisfy the request from the existing chunk heap */
	ptr = k_heapBMAlloc(&ks->hchk, size);
	
	/* if that failed, grab more physical pages and add them to the chunk heap */
	if (!ptr) {
		if (size < KCHKMINBLOCKSZ / 2) {
			/* small request: add a block of at least KCHKMINBLOCKSZ bytes */
			_size = KCHKMINBLOCKSZ;
		} else {
			/*
				large request: double it to be safe, since the new block's
				header and bitmap are stored inside it, then round up to a
				whole number of pages so all the pages taken are used
			*/
			_size = size * 2;
			_size = (_size / KPHYPAGESIZE) * KPHYPAGESIZE < _size ? _size / KPHYPAGESIZE + 1 : _size / KPHYPAGESIZE;
			_size = _size * KPHYPAGESIZE;
		}
		ptr = k_heapBMAlloc(&ks->hphy, _size);
		if (!ptr) {
			/* no more physical pages */
			return 0;
		}
		/* add the new pages to the chunk heap and retry; this should succeed */
		k_heapBMAddBlock(&ks->hchk, (uintptr)ptr, _size, KCHKHEAPBSIZE);
		ptr = k_heapBMAlloc(&ks->hchk, size);
	}
	
	return ptr;
}
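The rounding in kmalloc's else branch (the ternary on _size) rounds up to a whole number of pages. The common idiom ((n + page - 1) / page) * page does the same thing, and this small host-side sketch checks that the two agree. kround_pages, kround_pages_ternary and kchk_grow_size are hypothetical names, and KCHKMINBLOCKSZ is given an assumed value of 64KB, because the original code uses it without stating a value.

```c
#include <stdint.h>

#define KPHYPAGESIZE   4096
#define KCHKMINBLOCKSZ (KPHYPAGESIZE * 16)   /* assumed value (64KB) */

/* Round n up to a whole number of pages with the usual idiom. */
static uint32_t kround_pages(uint32_t n) {
	return ((n + KPHYPAGESIZE - 1) / KPHYPAGESIZE) * KPHYPAGESIZE;
}

/* The same rounding, written the way kmalloc does it. */
static uint32_t kround_pages_ternary(uint32_t n) {
	uint32_t p = (n / KPHYPAGESIZE) * KPHYPAGESIZE < n ? n / KPHYPAGESIZE + 1 : n / KPHYPAGESIZE;

	return p * KPHYPAGESIZE;
}

/* How many bytes kmalloc asks hphy for when hchk cannot satisfy size. */
static uint32_t kchk_grow_size(uint32_t size) {
	if (size < KCHKMINBLOCKSZ / 2)
		return KCHKMINBLOCKSZ;
	return kround_pages(size * 2);
}
```

So, with the assumed 64KB minimum, a 100-byte request that misses grows hchk by the full 64KB, while a 40000-byte request asks hphy for 81920 bytes (80000 rounded up to 20 pages).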

Tinkering With Virtual Memory

Now that we have some basic memory management, we can proceed to the fun stuff. Let's start really small until we get a grasp of what we are working with.

We need a few utility functions to make life easier:

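/* write translation table base register 0 (TTBR0) */
void arm4_tlbset0(uint32 base) {
	asm("mcr p15, 0, %[tlb], c2, c0, 0" : : [tlb]"r" (base));
}

/* write translation table base register 1 (TTBR1) */
void arm4_tlbset1(uint32 base) {
	asm("mcr p15, 0, %[tlb], c2, c0, 1" : : [tlb]"r" (base));
}

/* write the translation table base control register (TTBCR), which sets the TTBR0/TTBR1 split */
void arm4_tlbsetmode(uint32 val) {
	asm("mcr p15, 0, %[tlb], c2, c0, 2" : : [tlb]"r" (val));
}

/* write the domain access control register (DACR) */
void arm4_tlbsetdom(uint32 val) {
	asm("mcr p15, 0, %[val], c3, c0, 0" : : [val]"r" (val));
}

/* set bit 0 (M) of the system control register to turn the MMU on */
void arm4_tlbenable() {
	asm volatile(" \
			mrc p15, 0, r0, c1, c0, 0  \n\
			orr r0, r0, #0x1 \n\
			mcr p15, 0, r0, c1, c0, 0" : : : "r0", "memory");
}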
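/* allocate an empty first-level table of 4096 1MB section entries */
uint32 *kalloc1MBTLB() {
	KSTATE			*ks;
	uint32			*tlb;
	int				x;
	
	ks = (KSTATE*)KSTATEADDR;
	/* 16KB table, aligned on a 2^14 (16KB) boundary as the MMU requires */
	tlb = (uint32*)k_heapBMAllocBound(&ks->hphy, 1024 * 16, 14);
	
	if (!tlb)
		return 0;
	
	/* initialize it with nothing mapped (an entry of 0 causes a translation fault) */
	for (x = 0; x < 4096; ++x) {
		tlb[x] = 0;
	}
	
	return tlb;
}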
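The value later passed to arm4_tlbsetdom, 0x55555555, packs a 2-bit access mode for each of the 16 domains: 00 means no access, 01 client (accesses are checked against the entry's permission bits), and 11 manager (no checks). Here is a minimal sketch that builds such a value; the DOM_* names and the helpers are hypothetical. It also includes a check for the 16KB alignment that a full first-level table needs, which is presumably what the 14 passed to k_heapBMAllocBound asks for.

```c
#include <stdint.h>

/* Domain access modes, 2 bits per domain in the DACR. */
#define DOM_NOACCESS 0x0   /* any access faults */
#define DOM_CLIENT   0x1   /* accesses are checked against the entry's AP bits */
#define DOM_MANAGER  0x3   /* accesses are not checked */

/* Build a DACR value that gives all 16 domains the same mode. */
static uint32_t kdacr_all(uint32_t mode) {
	uint32_t val = 0;
	int d;

	for (d = 0; d < 16; ++d)
		val |= (mode & 0x3) << (d * 2);
	return val;
}

/* A full 16KB first-level table must start on a 16KB boundary. */
static int ktable_aligned(uintptr_t addr) {
	return (addr & 0x3FFF) == 0;
}
```

kdacr_all(DOM_CLIENT) gives 0x55555555, the value used on this page; kdacr_all(DOM_MANAGER) would turn off permission checking entirely.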

You might notice the name TLB, which stands for translation lookaside buffer. Strictly speaking, the TLB is the CPU's cache of recent translations, and what we build here are the translation tables (page tables) that the MMU walks to fill it, but I use the name TLB loosely for those tables in the code. These tables are what we will be working with to implement paging. If you are coming from the x86/x64 architecture, you might have already read about or tinkered with paging, and the idea is basically the same on ARM. One major difference is that ARM can work with two translation tables at the same time, one intended for the kernel and one intended to be process specific. I am going to use them backwards: the docs state that TTBR0 should be process specific and TTBR1 operating system specific, but TTBR1 maps the upper part of the 32-bit address space (the upper 2GB with the split used here) and TTBR0 the lower part, and since our kernel lives low, I am going to use TTBR0 for it.

You can also divide the space unevenly, for example:

TTBR0 (low)   TTBR1 (high)   arm4_tlbsetmode value
2GB           2GB            1
1GB           3GB            2
512MB         3584MB         3
...           ...            ...

At the extreme, TTBR0 covers only the lowest 32MB and TTBR1 the remaining 4064MB. For our initial tinkering, we will simply split it 2GB/2GB.
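The split is controlled by the value written with arm4_tlbsetmode (the N field of TTBCR). Here is a minimal sketch, with hypothetical helper names, of how the MMU picks a table for a virtual address and which first-level entry it uses.

```c
#include <stdint.h>

/*
	n is the value written with arm4_tlbsetmode (TTBCR.N, 0 to 7).
	With n == 0 everything goes through TTBR0; otherwise TTBR0 covers
	the bottom 2^(32 - n) bytes and TTBR1 covers the rest.
*/
static int kttbr_for_va(uint32_t va, uint32_t n) {
	if (n == 0)
		return 0;
	return (va >> (32 - n)) != 0;
}

/* Size in MB of the region translated through TTBR0 for a given n. */
static uint32_t kttbr0_mb(uint32_t n) {
	return 4096u >> n;
}

/*
	Index of the 1MB section entry for va. The TTBR1 table is indexed
	with VA[31:20]; for TTBR0 addresses the top n bits are zero, so the
	same shift gives the TTBR0 index too.
*/
static uint32_t ksection_index(uint32_t va) {
	return va >> 20;
}
```

n = 1 is the 2GB/2GB split that arm4_tlbsetmode(1) selects, and n = 7 gives the 32MB/4064MB extreme. Virtual address 0x80000000 goes to TTBR1 and lands at entry 0x800, which is why the code that follows fills utlb[0x800] rather than utlb[0].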

	ktlb = kalloc1MBTLB();
	utlb = kalloc1MBTLB();
	
	/*
		identity map 4GB of space in ktlb - each entry is 1MB

		With the 2GB split used here, only the first 2048 entries of ktlb
		are ever used. The TTBR1 table (utlb), however, is always indexed
		with the full VA[31:20], so it must be a complete 16KB table whose
		lower 2048 entries are simply never used; that is why the code
		below writes utlb[0x800] for virtual address 0x80000000.
	*/
	for (x = 0; x < 4096; ++x) {
		/* cast so entries 2048 and up do not shift into the sign bit of an int */
		ktlb[x] = ((uint32)x << 20) | TLB_ACCESS | TLB_SECTION;
	}
	
	/*
		do a little trick to map the same physical memory twice in
		each page table (TTBR0 and TTBR1)
	*/
	utlb[0x800] = (1 << 20) | TLB_ACCESS | TLB_SECTION;	/* VA 2048MB -> PA 1MB */
	utlb[0x801] = (1 << 20) | TLB_ACCESS | TLB_SECTION;	/* VA 2049MB -> PA 1MB */
	ktlb[1] = (2 << 20) | TLB_ACCESS | TLB_SECTION;		/* VA 1MB -> PA 2MB */
	ktlb[2] = (2 << 20) | TLB_ACCESS | TLB_SECTION;		/* VA 2MB -> PA 2MB */
	
	kserdbg_puts("OK\n");
	
	/* use utlb (TTBR1) when bit 31 of the virtual address is set */
	arm4_tlbsetmode(1); /* 1 sets a 2GB to 2GB divide */
	/* load the locations of the translation tables */
	arm4_tlbset1((uintptr)utlb);
	arm4_tlbset0((uintptr)ktlb);
	/* put all 16 domains in client mode (01): accesses are checked against each entry's access permissions */
	arm4_tlbsetdom(0x55555555);
	/* enable the MMU */
	arm4_tlbenable();
	
	kserdbg_puts("OK2\n");
	
	/* check the trick - in each table the same 1MB chunk is mapped twice */
	a = (uint32*)0x80000000;
	b = (uint32*)0x80100000;
	a[0] = 0x12345678;
	ksprintf(buf, "utlb:%x\n", b[0]);
	kserdbg_puts(buf);

	a = (uint32*)0x100000;
	b = (uint32*)0x200000;
	a[0] = 0x87654321;
	ksprintf(buf, "ktlb:%x\n", b[0]);
	kserdbg_puts(buf);

	kserdbg_puts("DONE\n");
	for(;;);

We have mapped one physical 1MB region at both 1MB and 2MB (both pointing at physical 2MB), and another at both 2048MB and 2049MB (both pointing at physical 1MB). We verify this by writing through one virtual address and reading through the other. Now, this has been tested in QEMU only; on real hardware we could have missed a thing or two. But this is the basics, and if you have done it correctly it should work! Congratulations, you have enabled paging and not crashed your emulator (or maybe real hardware)!
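To see why the two reads return the values just written, here is a host-side simulation of a 1MB section walk over the same entries the kernel code writes. The page never shows the definitions of TLB_SECTION and TLB_ACCESS, so the values here are assumptions based on the short-descriptor section format: bits [1:0] = 10 for a section and AP bits [11:10] = 11 for full access. ksection and ktranslate are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed encodings: bits [1:0] = 0b10 (section), AP[1:0] = 0b11 (full access). */
#define TLB_SECTION 0x2
#define TLB_ACCESS  (3 << 10)

/* Build a section entry that points at physical 1MB section pa_mb. */
static uint32_t ksection(uint32_t pa_mb) {
	return (pa_mb << 20) | TLB_ACCESS | TLB_SECTION;
}

/*
	Translate va through a 4096-entry first-level table the way the MMU
	does for 1MB sections. Only section entries are modeled: anything
	else (a fault entry or a second-level table pointer) returns 0 and
	leaves *pa untouched.
*/
static int ktranslate(const uint32_t *table, uint32_t va, uint32_t *pa) {
	uint32_t entry = table[va >> 20];

	if ((entry & 0x3) != TLB_SECTION)
		return 0;
	*pa = (entry & 0xFFF00000) | (va & 0x000FFFFF);
	return 1;
}
```

With ktlb and utlb filled as in the kernel code, 0x80000000 and 0x80100000 both translate to physical 0x00100000, while 0x00100000 and 0x00200000 both translate to physical 0x00200000. An address such as 0x80200000, whose utlb entry was left at 0, does not translate, just as it would fault on the real MMU.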

The CPU also supports second-level tables for finer-grained mapping (such as 4KB pages), as well as 16MB supersections. I just used 1MB sections because they are easier to understand at first.
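For comparison, here is how a virtual address is split when second-level (coarse) tables with 4KB small pages are used, and which index a 16MB supersection uses. It is a minimal sketch with hypothetical names, meant only to show the bit fields.

```c
#include <stdint.h>

/*
	Split a virtual address the way a two-level walk with 4KB small pages
	does: VA[31:20] selects the first-level entry, VA[19:12] one of the 256
	entries in a coarse second-level table, and VA[11:0] is the offset.
*/
typedef struct {
	uint32_t l1;       /* first-level index, 0..4095 */
	uint32_t l2;       /* second-level index, 0..255 */
	uint32_t offset;   /* byte offset inside the 4KB page */
} kva_parts;

static kva_parts ksplit_va(uint32_t va) {
	kva_parts p;

	p.l1 = va >> 20;
	p.l2 = (va >> 12) & 0xFF;
	p.offset = va & 0xFFF;
	return p;
}

/* For a 16MB supersection, VA[31:24] picks the supersection and VA[23:0] is the offset. */
static uint32_t ksupersection_index(uint32_t va) {
	return va >> 24;
}
```

For 0x80123456 this gives first-level index 0x801, second-level index 0x23 and page offset 0x456, while its supersection index is 0x80. Note that a supersection still occupies 16 consecutive, identical first-level entries.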

In Progress

We still need to explore other configurations and possibilities with virtual memory, including integrating it with our tasks!