Segment Limits

From OSDev.wiki
Revision as of 09:32, 13 April 2014 by osdev>Johnburger (→‎32-Bit Entry: Added Code Segment Default explanation for the Big flag)

Protected Mode

In x86 Protected Mode, a Segment is described by two parameters, the Base address and the Limit:

  • The Base address is where in the CPU-addressable space the Segment starts;
  • The Limit is the last Segment-relative offset that can be accessed from the Base.

For example, a Memory Descriptor entry in the Global Descriptor Table may describe an Expand Up Data Descriptor starting at 0005_0000h with a Limit of 0000_C000h - in other words, 48 kiB (+1 byte) in the middle of Conventional RAM.

Given the above Descriptor assigned to the DS segment register, accessing a Byte at [DS:0000_0000h] or [DS:0000_C000h] will work, but [DS:0000_C001h] or above won't. The biggest advantage of this mechanism is that various offsets embedded in a program don't need to be "fixed up" at load time to cater for the load address of the program: since every offset is relative to a Segment's Base, the fix-ups are performed by the CPU at run time.

While the Base address is affected by things like Paging (if enabled), the Limit is simply the number of contiguous bytes after that: any access higher than the Limit will cause a General Protection Fault. Unless the Descriptor is for an Expand Down segment, in which case everything changes (see below)...

16-Bit Entry

When Intel introduced Descriptor Tables in the 80286 (with 16-bit data but a 24-bit address bus), they defined one Entry to be 8 bytes. That made it easy to use a Segment Register as a Selector: thirteen bits index into the table, leaving one bit as a GDT/LDT selector and two as a Privilege Level selector. Of the 8 bytes, they only needed 6: 3 for the (24-bit maximum) Base, 2 for the (16-bit maximum) Limit, and 1 for the Entry's Type (Data vs Code vs System). They specified that the last two bytes of the eight were reserved, and must be zero.

Since the Limit was 16 bits, Intel needed to decide how to interpret the edge conditions of 0 kiB and 64 kiB: would a Limit of 0000h mean no access (what's the point of a zero-length Segment?) or full access (all 64 kiB available)? So (of course) they chose the third option: the Limit would be the last accessible byte. A Limit of 0000h allowed only 1 accessible address, while FFFFh allowed all of them.

32-Bit Entry

When Intel introduced the 32-bit 80386 (with both 32-bit data and address buses), they had a problem. They needed one more byte to hold the new 32-bit Base, but two more bytes to hold the new 32-bit Limit - and they only had two bytes spare. They still wanted to be backward-compatible and run existing 80286 software unchanged, but they also wanted to take full advantage of the new 32-bit addresses.

But Intel realised that when dealing with really large Segments, a programmer wouldn't be agonizing over whether it should be 12,086,384 or 12,086,385 bytes in size. At those sorts of sizes - especially with the '386's new paging functionality - a programmer would be working at a higher granularity: in 4 kiB Pages rather than bytes. A Page needs 12 bits to address into, so if the Limit field was marked as Page-granular rather than byte-granular, it would only need to be (32-12=) 20 bits in size - and 16 were already defined!

So they made a compromise in the Descriptor Table Entry. They did add an extra byte for the Base, making it fully 32-bit addressable, but they turned the last available byte into a compound Limit-with-Flags record. The four extra bits required for the maximum-sized Limit became the low nybble of that byte, and the high nybble was used for two flags, called Gran and Big:

  • Gran is obvious - it indicates whether to use byte-granular (=0) or Page-granular (=1) calculations on the Limit. With a 20-bit Limit in byte-granular mode, it is possible to fine-tune a Segment to anything from 1 byte to 1 MiB. In Page-granular mode, you can specify anything from 4 kiB to 4 GiB in 4 kiB jumps. The 20-bit Limit is shifted 12 bits to the left, and 1s are shifted in. A Page-granular Limit of 0_0000h actually means 0000_0FFFh.
  • Big is less obvious - it indicates whether addresses used by implicit registers will be 16-bit (=0) or 32-bit (=1). What are implicit registers? There are only two: (E)IP and (E)SP. When used for a Code segment, instructions are fetched by either IP or EIP. When used for a Stack, values are pushed using either SP or ESP. The Big flag indicates which one.
    Incidentally, when used in a Code Descriptor, the Big flag is also known as the Default flag. It sets the Default 16- or 32-bit mode that instructions will be interpreted by, when executed in this Segment. For example, the B8h op-code under a Default=0 Code Segment means MOV AX,... and needs to be followed by 2 bytes for the Word to load. Under a Default=1 Code Segment it means MOV EAX,... and needs to be followed by 4 bytes for the DWord to load. The OpSiz (66h) and AdSiz (67h) instruction prefixes toggle the current Default interpretation.

The effect of the Big flag is subtle: in a Code Segment it influences the size of the value pushed during a CALL, and in a Stack Segment it governs where the Stack wraps - if ESP is 0000_0000h, a PUSH EAX will leave ESP at either 0000_FFFCh or FFFF_FFFCh depending on that flag. That doesn't mean much until you get to Expand Down Segments.

Segmentation

Segmentation has its detractors. Indeed, most Operating Systems today don't use it - or keep only the minimal, vestigial segments that the architecture requires. Without it, however, the OS is forced to perform address fix-ups at load time, and shared code becomes a nightmare. Segmentation solves all of this; abandoning it means the OS has to duplicate that work itself.

An example might make things clearer.

Heap

Imagine that you define a Heap to be 4 kiB in size. You put it in its own Segment, and allow the program to alloc() and free() chunks of memory from it. If an alloc() comes through for more memory than is available, the code has two choices:

  • It can deny the request;
  • It can increase the Segment size, which may require moving the whole Segment to a new area of RAM.

This second option is where the beauty of Segmentation comes in: after the move, the Segment's Base address can be modified to point to the new memory and the calling program doesn't know anything happened. It still accesses the Heap with the same Selector value, and unless it uses the LSL (Load Segment Limit) instruction, it won't even know that the Segment grew.

Stack

The point is that growing a Heap doesn't affect any of the offsets in the original data: even the stored Selector value doesn't change. Unfortunately, the same is not true for Stacks.

Now imagine that you define a 4 kiB Stack. Again it is in its own Segment, and SP starts at 1000h and works its way downwards. When it reaches 0000h everything is still valid - but one more PUSH will wrap SP around and overflow the Limit. At this point the code has two choices:

  • It can abort the program due to lack of Stack space;
  • It can increase the Stack's size, which again may require moving the whole Segment to a new area of RAM.

The move is again not a problem: change the Segment's Base and it's done. But the Limit is the problem: there's nowhere for the Stack to grow to! Just increasing the Limit won't do anything: the Stack is at the bottom of the Segment, not the top. And moving all the data inside the Stack to the top of the new Segment really won't work: it will muck up all the saved offsets, since (for example) 1000h needs to become 2000h - but which values need to be changed? And what about the values stored in the program's registers? Which of those should be changed?

Expand Down

So in short, growing a Stack like you can a Heap (described above) can't be done. Not using traditional Data Segments anyway, which is why Intel designed Expand Down Segments. As it sounds, they're designed to be used with Stacks to permit them to be expanded during run-time. Setting a single bit in the Descriptor Table Entry's Type field changes the interpretation of the whole Data Segment completely.

There are two ways to look at Expand Down Segments:

  • Valid vs Invalid:
    • An Expand Up Segment uses the Limit to define valid and invalid addresses within the Segment;
    • The Expand Down flag swaps them, making the once-valid addresses invalid, and vice versa.
  • Arithmetically:
    • An Expand Up Segment defines the Base and the largest possible offset you're allowed to add to that Base;
    • The Expand Down flag makes the Base actually the top of the accessed memory, and the Limit (+1) becomes the lowest possible offset you can access below that.

The second interpretation is slightly inaccurate since the CPU is still adding the offset to the Base. With 32-bit (Big=1) offsets, an offset like FFFF_FFFCh is effectively negative - the 32-bit addition wraps around, resulting in a smaller final address. With 16-bit (Big=0) offsets there is no such wrap: the accessible bytes simply sit at the top of the 64 kiB window starting at the Base.

16-Bit Example

So a 16-bit Stack of 1 kiB with a Base address of 05_0000h and a Limit of FC00h (marked Expand Down, of course) would start with an SP of 0000h - which may look worrying to some programmers. But it's perfectly safe: the first PUSH AX or CALL would make SP FFFEh and store the value there, which would actually store the value at RAM location 0005_FFFEh - the CPU still simply adds the 16-bit offset to the Base. Subsequent Stack operations would continue to decrease SP until it hit the value FC00h - where an access would cause a Stack Fault.

The sharp-eyed among you will have seen the difference: an Expand Up Segment's Limit is inclusive - the specified Limit is still an accessible byte. That means that in an Expand Down Segment, the Limit is exclusive - it is effectively the highest offset that cannot be accessed. So a better Limit for the above example would be FBFFh.

And now "growing" the Stack is easy: decrease the Limit as desired, and move the memory if necessary (remembering where the accessible window sits relative to the Base). Again, none of the offsets inside the Stack need to be modified.

32-Bit Example

A 32-bit Expand Down Segment follows on from the 16-bit version exactly: the Base is the top of the accessed memory, and the Limit is the highest inaccessible location below that. The problem comes from the fact that the Limit only has 20 bits of resolution, and the Gran flag can't help us now.

The 16-bit example above specified a 1 kiB Stack with a starting Limit of FC00h (later adjusted to FBFFh as explained). The 32-bit equivalent would be FFFF_FBFFh - but that won't fit inside the 20-bit Limit field! And specifying it as Page-granular with the Gran flag won't help either - 20 bits means either F_FFFFh or F_FFFEh, which when shifted would become FFFF_FFFFh and FFFF_EFFFh respectively.

I find everything to do with 32-bit (Big) Expand Down Segments humorous:

  • A Page-granular Limit of F_FFFFh defines a Segment whose highest inaccessible address is FFFF_FFFFh - in other words, nothing is accessible!
  • A byte-granular Limit of 0_FFFFh defines a Segment whose lowest accessible address is 64 kiB - and everything above!

In other words, you can specify the size of a Stack to the byte - as long as you want it somewhere between 3.999 and 4.0 GiB in size. If you only want a 6 kiB Stack though, you either have to:

  • Use the Gran bit, rounding off the size to 4 kiB chunks (4 or 8 kiB instead of 6); or
  • Use a 16-bit entry (Big=0, using SP rather than ESP) with a maximum Limit of only 64 kiB. Sizes between 64 kiB and 1 MiB that can be specified to the byte in Expand Up Segments aren't available with Expand Down ones.

You can define a higher Limit with Big=0 Segments, but that is counter-productive, since SP won't store anything there. (Hey, but wait: that's a perfect place for Thread-Local Storage! Hmmm.... I'm going to have to experiment!)

Related Articles