Latest revision as of 04:12, 9 June 2024
Babystep3: A look at machine code
A look at machine code (opcodes, prefix, etc)
; nasmw encode.asm -f bin -o encode.bin
mov cx, 0xFF
times 510-($-$$) db 0
db 0x55
db 0xAA
Don't partcopy this to disk. Just open it in DEBUG on Windows (hexdump works well for Linux users):
C:\osdev\debug encode.bin
Type in 'd' after the '-' to see the binary file. ('?' will give you help, 'q' will quit). You will see something like this:
0AE3:0100 B9 FF 00 00 00 00 etc...
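If neither DEBUG nor hexdump is at hand, a few lines of Python can produce a similar view. This is only a rough sketch: the function name `dump_line` is made up for illustration, and the segment `0AE3` and offset `0100` are simply copied from the listing above (DEBUG loads a plain binary at offset 0100, and the segment will vary from machine to machine).

```python
def dump_line(data, offset=0x0100, segment=0x0AE3, width=16):
    """Format the first `width` bytes roughly the way DEBUG's 'd' command shows them."""
    chunk = data[:width]
    hex_bytes = " ".join(f"{b:02X}" for b in chunk)
    return f"{segment:04X}:{offset:04X} {hex_bytes}"


# The three bytes NASM emits for `mov cx, 0xFF`, followed by the start of the zero padding.
sample = bytes([0xB9, 0xFF, 0x00]) + bytes(5)
print(dump_line(sample))
```

To look at the real file instead of the sample bytes, pass in `open("encode.bin", "rb").read()`.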
Look up the opcode for MOV here: http://www.baldwin.cx/386htm/MOV.htm
Also see Section "17.2.2.1 Opcode" here: http://www.baldwin.cx/386htm/s17_02.htm
In other words, there is a unique register number (CX=1) added to the base opcode value 'B8' to give 'B9', which you see in the dump.
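The "base opcode plus register number" rule can be sketched in a few lines of Python. The helper name `encode_mov_r16_imm16` is hypothetical, and the sketch only covers the `mov r16, imm16` form discussed here; the register numbers are the standard ones from the 386 manual.

```python
# Register numbers for the 16-bit general-purpose registers.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_mov_r16_imm16(reg, value):
    """Encode `mov r16, imm16`: opcode B8 plus the register number, then the immediate."""
    opcode = 0xB8 + REG16[reg.lower()]
    # The 16-bit immediate is stored low byte first.
    return bytes([opcode]) + value.to_bytes(2, "little")


encoded = encode_mov_r16_imm16("cx", 0xFF)
print(encoded.hex(" ").upper())
```

For CX this gives the same 'B9 FF 00' that starts the dump above.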
But watch what happens when you replace CX with ECX:
mov ecx, 0xFF
times 510-($-$$) db 0
db 0x55
db 0xAA
0AE3:0100 66 B9 FF 00 00 00 00 etc...
The '66' is an Operand Size Override Prefix. The assembler generates it when the size of the operand differs from the default mode, which is 16-bit when NASM assembles binary files. The same thing happens if you use the BITS directive to change the mode and the operand size then differs from that mode:
[BITS 32]
mov cx, 0xFF
times 510-($-$$) db 0
db 0x55
db 0xAA
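This doesn't actually change the mode of the processor. It only tells the assembler which mode the code is expected to run in, so that it knows when a prefix is needed for the processor to interpret the subsequent bytes correctly.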
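As a rough illustration of when the assembler emits '66', the sketch below encodes `mov reg, imm` for a given BITS setting. The function name and its `bits` parameter are invented for this example, and it only handles the eight general-purpose registers in their 16-bit and 32-bit forms; it is not how NASM is actually implemented.

```python
# Register numbers shared by the 16-bit and 32-bit names (cx and ecx are both 1).
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def encode_mov_reg_imm(reg, value, bits):
    """Encode `mov reg, imm`, adding the 66 prefix when operand size != BITS mode."""
    name = reg.lower()
    operand_bits = 32 if name.startswith("e") else 16
    number = REG[name[-2:]]
    # The prefix is needed only when the operand size disagrees with the mode.
    prefix = b"\x66" if operand_bits != bits else b""
    imm = value.to_bytes(operand_bits // 8, "little")
    return prefix + bytes([0xB8 + number]) + imm


for reg, bits in [("cx", 16), ("ecx", 16), ("cx", 32), ("ecx", 32)]:
    print(f"BITS {bits}: mov {reg}, 0xFF ->", encode_mov_reg_imm(reg, 0xFF, bits).hex(" ").upper())
```

Note that the prefix appears in both mismatched cases: a 32-bit operand in 16-bit code, and a 16-bit operand in 32-bit code.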
Addresses
Address encoding is a bit more complicated:
mov cx, [temp]
temp db 0x99
times 510-($-$$) db 0
db 0x55
db 0xAA
0AE3:0100 8B 0E 04 00 99 00 00 00 etc...
- '8B' is the opcode
- '0E' is a ModR/M byte, which tells the processor how to interpret the operands of the opcode
See Section "17.2.1 ModR/M and SIB Bytes" here: http://www.baldwin.cx/386htm/s17_02.htm
The rules for interpreting this byte are complicated, because it contains different fields (see Fig. 17-2), but fortunately Table 17-2 makes it easier. Look up '0E' and you will see at the left it says "disp16" which means that the operand will be interpreted as a 16-bit offset.
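For readers who prefer to see the fields directly rather than through the table, here is a small sketch that splits a ModR/M byte into its three fields (2 bits of mod, 3 bits of reg, 3 bits of r/m). The function name `split_modrm` is made up for this example.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11   # top two bits
    reg = (byte >> 3) & 0b111  # middle three bits
    rm = byte & 0b111          # bottom three bits
    return mod, reg, rm


mod, reg, rm = split_modrm(0x0E)
print(f"mod={mod:02b} reg={reg:03b} rm={rm:03b}")
```

For '0E' this yields mod=00, reg=001 and rm=110. With 16-bit addressing, mod=00 together with rm=110 is the "disp16" case, and reg=001 selects CX.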
'04 00' is the 16-bit offset. If you are confused why 0x0004 is backwards, it's because the Intel processor is "little endian". The "little" end of the number comes first.
'99' is of course the value of the byte at offset 0x0004 (8B is at offset 0x0000).
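The little-endian reading can be checked in Python. This is just a sketch using the eight bytes from the dump above; it assumes, as in the listing, that the code is assembled at offset 0 so the offset doubles as an index into the file.

```python
# The first eight bytes of the dump: 8B 0E 04 00 99 00 00 00
dump = bytes([0x8B, 0x0E, 0x04, 0x00, 0x99, 0x00, 0x00, 0x00])

# Bytes 2 and 3 hold the disp16, stored low byte first.
offset = int.from_bytes(dump[2:4], "little")

# Follow the offset to find the byte that `mov cx, [temp]` refers to.
value = dump[offset]

print(hex(offset), hex(value))
```

Reading the same two bytes as big-endian would give 0x0400 instead, which is why the byte order matters when reading a dump.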
Be aware of another prefix, the Address Size Override Prefix '67', which the assembler generates when the address size differs from the default mode, just as '66' does for the operand size.
This stuff matters for a bunch of reasons, not least because we will be making the switch from 16-bit real mode to 32-bit protected mode, and our code is going to change with it. Being aware of what a dump looks like can prevent a lot of grief.