Assembly
Latest revision as of 00:46, 16 June 2024
At the very core of a computer, every signal sent to any part of the machine is a series of electrical pulses. If the pulse is high, for example 5 volts, a binary 1 is sent. If the pulse is low, for example 0 volts, a binary 0 is sent. Put 4 binary digits together and you get a nibble. Put 8 together and you get a byte. Put 16 together and you get a word. Computers understand commands in such segments, but humans would rather not read them that way. So a better way of viewing such information was adopted: hexadecimal. It no longer took 16 characters to show one command, and in general it became more convenient to remember commands like:
90 - No Operation
CC - Break
E9 - 16 bit jump
CD - Interrupt
etc.
But it soon became clear that such codes needed to be more user-friendly, so assembly code was invented. With assembly code, operations are represented by mnemonics that are more human-readable. For example, JMP jumps, INT interrupts, NOP performs no operation, etc.
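The relationship between the hexadecimal codes listed above and their mnemonics can be pictured as a simple lookup table. The following Python sketch is only an illustration: the names `MNEMONICS` and `mnemonic_for` are made up for this example, the table covers just the four opcode bytes mentioned above, and a real assembler or disassembler has to handle far more (operands, prefixes, multi-byte opcodes).

```python
# Toy lookup table: first opcode byte -> mnemonic.
# Only the four bytes named in the text are covered (an assumption of this sketch).
MNEMONICS = {
    0x90: "NOP",   # no operation
    0xCC: "INT3",  # breakpoint (the "Break" entry)
    0xE9: "JMP",   # near jump, followed by a relative offset
    0xCD: "INT",   # software interrupt, followed by an 8-bit vector
}


def mnemonic_for(opcode: int) -> str:
    """Return the mnemonic for one opcode byte, or a raw 'db' line if unknown."""
    return MNEMONICS.get(opcode, f"db 0x{opcode:02X}")
```

For instance, `mnemonic_for(0x90)` gives `"NOP"`, while a byte that is not in the table is simply echoed back as raw data.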
An assembler is the program that reads an assembly file (usually with the extension .asm) and converts it to a binary executable. This is very much akin to a compiler generating an executable program. However, assemblers are considered separately from compilers because compilers hide the low-level details of your system, while assemblers expose you to these details.
The source code written for any assembler is called "assembly language". But that doesn't mean much, since each assembler may require different source code for exactly the same program on an identical platform.
Assembly Syntax
In assembly, every instruction (or operation/operator/opcode) is followed by a comma-separated list of parameters (or operands). These operands can be immediate values (e.g. numbers such as 1, 2, 0x16, 0101b), registers, or memory locations. For example,
mov eax, 123
The instruction is mov, the operands are eax and 123. Here eax is a register and 123 is an immediate value.
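To make the "instruction followed by comma-separated operands" shape concrete, here is a minimal Python sketch that splits one source line into its parts. The function name `split_instruction` is hypothetical, and the sketch assumes a bare instruction line with no labels, comments, or commas inside memory operands, which real assemblers do have to cope with.

```python
def split_instruction(line: str) -> tuple[str, list[str]]:
    """Split 'mov eax, 123' into ('mov', ['eax', '123']).

    Assumes no labels, comments, or commas inside an operand.
    """
    # Everything up to the first space is the instruction mnemonic.
    mnemonic, _, rest = line.strip().partition(" ")
    # The remainder is a comma-separated operand list (possibly empty).
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return mnemonic, operands
```

Calling `split_instruction("mov eax, 123")` returns `("mov", ["eax", "123"])`: the instruction, then the register operand and the immediate operand.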
There are two main syntaxes for assembly: the Intel syntax, and the AT&T syntax.
Intel Syntax
Generally speaking, the first operand of an operation is the destination operand, and the other operand (or operands) is the source operand. For example:
mov eax, 123
The mov instruction takes a value from the source operand and places it in the destination operand, so the value 123 would be placed into the register eax. The original Intel syntax is that of the Intel ASM386 assembler (later licensed out to RadiSys, but which eventually died off). Since then, many assemblers have introduced their own variations of it. It is also used in most online assembly tutorials, because the majority of those tutorials were written when TASM (which uses the Intel syntax) was the dominant assembler.
AT&T Syntax
AT&T syntax reverses the operand order of Intel syntax: the data operated on moves from left to right. Here is the same statement as above in AT&T syntax.
mov $123, %eax
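The conversion between the two examples can be sketched in a few lines of Python. This is a deliberately narrow illustration, not a real translator: the name `intel_to_att` and the `REGISTERS` set are assumptions of this sketch, and it only accepts 32-bit general-purpose register names and decimal immediates, rejecting everything else (memory operands, hexadecimal numbers, size suffixes).

```python
# Register names this sketch recognises (32-bit general-purpose only, by assumption).
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def intel_to_att(line: str) -> str:
    """Convert a simple Intel-syntax line such as 'mov eax, 123' to AT&T syntax."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    converted = []
    for op in operands:
        if op.lower() in REGISTERS:
            converted.append("%" + op.lower())  # registers get a '%' prefix
        elif op.isdigit():
            converted.append("$" + op)          # immediates get a '$' prefix
        else:
            raise ValueError(f"unsupported operand: {op!r}")
    converted.reverse()                         # AT&T puts the source first
    if not converted:
        return mnemonic
    return mnemonic + " " + ", ".join(converted)
```

With these assumptions, `intel_to_att("mov eax, 123")` produces `"mov $123, %eax"`, matching the statement above.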
There are also some other differences with operands:
- Literal values such as 123 are prefixed with '$' (see example above).
- Memory locations have no prefix: mov 123, %eax moves the value stored at 123 to eax.
- Registers are prefixed with '%'.
- When using a register as a pointer, it is put in parentheses: mov (%eax), %ebx moves the value stored at the address held in eax to ebx.
- When using a statement with memory locations (with no register present), a suffix is needed after the instruction. For example: movl $123, 123. The 'l' (which stands for long) is needed to tell the assembler that the values are 32 bits.
- Pointers can be offset by constants by prefixing them: 4(%eax) points to 4+%eax.
- Pointers can be offset by another register by writing them as: (%eax,%ebx), which points to %eax+%ebx.
- Finally, you can combine it all and throw in a scale if you want: 4(%eax,%ebx,2) would be 4+%eax+%ebx*2.
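The pointer forms in the list above all follow one pattern, displacement(base,index,scale), whose address is displacement + base + index*scale. The Python sketch below works through that arithmetic; the function name `effective_address` and its parameters are made up for illustration, register contents are passed in as plain integers, and the result is wrapped to 32 bits on the assumption of 32-bit addressing.

```python
def effective_address(disp: int = 0, base: int = 0, index: int = 0, scale: int = 1) -> int:
    """Compute the address named by the AT&T operand disp(base,index,scale).

    base and index are the *values* held in the registers, not register names.
    """
    # x86 only allows these four scale factors.
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    # Wrap to 32 bits, assuming 32-bit addressing.
    return (disp + base + index * scale) & 0xFFFFFFFF
```

For example, if eax holds 0x1000 and ebx holds 0x10, then 4(%eax,%ebx,2) corresponds to `effective_address(4, 0x1000, 0x10, 2)`, which is 0x1000 + 4 + 0x20 = 0x1024.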
See Also
Articles
- Assemblers
- Inline Assembly
- Opcode syntax for more detail on AT&T syntax differences
- Learning 80x86 Assembly: list of freely available online resources to help with learning 80x86 assembly language programming
External Links
- GNU Binutils - A collection of free software including the GNU assembler (gas)
- gas manual - Official manual for the GNU assembler (gas)
- NASM - Official site of the Netwide Assembler (NASM)
- x86 Assembly - A wikibook on writing assembly for x86 based PCs
- The Art of Assembly Language Programming, online book, 16-bit DOS edition. (Later editions focus on a high-level assembler dialect created by the author.)
- PC ASM, online book, 32 bit protected mode assembly on x86
- x86 Opcode and Instruction Reference, a database of the instructions and opcodes as freely viewable, searchable, and downloadable XML files. They also sell a hardcopy version.