JWasm

JWasm Macro Assembler is an x86 assembler that targets 16-, 32- and 64-bit platforms. It is designed as a MASM-compatible assembler and is available under the Sybase Open Watcom Public License. It produces binaries for the DOS, Windows, Linux, OS/2 and FreeBSD operating systems. JWasm is an almost complete rewrite of the earlier Watcom assembler Wasm. It is written in portable C and has been successfully tested with the Open Watcom development environment, the Microsoft Visual Studio family of development tools, the GNU (GCC) compiler and others. It is currently being upgraded by Japheth.

History

JWasm is derived from the earlier Open Watcom assembler Wasm, which has been extensively rewritten to modernize it, extend its capacity and add support for additional platforms. Among its design targets is a very high level of MASM compatibility. The initial release, v1.7, is dated 05/20/2008. The current version as of 1/19/2010 is v2.02, which includes 64-bit capabilities. It is actively being updated to support the latest operating systems.

Usage

JWasm conforms to the Microsoft Macro Assembler notation, so the standard MASM documentation serves as its technical reference.

Abbreviated Notation

In its fully specified form, MASM notation states the size of a memory operand explicitly:

    mov eax, DWORD PTR [edi]

Over time, the parsers in assemblers have improved to the point where the size specifier may be omitted whenever the assembler can determine the size of the data from the other operand:

    mov eax, [edi]

This allows for clearer code that is easier to read. However, there are some contexts where the assembler cannot determine the data size on its own, for example when the destination register does not imply the size of a memory source operand, as with MOVZX. In this situation the historical size specifiers must be used. The following is an example of this situation.

    movzx eax, [esi]            ; generates an error - data SIZE cannot be determined by the assembler
    movzx eax, BYTE PTR [esi]   ; zero extend a BYTE into the 32 bit EAX register
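
The same rule applies when an instruction has no register operand at all to imply a size. The lines below are a minimal sketch of that case, assuming 32 bit code in which EDI and ESI already hold valid addresses.

    mov [edi], 1                ; generates an error - nothing indicates the size of the memory operand
    mov DWORD PTR [edi], 1      ; store the 32 bit value 1
    inc BYTE PTR [esi]          ; increment a single BYTE in memory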

OFFSET Operator

JWasm's syntax distinguishes between fixed and transient addressing. Data written in either the initialised or uninitialised data section has an ADDRESS that is known at assembly time, as do code labels, and all of these addresses are obtained with the OFFSET operator. Transient addressing is performed with the normal Intel mnemonics for reading the stack within a procedure.

For example, take the following data entry in the initialised data section:

    textitem db "This is a text item",0

The address of this data entry can be loaded into a register in the following manner.

    mov eax, OFFSET textitem
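
To show all three kinds of fixed address together, here is a small module sketch. It assumes a 32 bit flat memory model; the names buffer and start are hypothetical and chosen only for illustration.

    .386
    .model flat, stdcall

    .data
    textitem db "This is a text item",0    ; initialised data

    .data?
    buffer   db 32 dup (?)                 ; uninitialised data (hypothetical name)

    .code
    start:                                 ; code label (hypothetical name)
        mov esi, OFFSET textitem           ; address of an initialised data entry
        mov edi, OFFSET buffer             ; address of an uninitialised data entry
        mov eax, OFFSET start              ; address of a code label
        mov al, textitem                   ; no OFFSET - loads the first BYTE of the text, "T"
        ret
    end start

Without OFFSET the name refers to the contents of the data entry rather than its address, which is why the last MOV needs a BYTE sized register.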

Transient Stack Addressing

The operating system provides each program with an area of memory referred to as the stack. Under x86 hardware, the stack is the main method of passing arguments to procedures. Arguments are normally placed on the stack with the PUSH mnemonic in the following form. This example assumes the STDCALL calling convention and a 32 bit data size.

    push arg3
    push arg2
    push arg1
    call FunctionName

The CALL mnemonic pushes the return address onto the stack, then branches to the address of the named procedure. If the procedure has a stack frame where the stack pointer register ESP is stored in the base pointer register EBP, the first argument for the procedure occurs at address [ebp+8]. While this form of mnemonic notation can be written by experienced assembler programmers, the assembler provides a naming method that removes an unnecessary level of complexity from writing code of this type.
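
The frame layout just described can be sketched in bare mnemonics as follows. This is an illustrative body for the FunctionName procedure called above, assuming three DWORD arguments and the STDCALL convention. What the procedure does with the arguments is left out.

    FunctionName:
        push ebp                ; save the caller's base pointer
        mov  ebp, esp           ; EBP now marks the stack frame
                                ; [ebp]   = saved EBP
                                ; [ebp+4] = return address pushed by CALL
        mov  eax, [ebp+8]       ; arg1 (pushed last)
        mov  ecx, [ebp+12]      ; arg2
        mov  edx, [ebp+16]      ; arg3 (pushed first)
        pop  ebp                ; restore the caller's base pointer
        ret  12                 ; STDCALL - the procedure removes the three DWORD arguments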

The programmer can use the name of the argument in place of the direct [EBP+displacement] notation to make the code more readable with no loss of performance. When the programmer needs the ADDRESS of a transient stack variable (normally referred to as a LOCAL variable), a number of methods are available. In a prototyped function call, the ADDR operator obtains the address of a LOCAL variable. Alternatively, the direct Intel mnemonic LEA loads the effective address of the variable into a register:

    lea eax, named_local_variable
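
A fuller sketch of named arguments, a LOCAL variable, LEA and ADDR in one procedure is given below. It assumes a 32 bit flat memory model with STDCALL as the default convention. AddTwo and FillValue are hypothetical names, and FillValue is only prototyped here, so the module assembles but would need that procedure supplied at link time.

    .386
    .model flat, stdcall

    FillValue PROTO :DWORD              ; hypothetical procedure that takes a pointer

    .code
    AddTwo PROC value1:DWORD, value2:DWORD
        LOCAL total:DWORD               ; stack space reserved by the generated prologue
        mov eax, value1                 ; same as mov eax, [ebp+8]
        add eax, value2                 ; same as add eax, [ebp+12]
        mov total, eax                  ; same as mov [ebp-4], eax
        lea ecx, total                  ; ECX = address of the LOCAL variable
        invoke FillValue, ADDR total    ; ADDR passes that same address as an argument
        ret                             ; generated epilogue removes the frame and both arguments
    AddTwo ENDP
    end

Note that INVOKE computes the ADDR of a LOCAL through the EAX register, so any value held in EAX is overwritten by the call sequence.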

Square Brackets

JWasm, like MASM, uses named variables to represent both fixed and transient addresses. Square brackets around an expression denote that its contents form a memory operand. Programmers coming from assemblers where square brackets serve as general ADDRESS indicators have at times had problems with this difference in notation, but JWasm tolerates square brackets around named variables by simply ignoring them.
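
The short fragment below illustrates the point with a hypothetical DWORD variable named counter. It is a sketch for 32 bit code, not a complete module.

    .data
    counter dd 5                    ; hypothetical variable

    .code
    mov eax, counter                ; loads the DWORD stored in counter
    mov eax, [counter]              ; brackets ignored - same instruction as the line above
    mov eax, OFFSET counter         ; loads the ADDRESS of counter instead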

There is some flexibility in how square brackets can be used in assemblers compatible with historical Intel notation.

    mov eax, [ecx+edx]
    mov eax, [ecx][edx]

Both notations are correct here, and in the second example the extra pair of square brackets functions as an ADDITION operator.

Limited Type Checking

JWasm supports a pseudo high level notation for creating procedures that perform argument size and count checking. It is part of a system built on the PROC, ENDP, PROTO and INVOKE directives. PROTO defines a function prototype that has a matching PROC, which is terminated with ENDP. The prototyped procedure can then be called with INVOKE, and the call is protected by the limited size and argument count checking. At a more advanced level, additional notation turns off the automatically generated stack frame, for very small procedures where the stack overhead of the call may matter. JWasm code can also be written completely free of the pseudo high level notation, using only bare Intel mnemonics.

Take as an example a prototype from the 32 bit Windows API function set:

    SendMessageA PROTO STDCALL :DWORD,:DWORD,:DWORD,:DWORD
    SendMessage equ <SendMessageA>

The code to call this function using the INVOKE notation is as follows.

    invoke SendMessage,hWin,WM_COMMAND,wParam,lParam

This is translated to the following sequence:

    push lParam
    push wParam
    push WM_COMMAND
    push hWin
    call SendMessage

The advantage of the INVOKE method is that it tests the size of the data types and the argument count and generates an assembly time error if the arguments do not match the prototype.
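
The notation for turning off the automatically generated stack frame is not shown above. In MASM it is written with the OPTION PROLOGUE and OPTION EPILOGUE directives, and the sketch below assumes that JWasm accepts the same form. AddPair is a hypothetical procedure for 32 bit STDCALL code.

    OPTION PROLOGUE:NONE            ; stop generating push ebp / mov ebp, esp
    OPTION EPILOGUE:NONE            ; stop generating the matching cleanup before RET

    AddPair PROC STDCALL a:DWORD, b:DWORD
        mov eax, [esp+4]            ; first argument, read relative to ESP
        add eax, [esp+8]            ; second argument
        ret 8                       ; remove the two DWORD arguments by hand
    AddPair ENDP

    OPTION PROLOGUE:PrologueDef     ; restore the default stack frame generation
    OPTION EPILOGUE:EpilogueDef

Because no frame is set up, the argument names a and b are not used in the body. They would still be resolved relative to EBP, which no longer points at the arguments. The parameter list is kept so that INVOKE can continue to check calls to the procedure.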

Pseudo High Level Emulation

JWasm conforms to the historical MASM notation in terms of emulating high level control and loop structures.
It supports the .IF block structure,

  .if eax == 0
    ; ...
  .elseif eax == 1
    ; ...
  .else
    ; ...
  .endif

It also supports the .WHILE loop structure,

  .while eax > 0
    sub eax, 1
  .endw

And the .REPEAT loop structure.

  .repeat
    sub eax, 1
  .until eax < 1

The high level emulation also supports C style comparison operators that work according to the same rules as Intel mnemonic comparisons. For the .IF block notation, the distinction between SIGNED and UNSIGNED data is handled with a minor variation in data type notation: the storage size DWORD, which is UNSIGNED by default, can also be specified as SDWORD for a SIGNED comparison. This data type distinction applies only to the pseudo high level notation. It is unused at the mnemonic level of code, where the distinction is determined by the range of conditional evaluation techniques available in the Intel mnemonics.
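
The difference can be sketched with a register operand, using the MASM form SDWORD PTR to request a SIGNED comparison. The fragment assumes 32 bit code and that EAX holds the value being tested.

    .if eax < 0                     ; UNSIGNED by default - an unsigned value is never below zero
        neg eax                     ; never executed
    .endif

    .if SDWORD PTR eax < 0          ; SIGNED comparison
        neg eax                     ; executed when EAX holds a negative value
    .endif

In mnemonic terms the second block corresponds to a compare followed by a signed conditional jump such as JGE, where the first would use an unsigned one such as JAE.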

The combined pseudo high level emulation allows JWasm to interface more easily with current operating systems that use a C style application programming interface. Generally the pseudo high level interface is used for non-speed critical code where clarity and readability are the most important factors, while speed critical code is usually written directly in mnemonics.

Pre-processor

The pre-processor in JWasm emulates that of the Microsoft Macro Assembler, and for most practical purposes the two are near enough to identical. It is an old design, dating back to about 1990 when Microsoft introduced the MASM 6.00 series of assemblers, and is known to experienced users as quirky and complicated to use for advanced macro designs. Notwithstanding its archaic format, it is a reasonably powerful pre-processor with loop techniques, conditional testing, text manipulation commands and the normal text substitution methods associated with arguments passed to the pre-processor.

At its simplest, a macro in JWasm is constructed as follows:

    ItemName MACRO argument1, argument2, argument3:VARARG
      mov argument1, argument2
      mov argument3, argument1
    ENDM

This macro is called as follows,

    ItemName eax, ecx, edx

It is expanded by the pre-processor to,

    mov eax, ecx
    mov edx, eax
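
The loop techniques mentioned above can be sketched with the FOR directive, which repeats its body once for each item in a list. PushAll is a hypothetical macro name used only for illustration.

    PushAll MACRO regs:VARARG
      FOR reg, <regs>               ;; repeat the body once per argument
        push reg
      ENDM
    ENDM

Called as,

    PushAll eax, ecx, edx

it is expanded by the pre-processor to,

    push eax
    push ecx
    push edx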

Licence

JWasm is licenced under the Sybase Open Watcom Public License and is available for use in environments and projects that are excluded by the Microsoft EULA for MASM. JWasm places no restrictions on writing Open Source software or on writing software for non-Microsoft operating systems.
