HOWTO write NASM functions for 64-bit Windows

Writing 64-bit assembler functions for Windows is quite different from writing them for Linux, or even for 32-bit Windows.

Even when using NASM for a simplified approach, you still need to take care of a few details.

First steps

The first step is to add the following lines at the very beginning of the .asm file:

bits 64
default rel

We need ‘default rel’ to make memory references RIP-relative by default, which avoids linker problems when accessing items in the .data and/or .bss sections.
The alternative is to add the rel keyword to every memory access, for example:

 mov rax,[rel _my_data]

The second step is to define ‘section .data’ and/or ‘section .bss’ with all their contents (if present).

The third step is to define ‘section .text’, where we write our functions.

At this point, things begin to differ from the 32-bit conventions and even from the 64-bit Linux counterpart.
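The three steps above can be sketched as a minimal file skeleton. The names my_counter, my_buffer, and GetCounter are hypothetical and only serve as an illustration:

```nasm
bits 64
default rel

section .data
my_counter: dq 1            ; hypothetical initialized 64-bit variable

section .bss
my_buffer:  resb 64         ; hypothetical uninitialized 64-byte buffer

section .text

global GetCounter           ; hypothetical: extern long long GetCounter (void);
GetCounter:
    mov rax, [my_counter]   ; RIP-relative access thanks to 'default rel'
    ret
```

Such a file would typically be assembled with ‘nasm -f win64’ and linked together with the C objects.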

Function names

The C function name needs no decoration: the assembly label is exactly the name used in C.

For example, C declaration:

extern int MySampleProc ();

NASM declaration:

global MySampleProc
MySampleProc:
%push proc_ctx
%stacksize flat64
    <... my code here ...>
    ret
%pop

Function parameters

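The function parameters are passed in a different way (see Microsoft's x64 calling convention documentation for further details).

From the 1st to the 4th parameter

The first 4 parameters are always passed in the registers below, selected by the position of the parameter: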

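  • any pointer or integer value of 8 to 64 bits is passed in RCX, RDX, R8, R9,
    (note that, for any parameter smaller than 64 bits, we shall consider the unused upper part of the register as uninitialized)
  • any floating-point value is passed in XMM0, XMM1, XMM2, XMM3,
    (the register follows the position: a double in second position goes in XMM1, and RDX is then left unused)
  • any other value type is passed by reference, except structures of exactly 1, 2, 4, or 8 bytes, which are passed like integers.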
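As a small illustration of the list above, here is a sketch of a function mixing an integer and a floating-point parameter. The function Scale is hypothetical; note that the double, being the second parameter, arrives in XMM1 and not in XMM0:

```nasm
bits 64
default rel

section .text

; Hypothetical C declaration:
;   extern double Scale (int n, double f);
global Scale
Scale:
    ; ecx  = n (1st parameter, only the low 32 bits are defined)
    ; xmm1 = f (2nd parameter, so the 2nd floating-point register)
    cvtsi2sd xmm0, ecx      ; xmm0 = (double) n
    mulsd    xmm0, xmm1     ; xmm0 = n * f, returned in XMM0
    ret
```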

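The caller shall always reserve a 32-byte ‘shadow space’ on the stack, even when the function takes fewer than 4 parameters. On entry to the called function, it sits right above the return address:

  • [RSP + 0x08] shadow space (32 bytes)
  • [RSP + 0x00] return address (8 bytes)

The called function may use this area as free storage, typically to save the four register parameters.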
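A minimal sketch of a function using the shadow space as scratch storage follows. The function SumViaHome is hypothetical, and the offsets are relative to RSP on entry, before any push:

```nasm
bits 64
default rel

section .text

; Hypothetical C declaration:
;   extern long long SumViaHome (long long a, long long b);
global SumViaHome
SumViaHome:
    ; on entry: [rsp] = return address, [rsp+0x08 .. rsp+0x27] = shadow space
    mov [rsp + 0x08], rcx   ; save a in the first shadow slot
    mov [rsp + 0x10], rdx   ; save b in the second shadow slot
    ; ... rcx and rdx are now free for other uses ...
    mov rax, [rsp + 0x08]   ; reload a
    add rax, [rsp + 0x10]   ; rax = a + b
    ret
```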

For example, C declaration:

extern int MySampleProc (unsigned p1, void *p2, int p3);

NASM declaration:

global MySampleProc
MySampleProc:
%push proc_ctx
%stacksize flat64
; rcx = p1
; rdx = p2
; r8  = p3
%arg _w64_shadow_space:YWORD

    ; set the stack frame
    push rbp
    mov rbp,rsp
    <... my code here ...>
    ; destroy the stack frame
    leave
    ret
%pop

Note that simple functions that:

  • have 4 or fewer params,
  • don’t have local variables,
  • don’t call any other function

may skip setting up the stack frame.
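A sketch of such a simple function, with no stack frame at all, might look like the following. The function Add3 is hypothetical:

```nasm
bits 64
default rel

section .text

; Hypothetical C declaration:
;   extern int Add3 (int a, int b, int c);
global Add3
Add3:
    ; ecx = a, edx = b, r8d = c (the upper 32 bits of each register are undefined)
    mov eax, ecx            ; eax = a
    add eax, edx            ; eax = a + b
    add eax, r8d            ; eax = a + b + c, returned in EAX
    ret
```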

From the 5th parameter onwards

The 5th and further parameters are stored on the stack, right above the shadow space, in the usual C order and each in its own 8-byte slot.

As before, the caller always reserves the 32-byte shadow space, so on entry to the called function the stack looks like this:

  • [RSP + …] the 6th param…
  • [RSP + 0x28] the 5th param
  • [RSP + 0x08] shadow space (32 bytes)
  • [RSP + 0x00] return address (8 bytes)

For example, C declaration:

extern int MySampleProc (unsigned p1, void *p2, int p3, const void *p4, unsigned p5);

NASM declaration:

global MySampleProc
MySampleProc:
%push proc_ctx
%stacksize flat64
; rcx = p1
; rdx = p2
; r8  = p3
; r9  = p4
%arg _w64_shadow_space:YWORD, p5:QWORD

    ; set the stack frame
    push rbp
    mov rbp,rsp
    <... my code here ...>
    ; destroy the stack frame
    leave
    ret
%pop
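For completeness, here is a sketch of the caller side for the 5-parameter declaration above. The function CallSample, the sample_data variable, and the argument values are hypothetical. At the call instruction, the shadow space starts at [RSP] and the 5th parameter sits at [RSP + 0x20], which the called function then sees at [RSP + 0x28]:

```nasm
bits 64
default rel

extern MySampleProc

section .data
sample_data: dq 0               ; hypothetical data passed as p2

section .text

global CallSample               ; hypothetical: extern int CallSample (void);
CallSample:
    sub  rsp, 0x28              ; 32 bytes shadow space + 8 bytes for p5
                                ; (also makes RSP a multiple of 16 at the call)
    mov  ecx, 1                 ; p1
    lea  rdx, [sample_data]     ; p2
    mov  r8d, 3                 ; p3
    xor  r9d, r9d               ; p4 = NULL
    mov  qword [rsp + 0x20], 5  ; p5, right above the shadow space
    call MySampleProc
    add  rsp, 0x28              ; release shadow space and p5
    ret                         ; the result of MySampleProc is still in EAX
```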

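Varargs function parameters

They follow the same rules as above (i.e. the shadow space, the first 4 params in the proper registers, and the others on the stack). In addition, any floating-point value passed in XMM0 to XMM3 as a variable argument shall also be copied into the matching integer register (RCX, RDX, R8, or R9).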
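A sketch of a varargs call follows, assuming that printf resolves at link time (which depends on the C runtime library in use). The function PrintPair and the data labels are hypothetical. In the Microsoft x64 convention, a floating-point variable argument is placed in both the XMM register and the integer register of its position:

```nasm
bits 64
default rel

extern printf                   ; assumption: provided by the linked C runtime

section .data
fmt:   db "%d %f", 10, 0        ; hypothetical format string
value: dq 2.5                   ; hypothetical double value

section .text

global PrintPair                ; hypothetical: extern void PrintPair (void);
PrintPair:
    sub   rsp, 0x28             ; 32 bytes shadow space + 8 bytes to align RSP
    lea   rcx, [fmt]            ; 1st param: format string
    mov   edx, 42               ; 2nd param: integer
    movsd xmm2, [value]         ; 3rd param: double, in the 3rd XMM register
    movq  r8, xmm2              ; same bits duplicated in R8 for varargs
    call  printf
    add   rsp, 0x28
    ret
```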

Local variables

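If we need to define local variables, it works as usual, taking care of the alignment of the RSP register: each slot shall be 8-byte aligned, and RSP shall be a multiple of 16 whenever we call another function (so, after ‘push rbp’, the number of bytes subtracted from RSP shall be a multiple of 16).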
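A sketch of a function with one local variable and a call to another function follows. The functions UseLocals and Helper are hypothetical. It relies on the Microsoft x64 rule that RSP is a multiple of 16 at every call instruction, so the reserved size is rounded accordingly:

```nasm
bits 64
default rel

extern Helper                   ; hypothetical: extern long long Helper (long long a);

section .text

; Hypothetical C declaration:
;   extern long long UseLocals (long long a, long long b);
global UseLocals
UseLocals:
    push rbp                    ; on entry RSP is 8 modulo 16; now a multiple of 16
    mov  rbp, rsp
    sub  rsp, 0x30              ; 16 bytes of locals + 32 bytes of shadow space
    mov  [rbp - 0x08], rdx      ; local variable: keep b across the call
    call Helper                 ; rcx still holds a
    add  rax, [rbp - 0x08]      ; rax = Helper (a) + b
    leave                       ; restore RSP and RBP
    ret
```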

Return values

For integer scalar types (pointers, integers from 8 to 64 bits), the value shall be returned in RAX; for types smaller than 64 bits, the unused upper portion of the register does not need to be initialized.

For floating-point and vector types (float, double, and vector), the return value shall be stored in XMM0.
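Both cases can be sketched as follows. The functions Twice and Half and the half_const label are hypothetical:

```nasm
bits 64
default rel

section .data
half_const: dq 0.5              ; hypothetical constant

section .text

; Hypothetical C declaration:
;   extern int Twice (int x);
global Twice
Twice:
    mov eax, ecx                ; only the low 32 bits of RCX are meaningful
    add eax, eax                ; result returned in EAX
    ret

; Hypothetical C declaration:
;   extern double Half (double x);
global Half
Half:
    mulsd xmm0, [half_const]    ; x arrives in XMM0, result returned in XMM0
    ret
```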

Volatile registers

All registers shall be considered volatile, with the exceptions listed below:

  • RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15,
  • and XMM6 to XMM15.

The registers above shall always be saved before being used, and restored before returning.

Volatile registers need to be saved only when your code calls another procedure and still needs their values afterwards, since that procedure works under the very same assumptions.
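A sketch of both rules follows. The functions SumTwice and Helper are hypothetical. The value of RCX would be lost across the call, so it is moved to RBX, which in turn has to be saved and restored because it is non-volatile:

```nasm
bits 64
default rel

extern Helper                   ; hypothetical: extern long long Helper (long long a);

section .text

; Hypothetical C declaration:
;   extern long long SumTwice (long long a);
global SumTwice
SumTwice:
    push rbx                    ; save the non-volatile register before using it
                                ; (RSP becomes a multiple of 16 as well)
    sub  rsp, 0x20              ; shadow space for the called function
    mov  rbx, rcx               ; keep a in a register that survives the call
    call Helper                 ; may destroy RCX, RDX, R8 to R11, XMM0 to XMM5
    add  rax, rbx               ; rax = Helper (a) + a
    add  rsp, 0x20
    pop  rbx                    ; restore the caller's RBX
    ret
```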