x86-64 TUTORIAL: DOUBLY LINKED LIST

THE LABOUCHERE SYSTEM

The Labouchere system for roulette is played as follows. Write down a list of numbers, usually 1, 2, 3, 4. Bet the sum of the first and last, i.e. 1 + 4 = 5, on red. If you win, delete the first and last numbers from the list. If you lose, add the amount that you last bet to the end of the list. Then use the new list and bet the sum of the first and last numbers (if there is only one number, bet that amount). Continue until your list becomes empty. You will see that, if this happens, you will always win the sum 1 + 2 + 3 + 4 = 10, of the original list. The below program simulates this system. Execute the program, and see if you always win!

In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, …,

The 0 and 00 slots are green, and half of the remaining slots are red and half are black. A croupier spins the wheel and throws in an ivory ball. The probability of the ball landing on red when the wheel stops is 18/38. So, we use a random number generator function like rand() and generate random positive integers between 1 and 38. If the integer is less than 19, we consider that the ball landed on the red, and the bet was won, otherwise the bet was lost.’,

The basic algorithm is as follows:

while (LIST != empty) {
     get_colour_result_from_roulette_spin();
     if (bet was won)
        remove head and tail from the list;
     else
        add the total amount that was bet to tail of the list;
}

The program written in C can be found here.

There are two ways (maybe more) in which the above algorithm can be written in assembly.

One approach is to use variables or storage in the data section or the stack (as in the C program above) to hold the head and tail of the linked list. This approach is easier to code and easier to understand as well. However, there is another, more intricate, approach in which no variables have been used and almost all the registers of the AMD64 architecture have been used (We have 16 general purpose registers, so might as well use some of them).

The first approach is given below. The all register usage approach can be found here.

We introduce the concept of accessing linked lists and their elements in this piece of code and also some floating point instructions. Linked lists are accessed in the same ways as arrays are accessed. The linked list used here is a doubly-connected linked list. Each element of the linked list has a previous and a next element and a data element. The linked list can be accessed as long as the head of the link list is known. For traversing the link list, we just need to use the next element pointer, check whether it is NULL and can move to the next element in the list. To traverse the list backwards, we need to know the tail of the link list, and then use its previous element pointer to access the list backwards.

In the below code, every element of the link list is of size 24 bytes, with the first 8 bytes pointing to the data stored in the list, the next 8 bytes pointing to the address of the previous element in the list and the last 8 bytes pointing to the next element in the list. The accessing of the details of any element of the list is very easily done by knowing the offset (required for accessing the data, previous or next pointer of the element of the list) as can be seen in the code below.

Floating point instructions are required here because we are using the rand() function provided to us by the C standard library to generate a random number between 1 and 38, inclusive. The function rand() returns a pseudo-random integer (4-byte) between 0 and RAND_MAX, so we scale that to between 1 and 38. The explanation of each floating point instruction is given next to its usage in the code below. While loading integers as floating points into the Floating Point Unit (FPU), we have to use a memory location instead of a general purpose register. Maybe it is because in earlier x86 computers, the FPU (or x87 unit) was separate processor from the x86 processor unlike today where it is integrated. We could also use the XMM0-15 registers to do the floating point calculations. FIXME: XMM code for get_bet().

We also demonstrate loop unrolling while creating the first 4 elements of the linked list. Instead of using a while loop or a for loop, we create all the 4 elements one by one explicitly.

; struct Link {
;   int data;   
;   struct Link* prev;
;   struct Link* next;
; };
%define data_offset     0
%define prev_offset     8
%define next_offset     16
%define size_of_list    24

%define RAND_MAX        2147483647

%define sys_getpid      39

section .rodata
    prompt1 db "Winnings: %ld in %ld chances.",10,0
    int_format0 db  "win : %ld ",10,0
    int_format1 db "loss: %ld ",10,0
    int_format2 db "AMT: %ld ",10,0

section .bss
    head    resq    1
    tail    resq    1

section .text
    global  get_bet, get_amt, add_to_list, remove_from_list
    global  main
    extern  printf
    extern  srand, rand           ; used for getting the random number
    extern  malloc, free          ; used for creating and deleting members for link list

    main:
        ; prologue with making space for temporary variables on stack
        push    rbp
        mov     rbp, rsp
        push    rbx
        push    r12
        push    r13
        push    r14
        push    r15
        pushf
        ; do we need stack variables ?
        ; let us use the multitude of registers available to us
        ; let R14 be the winnings at the end
        ; let R15 be the number of chances
        ; let R12 point to the head of the list
        ; let R13 point to the tail of the list
        xor     r12, r12
        xor     r13, r13
        xor     r14, r14
        xor     r15, r15

        ; First fill in the first 4 elements of the list
        ; instead of using a for loop, we use loop-unrolling
        ; here since it will execute faster than branch instructions
        xor     rax,rax
        mov     rdi, size_of_list
        call    malloc
        mov     r12, rax
        mov     [r12 + data_offset], dword 1             ; the first element holds the value 1
        xor     rax, rax
        mov     [r12 + prev_offset], rax
        mov     rdi, size_of_list
        call    malloc
        mov     [r12 + next_offset], rax
        mov     [rax + data_offset], dword 2             ; the next element holds the value 2
        mov     [rax + prev_offset], r12
        push    rax
        mov     rdi, size_of_list
        xor     rax, rax
        call    malloc
        pop     r8
        mov     [r8 + next_offset], rax
        mov     [rax + data_offset], dword 3             ; the next element holds the value 3
        mov     [rax + prev_offset], r8
        push    rax
        mov     rdi, size_of_list
        xor     rax, rax
        call    malloc
        pop     r8
        mov     [r8 + next_offset], rax
        mov     r13, rax
        mov     [r13 + data_offset], dword 4             ; the last element holds the value 4
        mov     [r13 + prev_offset], r8
        xor     rax, rax
        mov     [r13 + next_offset], rax

        mov     [head], r12                     ; move the address of the first element into the variable "head"
        mov     [tail], r13                     ; move the address of the last element into the variable "tail"

        ; giving a dynamic seed to the random number generator by using the PID of the process.
        mov     rax, sys_getpid
        syscall
        mov     rdi, rax
        call    srand
        ; main while loop 
while_loop:
        mov     r12, [head]
        mov     r13, [tail]
        ; if (head != NULL)
        cmp     r12,0
        jz      print_result
        ; if (tail != NULL)
        cmp     r13,0 
        jz      print_result
        ; add the data in the first and last elements
        call    get_amt
        ; store in RBX. Remember RBX is not modified by any function that is called as per the ABI.
        mov     rbx, rax
        ; increment the number of chances
        inc     r15
      
        ; print the amount just calculated (not required but useful for debugging)
        mov     rsi, rax
        mov     rdi, dword int_format2
        xor     rax, rax
        call    printf

        ; get the result of the roulette wheel rotation.
        call    get_bet
        ; if the result is a Red colour 
        cmp     rax, 18
        jge     loss_block
win_block:
        ; add the amount won to the winnings
        add     r14, rbx
        ; remove the end points of the list
        call    remove_from_list

        ; print the current value of winnings (not required but useful for debugging)
        mov     rsi, r14
        mov     rdi, dword int_format0
        xor     rax, rax
        call    printf
        ; return back to while loop
        jmp     while_loop
loss_block:
        ; subtract the amount lost from the winnings
        sub     r14, rbx
        ; add the amount lost as an element to the list
        mov     rdi, rbx
        call    add_to_list
      
        ; print the current value of winnings (not required but useful for debugging)      
        mov     rsi, r14
        mov     rdi, dword int_format1
        xor     rax, rax
        call    printf

        ; return back to while loop
        jmp     while_loop

print_result:
        ; print the final results: total winnings and the number of chances required.
        mov     rdx, r15
        mov     rsi, r14
        mov     rdi, dword prompt1
        xor     rax, rax
        call    printf
        ; epilogue with return value 0
        xor     rax,rax
        popf
        pop     r15
        pop     r14
        pop     r13
        pop     r12
        pop     rbx
        mov     rsp, rbp
        pop     rbp
        ret

remove_from_list:
        push    rbp
        mov     rbp, rsp
        ; if head == tail, then make the list empty
        mov     rdi, [head]
        cmp     rdi, [tail]
        je      head_eq_tail
        cmp     rdi, 0
        je      head_eq_tail
        ; else set head = head->next and free the original head.
        mov     rcx, [rdi + next_offset]
        mov     [head], rcx
        cmp     rcx, 0
        je      head_prev_0
        ; set the new head->prev = NULL;
        xor     rdx, rdx
        mov     [rcx + prev_offset],rdx
head_prev_0:
        call    free ; head is already in rdi 
head_eq_tail:
        mov     rdi, [tail]
        cmp     rdi, 0 
        je      ret_frm_remvfrmlist
        mov     rcx, [rdi + prev_offset]
        mov     [tail], rcx
        ; if tail != NULL, set tail = tail->prev and free the original tail
        cmp     rcx, 0
        je      tail_next_0
        xor     rdx, rdx
        mov     [rcx + next_offset], rdx
tail_next_0:
        call    free
ret_frm_remvfrmlist:
        mov     rsp, rbp
        pop     rbp
        ret

add_to_list:
        push    rbp
        mov     rbp, rsp
        ; data to be added is in RDI 
        push    rdi
        ; create a new element of the list 
        mov     rdi, size_of_list
        call    malloc
        pop     rdi
        ; set the new element as the tail of the list
        ; and assign the data in RDI as the new element\'s data value
        mov     rsi, [tail]
        mov     [rsi + next_offset], rax
        cmp     rax, 0
        je      ret_frm_add2list
        mov     [rax + data_offset], rdi
        xor     rdx, rdx
        mov     [rax + next_offset], rdx
        mov     [rax + prev_offset], rsi
        mov     [tail], rax
ret_frm_add2list:
        mov     rsp, rbp
        pop     rbp
        ret

get_amt:
        push    rbp
        mov     rbp, rsp
        ; add the data elements in the head and tail of the list if head!=tail
        ; if head == tail, then just return the data element.
        mov     rsi, [tail]
        mov     rdi, [head]
        mov     rax, [rdi + data_offset]
        xor     rcx, rcx
        cmp     [rdi + next_offset],rcx
        jz      ret_frm_getamt
        add     rax, [rsi + data_offset]
ret_frm_getamt:
        mov     rsp, rbp
        pop     rbp
        ret

get_bet:
        push    rbp
        mov     rbp, rsp
        ; a roulette wheel has 38 different possibilities
        ; we are trying to return the value (int)(rand()*38.0/(RAND_MAX+1.0))
        ; which is a random number between 1 and 38.

        push    38
        push    RAND_MAX
        call    rand

        ; we get the random number and place it in RAX. Then we push RAX on the stack
        push    rax
        ; we then load RAX into the FPU\'s top register ST(0).  
        ; FILD is floating point integer load instruction, but it can load the integer only from a memory
        ; location and not a general purpose register, which is why we pushed RAX on the stack.

        fild    qword [rbp - 24]            ; RAX is on ST(0)
        fild    qword [rbp - 8]             ; 38 is on ST(0) and RAX is on ST(1)

        ; multiply ST(0)  and ST(1) and pop the stack and store the result in ST(0)
        fmulp   st1
        fild    qword [rbp - 16]            ; RAND_MAX is on ST(0)
        fld1                                ; load the integer 1 on ST(0)
        faddp   st1                         ; add RAND_MAX and 1 and pop the stack, store result in ST(0).
                                            ; Previous result on ST(1)
        fdivp   st1                         ; divide the ST(1) from ST(0), pop the stack and store result in ST(0)
        fistp   qword [rbp - 24]            ; Load the integer result back on to the memory location
        mov     rax, [rbp -24]              ; return the result
        mov     rsp, rbp
        pop     rbp
        ret

If the above assembly program was called roulette.asm then executing the following statements would create the executable roulette.out.

$ yasm -f elf64 roulette.asm
$ ld -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
      /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o \
      roulette.o  /usr/lib/x86_64-linux-gnu/crtn.o -lc -o roulette.out

Download roulette.asm.

x86-64 TUTORIAL: DOUBLY LINKED LIST

THE LABOUCHERE SYSTEM

SUPPORT THIS SITE