## The Labouchere System

The Labouchere system for roulette is played as follows. Write down a list of numbers, usually 1, 2, 3, 4. Bet the sum of the first and last, i.e. 1 + 4 = 5, on red. If you win, delete the first and last numbers from the list. If you lose, add the amount that you last bet to the end of the list. Then use the new list and bet the sum of the first and last numbers (if there is only one number, bet that amount). Continue until your list becomes empty. You will see that, if this happens, you will always win the sum 1 + 2 + 3 + 4 = 10, of the original list. The below program simulates this system. Execute the program, and see if you always win!

In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, ..., 36. The 0 and 00 slots are green, and half of the remaining slots are red and half are black. A croupier spins the wheel and throws in an ivory ball. The probability of the ball landing on red when the wheel stops is 18/38. So, we use a random number generator function like `rand()` and generate random positive integers between 1 and 38. If the integer is less than 19, we consider that the ball landed on the red, and the bet was won, otherwise the bet was lost.

The basic algorithm is as follows:

```while (LIST != empty) {
get_colour_result_from_roulette_spin();
if (bet was won) remove head and tail from the list;
else add the total amount that was bet to tail of the list;
}
```

The program written in C can be found here . There are two ways (maybe more) in which the above algorithm can be written in assembly. One approach is to use variables or storage in the data section or the stack (as in the C program above) to hold the head and tail of the linked list. This approach is easier to code and easier to understand as well. However, there is another, more intricate, approach in which no variables have been used and almost all the registers of the AMD64 architecture have been used (We have 16 general purpose registers, so might as well use some of them).

The first approach is given below. The all register usage approach can be found here .

We introduce the concept of accessing linked lists and their elements in this piece of code and also some floating point instructions. Linked lists are accessed in the same ways as arrays are accessed. The linked list used here is a doubly-connected linked list. Each element of the linked list has a previous and a next element and a data element. The linked list can be accessed as long as the head of the link list is known. For traversing the link list, we just need to use the next element pointer, check whether it is `NULL` and can move to the next element in the list. To traverse the list backwards, we need to know the tail of the link list, and then use its previous element pointer to access the list backwards.

In the below code, every element of the link list is of size 24 bytes, with the first 8 bytes pointing to the data stored in the list, the next 8 bytes pointing to the address of the previous element in the list and the last 8 bytes pointing to the next element in the list. The accessing of the details of any element of the list is very easily done by knowing the offset (required for accessing the data, previous or next pointer of the element of the list) as can be seen in the code below.

Floating point instructions are required here because we are using the `rand()` function provided to us by the C standard library to generate a random number between 1 and 38, inclusive. The function `rand()` returns a pseudo-random integer (4-byte) between `0` and `RAND_MAX` , so we scale that to between 1 and 38. The explanation of each floating point instruction is given next to its usage in the code below. While loading integers as floating points into the Floating Point Unit (FPU), we have to use a memory location instead of a general purpose register. Maybe it is because in earlier x86 computers, the FPU (or x87 unit) was separate processor from the x86 processor unlike today where it is integrated. We could also use the `XMM0-15` registers to do the floating point calculations. FIXME - `XMM` code for `get_bet()` .

We also demonstrate loop unrolling while creating the first 4 elements of the linked list. Instead of using a while loop or a for loop, we create all the 4 elements one by one explicitly.

```; struct Link {
;   int data;
; };
%define data_offset     0
%define prev_offset     8
%define next_offset     16
%define size_of_list    24

%define RAND_MAX        2147483647

%define sys_getpid      39

section .rodata
prompt1 db "Winnings: %ld in %ld chances.",10,0
int_format0 db  "win : %ld ",10,0
int_format1 db "loss: %ld ",10,0
int_format2 db "AMT: %ld ",10,0

section .bss
tail    resq    1

section .text
global  main
extern  printf
extern  srand, rand           ; used for getting the random number
extern  malloc, free          ; used for creating and deleting members for link list

main:
; prologue with making space for temporary variables on stack
push    rbp
mov     rbp, rsp
push    rbx
push    r12
push    r13
push    r14
push    r15
pushf
; do we need stack variables ?
; let us use the multitude of registers available to us
; let R14 be the winnings at the end
; let R15 be the number of chances
; let R12 point to the head of the list
; let R13 point to the tail of the list
xor     r12, r12
xor     r13, r13
xor     r14, r14
xor     r15, r15

; First fill in the first 4 elements of the list
; instead of using a for loop, we use loop-unrolling
; here since it will execute faster than branch instructions
xor     rax,rax
mov     rdi, size_of_list
call    malloc
mov     r12, rax
mov     [r12 + data_offset], dword 1             ; the first element holds the value 1
xor     rax, rax
mov     [r12 + prev_offset], rax
mov     rdi, size_of_list
call    malloc
mov     [r12 + next_offset], rax
mov     [rax + data_offset], dword 2             ; the next element holds the value 2
mov     [rax + prev_offset], r12
push    rax
mov     rdi, size_of_list
xor     rax, rax
call    malloc
pop     r8
mov     [r8 + next_offset], rax
mov     [rax + data_offset], dword 3             ; the next element holds the value 3
mov     [rax + prev_offset], r8
push    rax
mov     rdi, size_of_list
xor     rax, rax
call    malloc
pop     r8
mov     [r8 + next_offset], rax
mov     r13, rax
mov     [r13 + data_offset], dword 4             ; the last element holds the value 4
mov     [r13 + prev_offset], r8
xor     rax, rax
mov     [r13 + next_offset], rax

mov     [tail], r13                     ; move the address of the last element into the variable "tail"

; giving a dynamic seed to the random number generator by using the PID of the process.
mov     rax, sys_getpid
syscall
mov     rdi, rax
call    srand
; main while loop
while_loop:
mov     r13, [tail]
cmp     r12,0
jz      print_result
; if (tail != NULL)
cmp     r13,0
jz      print_result
; add the data in the first and last elements
call    get_amt
; store in RBX. Remember RBX is not modified by any function that is called as per the ABI.
mov     rbx, rax
; increment the number of chances
inc     r15

; print the amount just calculated (not required but useful for debugging)
mov     rsi, rax
mov     rdi, dword int_format2
xor     rax, rax
call    printf

; get the result of the roulette wheel rotation.
call    get_bet
; if the result is a Red colour
cmp     rax, 18
jge     loss_block
win_block:
; add the amount won to the winnings
; remove the end points of the list
call    remove_from_list

; print the current value of winnings (not required but useful for debugging)
mov     rsi, r14
mov     rdi, dword int_format0
xor     rax, rax
call    printf
; return back to while loop
jmp     while_loop
loss_block:
; subtract the amount lost from the winnings
sub     r14, rbx
; add the amount lost as an element to the list
mov     rdi, rbx

; print the current value of winnings (not required but useful for debugging)
mov     rsi, r14
mov     rdi, dword int_format1
xor     rax, rax
call    printf

; return back to while loop
jmp     while_loop

print_result:
; print the final results: total winnings and the number of chances required.
mov     rdx, r15
mov     rsi, r14
mov     rdi, dword prompt1
xor     rax, rax
call    printf
; epilogue with return value 0
xor     rax,rax
popf
pop     r15
pop     r14
pop     r13
pop     r12
pop     rbx
mov     rsp, rbp
pop     rbp
ret

remove_from_list:
push    rbp
mov     rbp, rsp
; if head == tail, then make the list empty
cmp     rdi, [tail]
cmp     rdi, 0
mov     rcx, [rdi + next_offset]
cmp     rcx, 0
; set the new head->prev = NULL;
xor     rdx, rdx
mov     [rcx + prev_offset],rdx
mov     rdi, [tail]
cmp     rdi, 0
je      ret_frm_remvfrmlist
mov     rcx, [rdi + prev_offset]
mov     [tail], rcx
; if tail != NULL, set tail = tail->prev and free the original tail
cmp     rcx, 0
je      tail_next_0
xor     rdx, rdx
mov     [rcx + next_offset], rdx
tail_next_0:
call    free
ret_frm_remvfrmlist:
mov     rsp, rbp
pop     rbp
ret

push    rbp
mov     rbp, rsp
; data to be added is in RDI
push    rdi
; create a new element of the list
mov     rdi, size_of_list
call    malloc
pop     rdi
; set the new element as the tail of the list
; and assign the data in RDI as the new element's data value
mov     rsi, [tail]
mov     [rsi + next_offset], rax
cmp     rax, 0
mov     [rax + data_offset], rdi
xor     rdx, rdx
mov     [rax + next_offset], rdx
mov     [rax + prev_offset], rsi
mov     [tail], rax
mov     rsp, rbp
pop     rbp
ret

get_amt:
push    rbp
mov     rbp, rsp
; if head == tail, then just return the data element.
mov     rsi, [tail]
mov     rax, [rdi + data_offset]
xor     rcx, rcx
cmp     [rdi + next_offset],rcx
jz      ret_frm_getamt
ret_frm_getamt:
mov     rsp, rbp
pop     rbp
ret

get_bet:
push    rbp
mov     rbp, rsp
; a roulette wheel has 38 different possibilities
; we are trying to return the value (int)(rand()*38.0/(RAND_MAX+1.0))
; which is a random number between 1 and 38.

push    38
push    RAND_MAX
call    rand

; we get the random number and place it in RAX. Then we push RAX on the stack
push    rax
; we then load RAX into the FPU's top register ST(0).
; FILD is floating point integer load instruction, but it can load the integer only from a memory
; location and not a general purpose register, which is why we pushed RAX on the stack.

fild    qword [rbp - 24]            ; RAX is on ST(0)
fild    qword [rbp - 8]             ; 38 is on ST(0) and RAX is on ST(1)

; multiply ST(0)  and ST(1) and pop the stack and store the result in ST(0)
fmulp   st1
fild    qword [rbp - 16]            ; RAND_MAX is on ST(0)
fld1                                ; load the integer 1 on ST(0)
faddp   st1                         ; add RAND_MAX and 1 and pop the stack, store result in ST(0).
; Previous result on ST(1)
fdivp   st1                         ; divide the ST(1) from ST(0), pop the stack and store result in ST(0)
fistp   qword [rbp - 24]            ; Load the integer result back on to the memory location
mov     rax, [rbp -24]              ; return the result
mov     rsp, rbp
pop     rbp
ret
```

If the above assembly program was called `roulette.asm` then executing the following statements would create the executable `roulette.out` .

```\$ yasm -f elf64 roulette.asm
\$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib64/crt1.o /usr/lib64/crti.o \
roulette.o  /usr/lib64/crtn.o -lc -o roulette.out
```