# x86-64 TUTORIAL: FACTORIAL WITH RECURSION

 ← x86-64 TUTORIAL: CONDITIONAL OPERATIONS WITHOUT BRANCHING | x86-64 TUTORIAL: DOUBLY LINKED LIST →

The calculation of a factorial of a number can be done using recursion. Below is the algorithm:

``````FACTORIAL(num) {
if (num==0)
return 1;
else
return num * FACTORIAL(num-1);
}
``````

We can do it in assembly in possibly multiple ways. One way is to use the stack to store every return value from `FACTORIAL()` so as to calculate the product. Below is the assembly version of the above algorithm. An `unsigned long` (8 bytes) is used to store the factorial, although ideally factorials for numbers > 20 can be huge and exceed the 64-bit space allotted to them.

I have not figured out how to take care of that yet and that requires Bignum implementation which is out of scope for this tutorial.

``````section .rodata
prompt1 db  "Enter a positive number:",0
num_fmt db  "%lu",0
prompt2 db  "The factorial of %lu is %lu ",10,0

section .text
global  main, factorial
extern  printf, scanf

main:
push    rbp
mov     rbp, rsp
sub     rsp, 8
pushfq
push    rbx
push    r12
push    r13
push    r14
push    r15

mov     rdi, dword prompt1
xor     rax, rax
call    printf
lea     rsi, [rbp-8]
mov     rdi, dword num_fmt
xor     rax,rax
call    scanf

; Calculate Factorial. Result returned in RDX:RAX
mov     rdi, [rbp-8]
xor     rax, rax
call    factorial

; Print result (this part is flawed. will be explained later)
mov     rdx, rax
mov     rsi, [rbp-8]
mov     rdi, dword prompt2
xor     rax, rax
call    printf

pop     r15
pop     r14
pop     r13
pop     r12
pop     rbx
popfq
mov     rsp,rbp
pop     rbp
ret

factorial:
push    rbp
mov     rbp, rsp
; Check if the number in RDI is 0.
; If yes, return with value 1 in RAX as factorial(0).
mov     rax, 1
xor     rdx,rdx
cmp     rdi, 0
je      return_here
; If value in RDI > 0, push on stack for multiplication,
; decrement and call factorial again.
push    rdi
dec     rdi
xor     rax, rax
call    factorial
; Eventually when factorial returns, it will have RAX = 1 when RDI = 0.
; So we pop the stack and do unsigned multiply with RAX and return that value in RDX:RAX.
pop     rsi
mul     rsi
return_here:
mov     rsp, rbp
pop     rbp
ret
``````

As we see above, the code is fairly simple. There are a few points to note however.

• Recursion is handled by use of the stack - we could try not to use the stack, but then we would run out of registers where we would store every value of the number in `RDI` when factorial gets called multiple times. Moreover, it would be very hard to keep track of those registers. The stack gives us an easy way to handle recursion, with the hardware and the LIFO concept coming to our rescue.

• Results need to be in the same register everytime for the recursive return to work - the final result keeps getting multiplied with the numbers from the stack and placed back into the same register. This will keep the factorial function independent of the number being multiplied with.

If we look closely at the factorial function, we see that we do not handle overflow. When the unsigned multiply instruction `MUL` is used, it multiplies `RSI` with `RAX` and places the resulting value in `RDX:RAX`.

But when a particular call to factorial returns, we are always multiplying with the value in `RAX` and neglecting the value in `RDX`. So in cases where the value of the factorial of the input number overflows into `RDX` we get the wrong result.

While printing out the result in the main function, again we do not take care of the overflow and move the value from `RAX` to `RDX` so that the x86-64 ABI rules can be followed. This needs to be fixed and we should be able to print a 128-bit number as the result. This limits the maximum number to which the correct factorial can be calculated to 20.

If the above assembly program was called `factorial.asm` below would be the steps to assembling and linking it into an executable `factorial.out`.

``````\$ yasm -f elf64 factorial.asm
\$ ld -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
/usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o \
factorial.o  /usr/lib/x86_64-linux-gnu/crtn.o -lc -o factorial.out
``````