Factorial

The calculation of a factorial of a number can be done using recursion. Below is the algorithm:

FACTORIAL(num) { if(num==0) return 1; else return num * FACTORIAL(num-1); }

We can do it in assembly in possibly multiple ways. One way is to use the stack
to store every return value from `FACTORIAL()`

so as to calculate the product.
Below is the assembly version of the above algorithm. An ```
unsigned
long
```

(8 bytes)
is used to store the factorial, although ideally factorials for numbers > 20 can
be huge and exceed the 64-bit space allotted to them. I have not figured out how
to take care of that yet.

section .rodata prompt1 db "Enter a positive number:",0 num_fmt db "%lu",0 prompt2 db "The factorial of %lu is %lu ",10,0 section .text global main, factorial extern printf, scanf main: push rbp mov rbp, rsp sub rsp, 8 pushfq push rbx push r12 push r13 push r14 push r15 ; Read in a number mov rdi, dword prompt1 xor rax, rax call printf lea rsi, [rbp-8] mov rdi, dword num_fmt xor rax,rax call scanf ; Calculate Factorial. Result returned in RDX:RAX mov rdi, [rbp-8] xor rax, rax call factorial ; Print result (this part is flawed. will be explained later) mov rdx, rax mov rsi, [rbp-8] mov rdi, dword prompt2 xor rax, rax call printf pop r15 pop r14 pop r13 pop r12 pop rbx popfq mov rsp,rbp pop rbp ret factorial: push rbp mov rbp, rsp ; Check if the number in RDI is 0. ; If yes, return with value 1 in RAX as factorial(0). mov rax, 1 xor rdx,rdx cmp rdi, 0 je return_here ; If value in RDI > 0, push on stack for multiplication, ; decrement and call factorial again. push rdi dec rdi xor rax, rax call factorial ; Eventually when factorial returns, it will have RAX = 1 when RDI = 0. ; So we pop the stack and do unsigned multiply with RAX and return that value in RDX:RAX. pop rsi mul rsi return_here: mov rsp, rbp pop rbp ret

As we see above, the code is fairly simple. There are a few points to note however.

- Recursion is handled by use of the stack - we could try not to use the
stack, but then we would run out of registers where we would store every
value of the number in
`RDI`

when factorial gets called multiple times. Moreover, it would be very hard to keep track of those registers. The stack gives us an easy way to handle recursion, with the hardware and the LIFO concept coming to our rescue. - Results need to be in the same register everytime for the recursive return to work - the final result keeps getting multiplied with the numbers from the stack and placed back into the same register. This will keep the factorial function independent of the number being multiplied with.

If we look closely at the factorial function, we see that we do not
handle overflow. When the unsigned multiply instruction ( `MUL`

) is used,
it multiplies `RSI`

with `RAX`

and places the resulting
value in `RDX:RAX`

.
But when a particular call to factorial returns, we are always
multiplying with the value in `RAX`

and neglecting the value in `RDX`

. So in
cases where the value of the factorial of the input number overflows
into `RDX`

, we get the wrong result.

While printing out the result in the main function, again we do not take
care of the overflow and move the value from `RAX`

to `RDX`

so that the x86-64 ABI rules can be followed. This needs to be fixed and we should
be able to print a 128-bit number as the result. This limits the maximum
number to which the correct factorial can be calculated to 20.

If the above assembly program was called `factorial.asm`

, below would be
the steps to assembling and linking it into an executable `factorial.out.`

$ yasm -f elf64 factorial.asm $ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib64/crt1.o /usr/lib64/crti.o \ factorial.o /usr/lib64/crtn.o -lc -o factorial.out

Tweet

Follow @_vicash_