The calculation of a factorial of a number can be done using recursion. Below is the algorithm:

         return 1;
    else return num * FACTORIAL(num-1);

We can do it in assembly in possibly multiple ways. One way is to use the stack to store every return value from FACTORIAL() so as to calculate the product. Below is the assembly version of the above algorithm. An unsigned long (8 bytes) is used to store the factorial, although ideally factorials for numbers > 20 can be huge and exceed the 64-bit space allotted to them. I have not figured out how to take care of that yet.

section .rodata
    prompt1 db  "Enter a positive number:",0
    num_fmt db  "%lu",0
    prompt2 db  "The factorial of %lu is %lu ",10,0

section .text
    global  main, factorial
    extern  printf, scanf

        push    rbp
        mov     rbp, rsp
        sub     rsp, 8
        push    rbx
        push    r12
        push    r13
        push    r14
        push    r15
        ; Read in a number
        mov     rdi, dword prompt1
        xor     rax, rax
        call    printf
        lea     rsi, [rbp-8]
        mov     rdi, dword num_fmt
        xor     rax,rax
        call    scanf

        ; Calculate Factorial. Result returned in RDX:RAX
        mov     rdi, [rbp-8]
        xor     rax, rax
        call    factorial
        ; Print result (this part is flawed. will be explained later)
        mov     rdx, rax
        mov     rsi, [rbp-8]
        mov     rdi, dword prompt2
        xor     rax, rax
        call    printf
        pop     r15
        pop     r14
        pop     r13
        pop     r12
        pop     rbx
        mov     rsp,rbp
        pop     rbp

        push    rbp
        mov     rbp, rsp
        ; Check if the number in RDI is 0. 
        ; If yes, return with value 1 in RAX as factorial(0). 
        mov     rax, 1
        xor     rdx,rdx
        cmp     rdi, 0
        je      return_here
        ; If value in RDI > 0, push on stack for multiplication,
        ; decrement and call factorial again.
        push    rdi
        dec     rdi
        xor     rax, rax
        call    factorial
        ; Eventually when factorial returns, it will have RAX = 1 when RDI = 0.
        ; So we pop the stack and do unsigned multiply with RAX and return that value in RDX:RAX.
        pop     rsi
        mul     rsi
        mov     rsp, rbp
        pop     rbp

As we see above, the code is fairly simple. There are a few points to note however.

If we look closely at the factorial function, we see that we do not handle overflow. When the unsigned multiply instruction ( MUL ) is used, it multiplies RSI with RAX and places the resulting value in RDX:RAX . But when a particular call to factorial returns, we are always multiplying with the value in RAX and neglecting the value in RDX . So in cases where the value of the factorial of the input number overflows into RDX , we get the wrong result.

While printing out the result in the main function, again we do not take care of the overflow and move the value from RAX to RDX so that the x86-64 ABI rules can be followed. This needs to be fixed and we should be able to print a 128-bit number as the result. This limits the maximum number to which the correct factorial can be calculated to 20.

If the above assembly program was called factorial.asm , below would be the steps to assembling and linking it into an executable factorial.out.

$ yasm -f elf64 factorial.asm
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib64/crt1.o /usr/lib64/crti.o \
      factorial.o  /usr/lib64/crtn.o -lc -o factorial.out


Follow @_vicash_