x86-64 TUTORIAL: BIT SHIFTING OPERATIONS

Shift instructions move the bits of a register or memory location to the right or left, either by an immediate count or by the count held in the CL register. They are also a very quick way to multiply or divide by powers of 2, since shifting left by n bits multiplies a value by 2^n and shifting right by n bits divides it by 2^n. There are 4 shift instructions, 4 rotate instructions and 2 double-precision shift instructions for general purpose registers.

The shift arithmetic left SAL and shift logical left SHL instructions perform the same operation: they shift the bits in the destination operand to the left. For each bit position shifted, the most significant bit is moved into the carry flag CF in the RFLAGS register, and the least significant bit is cleared. Similarly, the shift arithmetic right SAR and shift logical right SHR instructions shift the bits in the destination operand to the right, with the least significant bit being moved into CF. However, the most significant bit is cleared only by the SHR instruction. The SAR instruction leaves it unchanged, which preserves the sign of the original value in the destination operand.
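
As a small illustration of the paragraph above, the fragment below shows the effect of each shift on a concrete value. It is only a sketch in the same yasm syntax as the sample program further down; the label shift_demo and the constants are made up for this example. The comments give the register contents and CF after each instruction. The last two lines also show that SAR rounds a negative odd value toward negative infinity, so it does not give the same result as a signed division by 2 with IDIV (which would give -3).

    section .text
    shift_demo:
        mov   eax, 5           ; EAX = 0x00000005
        shl   eax, 3           ; EAX = 40 (5 * 2^3), CF = 0 (the last bit shifted out was 0)
        mov   eax, 0x80000001  ; bit 31 and bit 0 are set
        shl   eax, 1           ; EAX = 0x00000002, CF = 1 (the old bit 31)
        mov   eax, -8          ; EAX = 0xFFFFFFF8
        sar   eax, 1           ; EAX = 0xFFFFFFFC (-4): the sign bit is kept, CF = 0
        mov   eax, -8          ; EAX = 0xFFFFFFF8 again
        shr   eax, 1           ; EAX = 0x7FFFFFFC: bit 31 is cleared, CF = 0
        mov   eax, -7          ; EAX = 0xFFFFFFF9
        sar   eax, 1           ; EAX = 0xFFFFFFFC (-4), CF = 1 (the old bit 0)
        ret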

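The rotate left ROL instruction shifts all the bits to their more-significant bit locations, and the most significant bit is rotated back into the least significant bit location. The rotate right ROR instruction shifts all the bits to their less-significant bit locations, and the least significant bit is rotated back into the most significant bit location. In both cases the bit that wraps around is also copied into the carry flag CF. The rotate through carry left RCL and rotate through carry right RCR instructions include CF in the rotation as if it were an extra bit of the operand: RCL moves the most significant bit into CF and the old CF into the least significant bit, while RCR moves the least significant bit into CF and the old CF into the most significant bit. The overflow flag OF is defined only for the 1-bit rotations.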
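
The difference between the plain rotates and the rotates through carry is easiest to see on a single byte. The fragment below is a sketch only, with a made-up label rotate_demo and arbitrary values; CLC and STC are used to put CF into a known state before RCL and RCR (MOV does not change the flags).

    section .text
    rotate_demo:
        mov   al, 0x81         ; AL = 1000 0001b
        rol   al, 1            ; AL = 0000 0011b (0x03), CF = 1: bit 7 wraps around to bit 0
        mov   al, 0x81
        ror   al, 1            ; AL = 1100 0000b (0xC0), CF = 1: bit 0 wraps around to bit 7
        clc                    ; CF = 0
        mov   al, 0x81
        rcl   al, 1            ; AL = 0000 0010b (0x02), CF = 1: the old CF (0) enters bit 0
        stc                    ; CF = 1
        mov   al, 0x01
        rcr   al, 1            ; AL = 1000 0000b (0x80), CF = 1: the old CF (1) enters bit 7
        ret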

In 64-bit mode, the default operand size is 32 bits, and the shift count (whether immediate or in the CL register) is masked to 5 bits, so the maximum number of bit-shifts is 31 by default. To change the operand size to 64 bits, and the count mask to 6 bits (maximum value 63), the REX.W prefix needs to be used. The assembler adds it automatically if 64-bit registers like RAX, RBX, etc. are used. If 32-bit registers like EAX, EBX, etc. are used, no REX prefix is added. If the extra registers R8 - R15 are used, the corresponding REX prefix is added by the assembler. This is valid for all the rotate and shift instructions.
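
The effect of the count mask can be seen by using a count larger than 31. The fragment below is a sketch with a made-up label mask_demo; the same value in CL gives a different shift distance depending on the operand size.

    section .text
    mask_demo:
        mov   cl, 33
        mov   eax, 1
        shl   eax, cl          ; 32-bit operand: count is 33 AND 31 = 1, so EAX = 2
        mov   rax, 1
        shl   rax, cl          ; 64-bit operand (REX.W): count is 33 AND 63 = 33,
                               ; so RAX = 0x0000000200000000
        ret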

Here is an example of what the opcodes would look like for different size registers being used:

If the instruction is rol eax, 1, the opcode generated (in hexadecimal notation) is D1 C0.
If the instruction is rol rax, 1, the opcode generated (in hexadecimal notation) is 48 D1 C0. You can see that 0x48 is the REX.W prefix added by the assembler.
If the instruction is rol r8, 1, the opcode generated (in hexadecimal notation) is 49 D1 C0. The REX prefix here is 0x49, which is REX.W with the REX.B bit also set to select R8.
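
The same pattern holds for the shift instructions. The fragment below is a sketch with a made-up label encoding_demo; the bytes in the comments are what an assembler would typically emit for each line, and they can be checked by assembling the fragment and disassembling the object file with objdump -d.

    section .text
    encoding_demo:
        shl   eax, 1           ; D1 E0       no REX prefix for a 32-bit register
        shl   rax, 1           ; 48 D1 E0    REX.W selects the 64-bit operand size
        shl   eax, 4           ; C1 E0 04    immediate count stored in the last byte
        shl   rax, cl          ; 48 D3 E0    count taken from CL
        sar   r9d, 1           ; 41 D1 F9    REX.B (0x41) selects R9D, no REX.W needed
        ret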

Below is a sample program to count the number of bits that are on (value 1) in a 4-byte integer entered by the user at the prompt.

section .rodata
    prompt1  db "Enter a number:",0
    prompt2  db "The number of bits that are on in %d are %d.",10,0
    num_format db "%d",0

section .text
    global main
    extern printf, scanf

    main:
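        push  rbp       ; save the caller's frame pointer (LEAVE restores it at the end)
        mov   rbp, rsp
        sub   rsp, 16   ; the 4-byte integer is read into [rbp-8]; reserving 16 bytes keeps
                        ; RSP 16-byte aligned at each call after the six pushes below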
        push  rbx
        push  r12
        push  r13
        push  r14
        push  r15
        pushfq 
       
        ; read in the 4-byte integer
        mov   rdi, dword prompt1
        xor   rax, rax
        call  printf
        lea   rsi, [rbp-8]
        mov   rdi, dword num_format       
        xor   rax, rax
        call  scanf

        ; count the bits that have value 1
        mov   eax, [rbp-8]    ; since we deal with a 4-byte integer we use EAX here. 
                              ; If we want to work with a 64-bit integer we will use RAX instead.
        mov   rcx, 64         ; set the maximum number of bits you want to count, in this case 64 (register size).
        xor   rdx, rdx
  count_loop:
        rol   rax, 1          ; rotate rather than shift: after all 64 rotations RAX holds its original value again.
        adc   rdx, 0          ; ROL copies the rotated-out most-significant bit into CF, so adding with carry counts it.
        loop  count_loop

        ; print the result
                          ; the third argument is the counted value, but that is already stored in RDX.
        mov   rsi, rax    ; move the original 4-byte integer value into RSI. We can also use EAX and ESI. 
        mov   rdi, dword prompt2
        xor   rax, rax
        call  printf

        popfq             ; restore in the reverse order of the pushes; LEAVE releases the local space
        pop   r15
        pop   r14
        pop   r13
        pop   r12
        pop   rbx
        xor   eax, eax    ; return 0 from main
        leave
        ret

As we can see, the above program keeps the original 4-byte integer value in the same register RAX, because the ROL instruction rotates every bit back into the register and after 64 rotations the value is back where it started. We could also use the SHL instruction and the same count would be obtained. However, we would lose the original value in the RAX register and would have to load it again from memory to print it in the results. Would this program work in the same way if the ROL instruction were replaced by the ROR instruction?
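
A minimal sketch of the SHL variant mentioned above, assuming the rest of the sample program is unchanged: the fragment below would replace the counting loop and the mov rsi, rax line before the final printf. Since only EAX is shifted here, 32 iterations are enough.

        ; count the bits that have value 1, using SHL instead of ROL
        mov   eax, [rbp-8]    ; load the 4-byte integer (the upper half of RAX becomes 0)
        mov   rcx, 32         ; only the 32 bits of EAX need to be examined
        xor   rdx, rdx
  count_loop:
        shl   eax, 1          ; bit 31 goes into CF and is lost from EAX
        adc   rdx, 0          ; add CF to the running count
        loop  count_loop

        mov   esi, [rbp-8]    ; EAX is now 0, so reload the original value from memory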

To read in an 8-byte long integer instead, we would have to make the following changes:

replace "%d" with "%ld" in all the string prompts and formats, and
replace mov eax, [rbp-8] with mov rax, [rbp-8].

To assemble and link the program above, saved as bitcount.asm, we do the following:

 $ yasm -f elf64 bitcount.asm
 $ ld -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    /usr/lib/x86_64-linux-gnu/crt1.o  /usr/lib/x86_64-linux-gnu/crti.o \
    bitcount.o /usr/lib/x86_64-linux-gnu/crtn.o -lc -o bitcount.out

Download bitcount.asm, asm_io.inc and asm_io.asm.

There are two double-precision shift instructions, SHLD and SHRD … to be done!
