Bit Shifting Operations

Bit shifts are operations in which the bits of a register or memory location are moved to the right or left by a count given either as an immediate value or in the `CL` register. They are also a very quick way to multiply or divide by powers of 2, since only the bits are moved. There are 4 shift instructions, 4 rotate instructions, and 2 double-precision shift instructions for general-purpose registers.
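The link between shifting and powers of 2 can be sketched in C. This is only an illustrative model for unsigned 32-bit values; the function names `mul_pow2` and `div_pow2` are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply an unsigned value by 2^k with a left shift (what SHL does). */
static uint32_t mul_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);   /* in C, shifting by the full width or more is undefined */
    return x << k;
}

/* Divide an unsigned value by 2^k with a logical right shift (what SHR does). */
static uint32_t div_pow2(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x >> k;
}
```

For instance, `mul_pow2(5, 3)` gives 40 and `div_pow2(41, 3)` gives 5, because the right shift discards the remainder.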

The shift arithmetic left (`SAL`) and shift logical left (`SHL`) instructions perform the same operation: they shift the bits in the destination operand to the left. For each bit position shifted, the most significant bit is moved into the `CF` flag (carry flag) in the `RFLAGS` register, and the least significant bit is cleared. Similarly, the shift arithmetic right (`SAR`) and shift logical right (`SHR`) instructions shift the bits in the destination operand to the right, with the least significant bit being moved into the `CF` flag. However, the most significant bit is cleared only by the `SHR` instruction. The `SAR` instruction leaves it unchanged, which preserves the sign of the original value in the destination operand.
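The difference between the three behaviours can be modeled in C. The sketch below assumes a 32-bit operand and a count between 1 and 31; the struct and function names are hypothetical, and only the result and `CF` are modeled (the other flags are ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Result of a shift: the new operand value and the carry flag. */
struct shift_result {
    uint32_t value;
    unsigned cf;
};

/* SHL/SAL: bits move left; the last bit pushed out of bit 31 lands in CF. */
static struct shift_result model_shl(uint32_t x, unsigned count)
{
    assert(count >= 1 && count <= 31);
    struct shift_result r;
    r.cf = (x >> (32 - count)) & 1u;
    r.value = x << count;
    return r;
}

/* SHR: bits move right; zeros enter at bit 31; the last bit out of bit 0 lands in CF. */
static struct shift_result model_shr(uint32_t x, unsigned count)
{
    assert(count >= 1 && count <= 31);
    struct shift_result r;
    r.cf = (x >> (count - 1)) & 1u;
    r.value = x >> count;
    return r;
}

/* SAR: like SHR, but copies of the original sign bit enter at bit 31. */
static struct shift_result model_sar(uint32_t x, unsigned count)
{
    assert(count >= 1 && count <= 31);
    struct shift_result r;
    r.cf = (x >> (count - 1)) & 1u;
    r.value = x >> count;
    if (x & 0x80000000u)
        r.value |= ~(0xFFFFFFFFu >> count);   /* fill the vacated high bits with 1s */
    return r;
}
```

With the input `0xFFFFFFF8` (that is, -8 as a signed value) and a count of 2, `model_sar` yields `0xFFFFFFFE` (-2), whereas `model_shr` yields `0x3FFFFFFE`.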

The rotate left (`ROL`) and rotate through carry left (`RCL`) instructions shift all bits toward the more significant bit positions. With `ROL`, the most significant bit is rotated back into the least significant bit position and is also copied into `CF`. The rotate right (`ROR`) and rotate through carry right (`RCR`) instructions shift all bits toward the less significant bit positions. With `ROR`, the least significant bit is rotated back into the most significant bit position and is also copied into `CF`. The `RCL` and `RCR` instructions include the carry flag `CF` in the rotation: the bit shifted out of one end goes into `CF`, and the previous value of `CF` enters at the other end. The overflow flag `OF` is defined only for 1-bit rotations.
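A 1-bit rotation with and without the carry flag can be modeled in C as follows. This is a sketch for 32-bit operands only; the function names are hypothetical, and `OF` is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* ROL by 1: bit 31 wraps around into bit 0 and is also copied into CF. */
static uint32_t model_rol1(uint32_t x, unsigned *cf)
{
    unsigned msb = x >> 31;
    *cf = msb;
    return (x << 1) | msb;
}

/* RCL by 1: a 33-bit rotation. The old CF enters bit 0, bit 31 becomes the new CF. */
static uint32_t model_rcl1(uint32_t x, unsigned *cf)
{
    assert(*cf <= 1);
    unsigned msb = x >> 31;
    uint32_t result = (x << 1) | *cf;
    *cf = msb;
    return result;
}
```

Starting from `0x80000000` with `CF` clear, `model_rol1` returns 1 immediately, while `model_rcl1` returns 0 and needs a second call before the bit reappears in bit 0.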

In 64-bit mode, the default operand size is 32 bits, and the count in the `CL` register is masked to 5 bits (maximum value 31). This means that by default at most 31 bit positions can be shifted. To change the operand size to 64 bits, and the mask width for the count to 6 bits (maximum value 63), the `REX.W` prefix needs to be used. The assembler adds it automatically when 64-bit registers such as `RAX`, `RBX`, etc. are used. If 32-bit registers such as `EAX`, `EBX`, etc. are used, no `REX` prefix is added. If the extended registers `R8` to `R15` are used, the assembler adds the corresponding `REX` prefix. This applies to all the rotate and shift instructions.
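The effect of the count mask can be illustrated with a small C model. It is a sketch that covers only 32-bit and 64-bit left shifts; `effective_count`, `model_shl32` and `model_shl64` are hypothetical names, and the `rex_w` parameter stands in for the presence of the `REX.W` prefix.

```c
#include <assert.h>
#include <stdint.h>

/* Count actually used by the processor: CL masked to 5 bits, or 6 bits with REX.W. */
static unsigned effective_count(unsigned cl, int rex_w)
{
    assert(cl <= 255);                 /* CL is an 8-bit register */
    return cl & (rex_w ? 63u : 31u);
}

/* Model of "shl eax, cl": 32-bit operand, 5-bit count mask. */
static uint32_t model_shl32(uint32_t x, unsigned cl)
{
    return x << effective_count(cl, 0);
}

/* Model of "shl rax, cl": 64-bit operand, 6-bit count mask. */
static uint64_t model_shl64(uint64_t x, unsigned cl)
{
    return x << effective_count(cl, 1);
}
```

With a count of 33, `model_shl32(1, 33)` shifts by only 1 and returns 2, whereas `model_shl64(1, 33)` shifts by the full 33 positions.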

Here is an example of what the opcodes would look like for different size registers being used:

• If the instruction is
`rol eax,1`
the opcode generated (in hexadecimal notation) is
`D1 C0`
• If the instruction is
`rol rax,1`
the opcode generated (in hexadecimal notation) is
`48 D1 C0`
You can see that `0x48` is the `REX.W` prefix added by the assembler.
• If the instruction is
`rol r8,1`
the opcode generated (in hexadecimal notation) is
`49 D1 C0`
The `REX` prefix here is `0x49`, which is `REX.W` (`0x48`) combined with the `REX.B` bit (`0x01`) that selects `R8`.
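The prefix bytes in these encodings can be picked apart with a few lines of C. A `REX` prefix has the bit layout `0100WRXB`; the sketch below extracts the four flag bits. The struct and function names are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* The four flag bits carried in the low nibble of a REX prefix (0100WRXB). */
struct rex_bits {
    unsigned w;   /* 1 = 64-bit operand size */
    unsigned r;   /* extends the ModRM reg field */
    unsigned x;   /* extends the SIB index field */
    unsigned b;   /* extends the ModRM r/m field (selects R8 to R15) */
};

static struct rex_bits decode_rex(uint8_t byte)
{
    assert((byte & 0xF0) == 0x40);   /* a REX prefix always has 0100 in the high nibble */
    struct rex_bits bits;
    bits.w = (byte >> 3) & 1u;
    bits.r = (byte >> 2) & 1u;
    bits.x = (byte >> 1) & 1u;
    bits.b = byte & 1u;
    return bits;
}
```

Decoding `0x48` gives only `w` set, and decoding `0x49` gives both `w` and `b` set, which matches the `rol rax,1` and `rol r8,1` encodings.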

Below is a sample program to count the number of bits that are on (value 1) in a 4-byte integer entered by the user at the prompt.

```
section .rodata
prompt1  db "Enter a number:",0
prompt2  db "The number of bits that are on in %d is %d.",10,0
num_format db "%d",0    ; the format string must be NUL-terminated for scanf

section .text
global main
extern printf, scanf

main:
push  rbp
mov   rbp, rsp
sub   rsp, 16   ; room for the 4-byte integer at [rbp-8]; 16 keeps RSP 16-byte aligned at the calls
push  rbx
push  r12
push  r13
push  r14
push  r15
pushfq

; read in the 4-byte integer
mov   rdi, dword prompt1
xor   rax, rax
call  printf
lea   rsi, [rbp-8]
mov   rdi, dword num_format
xor   rax, rax
call  scanf

; count the bits that have value 1
mov   eax, [rbp-8]    ; since we deal with a 4-byte integer we use EAX here.
; If we want to work with a 64-bit integer we will use RAX instead.
mov   rcx, 64         ; set the maximum number of bits you want to count, in this case 64 (register size).
xor   rdx, rdx
count_loop:
rol   rax, 1          ; since we want to rotate the bits so as to maintain the unshifted value we use RAX.
adc   rdx, 0          ; since the most-significant bit is moved into CF, we add with carry.
loop  count_loop

; print the result
; the third argument is the counted value, but that is already stored in RDX.
mov   rsi, rax    ; move the original 4-byte integer value into RSI. We can also use EAX and ESI.
mov   rdi, dword prompt2
xor   rax, rax
call  printf

; restore the saved registers in reverse order; LEAVE releases the local stack space
popfq
pop   r15
pop   r14
pop   r13
pop   r12
pop   rbx
xor   eax, eax    ; return 0 from main
leave
ret
```

As we can see, the program above keeps the original 4-byte integer value in the `RAX` register: because it uses the `ROL` instruction, 64 one-bit rotations bring every bit back to its starting position. We could also use the `SHL` instruction and get the same count, but we would lose the original value in the `RAX` register and would have to fetch it from memory again to print it in the results. Will this program work in the same way if the `ROL` instruction is replaced by the `ROR` instruction?
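The counting loop of the program can also be modeled in C, which makes it easy to check that the value survives the 64 rotations. This is a sketch, not a translation of the assembly; `rol64_1` and `count_set_bits` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit value left by one bit and report the bit that wrapped around (CF). */
static uint64_t rol64_1(uint64_t x, unsigned *cf)
{
    *cf = (unsigned)(x >> 63);
    return (x << 1) | *cf;
}

/* Count the 1 bits in *value by rotating it 64 times and adding up the carries. */
static unsigned count_set_bits(uint64_t *value)
{
    assert(value != NULL);
    unsigned count = 0;
    for (int i = 64; i != 0; --i) {      /* mirrors RCX counting down under LOOP */
        unsigned cf;
        *value = rol64_1(*value, &cf);   /* mirrors ROL RAX, 1 */
        count += cf;                     /* mirrors ADC RDX, 0 */
    }
    return count;
}
```

Calling `count_set_bits` on a variable holding `0xF0F0` returns 8, and the variable still holds `0xF0F0` afterwards.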

To read in an 8-byte long integer, we would have to make the following changes:

• replace `"%d"` with `"%ld"` in all the string prompts and formats
• replace
`mov eax, [rbp-8]`
with
`mov rax, [rbp-8]`

To assemble and link the program above, saved as `bitcount.asm`, we do the following:

```
$ yasm -f elf64 bitcount.asm
$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib64/crt1.o /usr/lib64/crti.o \
bitcount.o /usr/lib64/crtn.o -lc -o bitcount.out
```

There are two double-precision shift instructions `SHLD` and `SHRD` ... to be done!