Bit shifts are operations in which the bits of a register or memory location
are moved to the right or left, either by an immediate count or by the count
held in the CL register. They are also a very quick way to multiply or divide by
2 or a power of 2, since only a shift of bits is involved. There are 4 shift
instructions, 4 rotate instructions and 2 double-precision shift instructions
for general purpose registers.
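As a rough illustration of the multiply and divide claim, here is a small sketch in C rather than assembly, because the arithmetic is easy to check there. The function names mul_pow2 and div_pow2 are made up for this example, and it assumes unsigned 32-bit values with a count between 0 and 31.

```c
#include <stdint.h>

/* Hypothetical helper: multiply x by 2^k with a left shift (k in 0..31).
   Bits shifted out of the top are lost, so the result wraps modulo 2^32. */
uint32_t mul_pow2(uint32_t x, unsigned k) {
    return x << k;
}

/* Hypothetical helper: divide an unsigned x by 2^k with a logical right
   shift (k in 0..31). The remainder is discarded. */
uint32_t div_pow2(uint32_t x, unsigned k) {
    return x >> k;
}
```

For example, mul_pow2(5, 3) gives 40 and div_pow2(41, 3) gives 5, the remainder being dropped.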
The shift arithmetic left SAL and shift logical left SHL instructions perform
the same operation, and shift the bits in the destination operand to the left.
For each shifted bit, the most significant bit is moved into the CF flag (carry
flag) in the RFLAGS register, and the least significant bit is cleared.
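The effect of a single left shift can be modelled in C as follows. This is only a sketch of the behaviour described above for a 32-bit operand shifted by one bit; the type shift_result and the function shl1 are hypothetical names, not part of any library.

```c
#include <stdint.h>

/* Hypothetical result type: the shifted value plus the carry flag. */
typedef struct {
    uint32_t value;
    unsigned cf;
} shift_result;

/* Model of a one-bit SHL/SAL on a 32-bit operand:
   the old most significant bit goes to CF, bit 0 becomes 0. */
shift_result shl1(uint32_t x) {
    shift_result r;
    r.cf = (x >> 31) & 1u;   /* bit that falls off the top */
    r.value = x << 1;        /* least significant bit is cleared */
    return r;
}
```

With x = 0x80000001 the model yields the value 0x00000002 and CF = 1.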
Similarly, the shift arithmetic right SAR and shift logical right SHR
instructions shift the bits in the destination operand to the right, with the
least significant bit being moved into the CF flag. However, the most
significant bit is cleared only for the SHR instruction. It remains the same
for the SAR instruction, to maintain the sign of the unshifted value in the
destination operand.
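The difference between the two right shifts is easiest to see side by side. The following C sketch models a one-bit shift of a 32-bit operand; shift_result, shr1 and sar1 are names invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical result type: the shifted value plus the carry flag. */
typedef struct {
    uint32_t value;
    unsigned cf;
} shift_result;

/* Model of a one-bit SHR: bit 0 goes to CF, the top bit becomes 0. */
shift_result shr1(uint32_t x) {
    shift_result r;
    r.cf = x & 1u;
    r.value = x >> 1;
    return r;
}

/* Model of a one-bit SAR: bit 0 goes to CF, the top (sign) bit is kept. */
shift_result sar1(uint32_t x) {
    shift_result r;
    r.cf = x & 1u;
    r.value = (x >> 1) | (x & 0x80000000u);
    return r;
}
```

For the input 0xF0000001, shr1 produces 0x78000000 while sar1 produces 0xF8000000; both set CF to 1.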
The rotate left ROL and rotate through carry left RCL instructions shift all
their bits to their more-significant bit locations, where the most significant
bit is rotated back into the least significant bit location. The rotate right
ROR and rotate through carry right RCR instructions shift all their bits to
their less-significant bit locations, where the least significant bit is
rotated back into the most significant bit location. The RCL and RCR
instructions include the carry flag CF in the rotation: the bit rotated out
goes into CF, and the previous value of CF is rotated in at the other end. For
ROL and ROR, CF only receives a copy of the bit that was rotated around. The
overflow flag OF is defined only for the 1-bit rotations.
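To make the difference between a plain rotate and a rotate through carry concrete, here is a C sketch of a one-bit left rotation of a 32-bit operand. The names rot_result, rol1 and rcl1 are hypothetical, and the OF flag is not modelled.

```c
#include <stdint.h>

/* Hypothetical result type: the rotated value plus the carry flag. */
typedef struct {
    uint32_t value;
    unsigned cf;
} rot_result;

/* Model of a one-bit ROL: the top bit wraps around into bit 0,
   and CF receives a copy of that bit. */
rot_result rol1(uint32_t x) {
    rot_result r;
    r.cf = (x >> 31) & 1u;
    r.value = (x << 1) | r.cf;
    return r;
}

/* Model of a one-bit RCL: a 33-bit rotation through CF.
   The top bit goes to CF, the old CF comes in at bit 0. */
rot_result rcl1(uint32_t x, unsigned cf_in) {
    rot_result r;
    r.cf = (x >> 31) & 1u;
    r.value = (x << 1) | (cf_in & 1u);
    return r;
}
```

Rotating 0x80000000 left by one bit gives 0x00000001 with rol1, but 0x00000000 with rcl1 when CF was clear beforehand; in both cases CF ends up set.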
In 64-bit mode, the default operation size is 32 bits, and the count in the CL
register is masked to 5 bits (maximum value 31). This means that, by default,
at most 31 bit-shifts are performed. To change the operation size to 64 bits,
and the count mask to 6 bits (maximum value 63), the REX.W prefix needs to be
used. The assembler adds it automatically if 64-bit registers like RAX, RBX,
etc. are used. If 32-bit registers like EAX, EBX, etc. are used, no REX prefix
is added. If the extra registers R8 - R15 are used, the corresponding REX
prefix is added by the assembler. This is valid for all the rotate and shift
instructions.
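The count masking described above can be sketched in C as below. The function effective_count is a hypothetical helper that covers only the 32-bit and 64-bit cases discussed in this section; its rex_w parameter stands for whether the REX.W prefix is present.

```c
/* Hypothetical helper: the number of bit positions actually shifted,
   given the raw count in CL and whether REX.W is present.
   Without REX.W the count is masked to 5 bits, with it to 6 bits. */
unsigned effective_count(unsigned cl, int rex_w) {
    return cl & (rex_w ? 0x3Fu : 0x1Fu);
}
```

So a count of 33 shifts a 32-bit register by only 1 bit, but shifts a 64-bit register by the full 33 bits.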
Here is an example of what the opcodes would look like for different size registers being used:
- If the instruction is rol eax, 1, the opcode generated (in hexadecimal notation) is D1 C0.
- If the instruction is rol rax, 1, the opcode generated (in hexadecimal notation) is 48 D1 C0. You can see that 0x48 is the REX.W prefix added by the assembler.
- If the instruction is rol r8, 1, the opcode generated (in hexadecimal notation) is 49 D1 C0. The REX prefix here is 0x49.
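The two prefix values in the list above follow from the layout of the REX byte, which is the bit pattern 0100WRXB. A minimal C sketch of how the byte is put together is shown here; rex_prefix is a hypothetical helper name, and each argument is expected to be 0 or 1.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a REX prefix byte (0100WRXB).
   w = 64-bit operand size, r/x/b = extension bits that select
   the registers R8 - R15 in the different operand fields. */
uint8_t rex_prefix(unsigned w, unsigned r, unsigned x, unsigned b) {
    return (uint8_t)(0x40u
                     | ((w & 1u) << 3)
                     | ((r & 1u) << 2)
                     | ((x & 1u) << 1)
                     | (b & 1u));
}
```

With only W set the result is 0x48, as for rol rax, 1. With W and B set it is 0x49, as for rol r8, 1, where B extends the register field that names R8.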
Below is a sample program to count the number of bits that are on (value 1) in a 4-byte integer entered by the user at the prompt.
section .rodata
    prompt1    db "Enter a number:",0
    prompt2    db "The number of bits that are on in %d are %d.",10,0
    num_format db "%d",0    ; the format string must be NUL-terminated

section .text
    global main
    extern printf, scanf

main:
    push rbp                ; save the caller's frame pointer (restored by leave)
    mov rbp, rsp
    sub rsp, 16             ; the 4-byte integer lives at [rbp-8]; reserving 16 bytes
                            ; keeps RSP 16-byte aligned at the calls below
    push rbx
    push r12
    push r13
    push r14
    push r15
    pushfq
    ; read in the 4-byte integer
    mov rdi, dword prompt1
    xor rax, rax            ; AL = 0: no vector registers used by the variadic call
    call printf
    lea rsi, [rbp-8]
    mov rdi, dword num_format
    xor rax, rax
    call scanf
    ; count the bits that have value 1
    mov eax, [rbp-8]        ; since we deal with a 4-byte integer we use EAX here.
                            ; If we want to work with a 64-bit integer we will use RAX instead.
    mov rcx, 64             ; set the maximum number of bits you want to count, in this case 64 (register size).
    xor rdx, rdx
count_loop:
    rol rax, 1              ; rotating (not shifting) RAX 64 times brings back the original value.
    adc rdx, 0              ; since the most-significant bit is copied into CF, we add with carry.
    loop count_loop
    ; print the result
    ; the third argument is the counted value, but that is already stored in RDX.
    mov rsi, rax            ; move the original 4-byte integer value into RSI. We can also use EAX and ESI.
    mov rdi, dword prompt2
    xor rax, rax
    call printf
    xor eax, eax            ; return 0 from main
    popfq                   ; restore in the reverse order of the pushes
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbx
    leave                   ; releases the local bytes and restores RBP
    ret
As we can see, the above program maintains the original 4-byte integer value in
the same register RAX because we use the ROL instruction. We could also use the
SHL instruction and the same count would be obtained. However, we would end up
losing the original value in the RAX register and would have to get it back
from memory to print it on screen in the results. Will this program work in the
same way if the ROL instruction were replaced by the ROR instruction?
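The trade-off between rotating and shifting can be modelled in C. The sketch below mirrors the counting loop on a 64-bit register: it takes the top bit as the carry, adds it to the count, and then either rotates or shifts. The names count_result, count_bits_rol and count_bits_shl are invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical result type: register contents after the loop, and the count. */
typedef struct {
    uint64_t value;
    unsigned count;
} count_result;

/* Model of the loop using a one-bit rotate left, 64 times. */
count_result count_bits_rol(uint64_t v) {
    count_result r = { v, 0 };
    for (int i = 0; i < 64; i++) {
        unsigned cf = (unsigned)(r.value >> 63);  /* bit copied into CF */
        r.value = (r.value << 1) | cf;            /* top bit wraps to bit 0 */
        r.count += cf;                            /* adc rdx, 0 */
    }
    return r;
}

/* Model of the same loop using a one-bit shift left, 64 times. */
count_result count_bits_shl(uint64_t v) {
    count_result r = { v, 0 };
    for (int i = 0; i < 64; i++) {
        unsigned cf = (unsigned)(r.value >> 63);  /* bit moved into CF */
        r.value <<= 1;                            /* bit 0 is cleared */
        r.count += cf;
    }
    return r;
}
```

Both functions report the same count, but only the rotating version leaves the original value in place; the shifting version ends with zero.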
To read in an 8-byte long integer, we would have to make the following changes:
- replace "%d" with "%ld" in all the string prompts and formats, and
- replace mov eax, [rbp-8] with mov rax, [rbp-8].
To compile and link the program above, called bitcount.asm, we do the following:
$ yasm -f elf64 bitcount.asm
$ ld -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
/usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o \
bitcount.o /usr/lib/x86_64-linux-gnu/crtn.o -lc -o bitcount.out
Download bitcount.asm, asm_io.inc and asm_io.asm.
There are two double-precision shift instructions, SHLD and SHRD … to be done!