The aim of this problem is to learn how to use the XMM registers to multiply
floating-point numbers. Matrix multiplication is a slow calculation, especially
when every product is computed one value at a time on the floating-point unit.
An XMM register holds four single-precision values, so packed floating-point
calculations (usable when double precision is not required) might be much
faster. This program tests that.
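
As a rough illustration of the idea, the sketch below multiplies two square
matrices twice: once with ordinary scalar arithmetic and once with packed
single-precision operations. It is only a sketch under stated assumptions, not
this program's actual code: it uses the SSE compiler intrinsics from
<xmmintrin.h> instead of hand-written assembly, the function names
matmul_scalar and matmul_packed are made up for the example, the matrices are
stored row-major, and the packed version assumes n is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; a __m128 value lives in one XMM register */

/* Scalar reference (hypothetical name): one multiply and one add per step. */
void matmul_scalar(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Packed version (hypothetical name): computes C[i][j..j+3] together.
 * Assumption: n is a multiple of 4, so every row splits into whole
 * groups of four floats. */
void matmul_packed(const float *a, const float *b, float *c, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j += 4) {
            __m128 sum = _mm_setzero_ps();                 /* four running sums */
            for (size_t k = 0; k < n; k++) {
                __m128 a_ik = _mm_set1_ps(a[i * n + k]);   /* A[i][k] in all four lanes */
                __m128 b_kj = _mm_loadu_ps(&b[k * n + j]); /* B[k][j..j+3], unaligned load */
                sum = _mm_add_ps(sum, _mm_mul_ps(a_ik, b_kj));
            }
            _mm_storeu_ps(&c[i * n + j], sum);             /* write C[i][j..j+3] */
        }
    }
}
```

Both functions take the same arguments, so the same input can be passed to each
and the two results compared, or each call timed separately. The unaligned
load and store are used so that the sketch does not depend on how the arrays
were allocated; with 16-byte aligned data the aligned forms (_mm_load_ps and
_mm_store_ps) could be used instead.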