x86: optimize left-shift-by-1
These can be replaced by adds when acting on a register operand.
While for the scalar forms there's no gain in encoding size, ADD
generally has higher throughput than SHL. EFLAGS set by ADD are a
superset of those set by SHL (AF in particular is undefined there).
For the SIMD cases the transformation also reduced code size, by
eliminating the 1-byte immediate from the resulting encoding. Note
that this transformation is not applied by gcc13 (according to my
observations), so would - as of now - even improve compiler generated
code.
8 files changed