x86: fold certain AVX2 templates into their AVX counterparts

Like for AVX512VL, we can make the handling of operand sizes a little
more flexible, which allows reducing the number of templates we have.
3 files changed