x86: reduce the set of AVX512 FP insns decoded through vex_w_table[]

As with AVX512-FP16, there aren't that many FP insns for which going
through this table is easier / cheaper than using suitable macros. Make
more use of %XS and %XD to eliminate a fair number of table entries.
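As a rough illustration of why a W-keyed table entry becomes unnecessary, here is a minimal C sketch of a mnemonic template whose %XS / %XD escape both supplies the suffix and rejects the mismatching EVEX.W value. The struct, the function, and the template spelling are hypothetical; this is not the actual i386-dis.c code. It assumes %XS accepts only EVEX.W=0, %XD only EVEX.W=1, and that VEX forms are unaffected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoding context; field names are illustrative only. */
struct insn_ctx {
    bool evex;   /* instruction was EVEX-encoded */
    bool w;      /* value of the (E)VEX.W bit */
};

/* Expand a mnemonic template into out[]. Escapes (assumed semantics):
 *   "%XS": emit 's'; invalid if EVEX-encoded with W=1.
 *   "%XD": emit 'd'; invalid if EVEX-encoded with W=0.
 * Returns false for an invalid encoding (or too small a buffer), so a
 * single entry can replace a W0/W1 pair where one half is "bad". */
static bool expand_template(const char *tmpl, const struct insn_ctx *ctx,
                            char *out, size_t outlen)
{
    size_t n = 0;

    assert(tmpl != NULL && ctx != NULL && out != NULL && outlen > 0);
    while (*tmpl != '\0') {
        char c;

        if (strncmp(tmpl, "%XS", 3) == 0) {
            if (ctx->evex && ctx->w)
                return false;       /* EVEX.W=1 not valid for this form */
            c = 's';
            tmpl += 3;
        } else if (strncmp(tmpl, "%XD", 3) == 0) {
            if (ctx->evex && !ctx->w)
                return false;       /* EVEX.W=0 not valid for this form */
            c = 'd';
            tmpl += 3;
        } else {
            c = *tmpl++;            /* ordinary mnemonic character */
        }
        if (n + 1 >= outlen)
            return false;           /* no room for char plus terminator */
        out[n++] = c;
    }
    out[n] = '\0';
    return true;
}
```

With this, a single hypothetical entry such as "vaddp%XS" yields "vaddps" for EVEX.W=0 and is rejected for EVEX.W=1, with no separate per-W lookup.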

While doing this I noticed a few anomalies. Where the affected lines get
touched / moved anyway, these are being addressed right here:
- vmovshdup used EXx for its 2nd operand, thus displaying a seemingly
  valid broadcast when EVEX.b is set with a memory operand; use
  EXEvexXNoBcst instead, just like vmovsldup already does
- vmovlhps used EXx for its 3rd operand, whereas all sibling entries use
  EXq; switch to EXq there for consistency (the two differ only for
  memory operands)
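To make the vmovshdup issue concrete, here is a hypothetical C sketch of how an operand kind can either permit or forbid the {1toN} broadcast decoration when EVEX.b is set on a memory operand. The enum and function are illustrative stand-ins for the EXx vs. EXEvexXNoBcst distinction, not the real decoder code; how the real disassembler renders the invalid case is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-operand broadcast policy (stand-in for EXx vs.
 * EXEvexXNoBcst). */
enum bcst_policy { BCST_ALLOWED, BCST_FORBIDDEN };

/* Format a memory operand into out[]. When EVEX.b is set, append a
 * {1toN} decoration only if the operand permits broadcast; otherwise
 * report the encoding as invalid instead of printing a broadcast that
 * looks legitimate. Returns false on invalid encoding or truncation. */
static bool format_mem_operand(const char *addr, bool evex_b, int elems,
                               enum bcst_policy policy,
                               char *out, size_t outlen)
{
    int len;

    assert(addr != NULL && strlen(addr) > 0 && out != NULL && outlen > 0);
    if (!evex_b)
        len = snprintf(out, outlen, "%s", addr);
    else if (policy == BCST_ALLOWED)
        len = snprintf(out, outlen, "%s{1to%d}", addr, elems);
    else
        return false;   /* EVEX.b + memory operand: no broadcast form */
    return len >= 0 && (size_t)len < outlen;
}
```

Under this sketch, an operand declared with BCST_FORBIDDEN can never produce "(%rax){1to16}", which is the kind of misleading output the switch away from EXx avoids.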
5 files changed