aarch64: Support for FEAT_SVE_F16F32MM, FEAT_F8F16M, FEAT_F8F32MM

FEAT_SVE_F16F32MM introduces the SVE half-precision floating-point
matrix multiply-accumulate to single-precision instruction.

FEAT_F8F32MM introduces the Advanced SIMD 8-bit floating-point matrix
multiply-accumulate to single-precision instruction.

FEAT_F8F16MM introduces the Advanced SIMD 8-bit floating-point matrix
multiply-accumulate to half-precision instruction.
26 files changed