aarch64: Add support for sme2.1 luti2 and luti4 instructions.

This patch adds support for the following SME2.1 LUTI2 and LUTI4
instructions. The specification is available at [1].

1. LUTI2 (two registers) strided.
2. LUTI2 (four registers) strided.
3. LUTI4 (two registers) strided.
4. LUTI4 (four registers) strided.

[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
10 files changed