middle-end: Add optimized float addsub without needing VEC_PERM_EXPR.

For IEEE 754 floating point formats we can replace a sequence of alternative
+/- with fneg of a wider type followed by an fadd.  This eliminated the need for
using a permutation.  This patch adds a math.pd rule to recognize and do this
rewriting.

For

void f (float *restrict a, float *restrict b, float *res, int n)
{
   for (int i = 0; i < (n & -4); i+=2)
    {
      res[i+0] = a[i+0] + b[i+0];
      res[i+1] = a[i+1] - b[i+1];
    }
}

we generate:

.L3:
        ldr     q1, [x1, x3]
        ldr     q0, [x0, x3]
        fneg    v1.2d, v1.2d
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

now instead of:

.L3:
        ldr     q1, [x0, x3]
        ldr     q2, [x1, x3]
        fadd    v0.4s, v1.4s, v2.4s
        fsub    v1.4s, v1.4s, v2.4s
        tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

Thanks to George Steed for the idea.

gcc/ChangeLog:

	* generic-match-head.cc: Include langooks.
	* gimple-match-head.cc: Likewise.
	* match.pd: Add fneg/fadd rule.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/addsub_1.c: New test.
	* gcc.target/aarch64/sve/addsub_1.c: New test.
5 files changed