b2bb611d90d01f64a2456c29de2a2ca1211ac134 - gcc

commit	b2bb611d90d01f64a2456c29de2a2ca1211ac134
author	Tamar Christina <tamar.christina@arm.com>	Mon Nov 14 15:42:42 2022 +0000
committer	Tamar Christina <tamar.christina@arm.com>	Mon Nov 14 17:40:56 2022 +0000
tree	beaed686bf35b867edc42d73a51ef5f0044ccb7f
parent	2b85d759dae79c930abe8118e1102ecb673b74aa

middle-end: Add optimized float addsub without needing VEC_PERM_EXPR.

For IEEE 754 floating point formats we can replace a sequence of alternating
+/- with an fneg of a wider type followed by an fadd.  This eliminates the need
for a permutation.  This patch adds a match.pd rule to recognize and do this
rewriting.

For

void f (float *restrict a, float *restrict b, float *res, int n)
{
   for (int i = 0; i < (n & -4); i+=2)
    {
      res[i+0] = a[i+0] + b[i+0];
      res[i+1] = a[i+1] - b[i+1];
    }
}

we now generate:

.L3:
        ldr     q1, [x1, x3]
        ldr     q0, [x0, x3]
        fneg    v1.2d, v1.2d
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

instead of:

.L3:
        ldr     q1, [x0, x3]
        ldr     q2, [x1, x3]
        fadd    v0.4s, v1.4s, v2.4s
        fsub    v1.4s, v1.4s, v2.4s
        tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

Thanks to George Steed for the idea.

gcc/ChangeLog:

	* generic-match-head.cc: Include langhooks.
	* gimple-match-head.cc: Likewise.
	* match.pd: Add fneg/fadd rule.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/simd/addsub_1.c: New test.
	* gcc.target/aarch64/sve/addsub_1.c: New test.
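
The fneg/fadd rewrite described above can be illustrated in scalar C. This is only a sketch of the idea, not the match.pd rule itself: the names addsub_ref and addsub_fneg are hypothetical, and it assumes a little-endian target with IEEE 754 binary32 floats, so that the odd-indexed float of each pair occupies the upper 32 bits of a 64-bit lane. Flipping bit 63 of that lane (which is what a double-precision fneg does) then negates only the odd element, and a plain add produces the alternating +/- result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference form: even elements are added, odd elements subtracted.  */
void addsub_ref(const float *a, const float *b, float *res, int n)
{
    assert(n % 2 == 0);
    for (int i = 0; i < n; i += 2) {
        res[i + 0] = a[i + 0] + b[i + 0];
        res[i + 1] = a[i + 1] - b[i + 1];
    }
}

/* Sketch of the rewrite: treat each pair of floats in b as one 64-bit
   lane, flip that lane's sign bit, then do an ordinary add.
   Assumption: little-endian layout, so bit 63 of the lane is the sign
   bit of b[i + 1].  */
void addsub_fneg(const float *a, const float *b, float *res, int n)
{
    assert(n % 2 == 0);
    for (int i = 0; i < n; i += 2) {
        uint64_t lane;
        float nb[2];

        memcpy(&lane, &b[i], sizeof lane);  /* view {b[i], b[i+1]} as 64 bits */
        lane ^= UINT64_C(1) << 63;          /* what fneg does on a .2d lane */
        memcpy(nb, &lane, sizeof nb);       /* back to two floats */

        res[i + 0] = a[i + 0] + nb[0];      /* nb[0] == b[i], unchanged */
        res[i + 1] = a[i + 1] + nb[1];      /* nb[1] == -b[i + 1] */
    }
}
```

On a big-endian layout the element order within the lane would differ, so the sign flip would land on the other element of each pair; the sketch does not handle that case.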