a6291d88d5b6c17d41950e21d7d452f7f0f73020
Remove pass_cpb, which enabled avx512 embedded broadcast from the constant pool.
Because ix86_expand_vector_move now optimizes vector constant moves into
broadcasts during pass_expand, pass_reload/LRA can automatically generate
an avx512 embedded broadcast, so pass_cpb is no longer needed.
Even in the absence of avx512f, a broadcast from memory is still slightly
faster than loading the entire vector from memory, so always enable
broadcast.
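
As a rough illustration of the kind of code this affects, the sketch below
adds the same scalar constant to every element of an array. The function
name add_const and the constant 1.5f are hypothetical and are not taken
from the patch or the benchmark. The assumption is that, when the loop is
vectorized (for example with -O3 -mavx512f), the all-lanes-equal vector
constant can be fed to the add as an embedded broadcast operand such as
{1to16} instead of a full-width load from the constant pool; the actual
code generated depends on the compiler version and options.

```c
#include <stddef.h>

/* Hypothetical example, not part of the patch or the benchmark.
   Every lane of the vectorized constant holds the same value (1.5f),
   so the constant is a candidate for a broadcast from one scalar in
   memory rather than a full-width vector load.  */
void
add_const (float *restrict dst, const float *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1.5f;
}
```

Comparing the assembly from gcc -S before and after this change is one way
to check whether a broadcast was actually emitted for such a loop.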
benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast
Performance diff:
strategy  : cycles
memory    : 1046611188
memory    : 1255420817
memory    : 1044720793
memory    : 1253414145
average   : 1097868397
broadcast : 1044430688
broadcast : 1044477630
broadcast : 1253554603
broadcast : 1044561934
average   : 1096756213
However, broadcast has a larger code size.
Size diff:
size broadcast.o
   text    data     bss     dec     hex filename
    137       0       0     137      89 broadcast.o
size memory.o
   text    data     bss     dec     hex filename
    115       0       0     115      73 memory.o
gcc/ChangeLog:
	* config/i386/i386-expand.c
	(ix86_broadcast_from_integer_constant): Rename to ..
	(ix86_broadcast_from_constant): .. this, and extend it to
	handle float mode.
	(ix86_expand_vector_move): Extend to float mode.
	* config/i386/i386-features.c
	(replace_constant_pool_with_broadcast): Remove.
	(remove_partial_avx_dependency_gate): Ditto.
	(constant_pool_broadcast): Ditto.
	(class pass_constant_pool_broadcast): Ditto.
	(make_pass_constant_pool_broadcast): Ditto.
	(remove_partial_avx_dependency): Adjust gate.
	* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
	* config/i386/i386-protos.h
	(make_pass_constant_pool_broadcast): Remove.
gcc/testsuite/ChangeLog:
	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
5 files changed