Remove pass_cpb, which enabled AVX512 embedded broadcast from the constant pool.

By converting vector moves into broadcasts in ix86_expand_vector_move
during pass_expand, pass_reload/LRA can automatically generate AVX512
embedded broadcasts, so pass_cpb is no longer needed.
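
As a minimal sketch of the kind of source that benefits (the function
name add_constant and the flags mentioned are illustrative assumptions,
not taken from the patch or its testsuite), consider a loop that adds
one scalar constant to every element.  When vectorized with something
like -O3 -mavx512f, the constant becomes a uniform vector operand that
the compiler may fold into the arithmetic instruction as an embedded
broadcast, e.g. a vaddps with a {1to16} memory operand.  The exact
output depends on the compiler version and options.

```c
#include <stddef.h>

/* Add the same scalar constant to every element of SRC and store the
   result in DST.  After vectorization the value 2.0f is needed in
   every vector lane, i.e. it is a uniform vector constant, which is
   the case a broadcast (or an embedded broadcast) can serve.  */
void
add_constant (float *dst, const float *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 2.0f;
}
```

Inspecting the assembly (for example with -S) is the simplest way to
see whether a given compiler build emits the embedded-broadcast form.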

Even in the absence of avx512f, broadcast from memory is still
slightly faster than loading the entire vector from memory, so always
enable broadcast.
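
The two strategies can be illustrated with GCC vector extensions.  This
is only a sketch: the type name v8sf and the function names
add_ones_full and add_ones_splat are hypothetical, and both functions
compute the same result.  A compiler may load the fully spelled-out
constant as a whole vector from the constant pool, or broadcast a
single scalar element to all lanes; which one is chosen for either
function depends on the compiler version and target flags.

```c
/* 256-bit vector of eight floats (GCC/Clang vector extension).  */
typedef float v8sf __attribute__ ((vector_size (32)));

/* Uniform constant written out lane by lane.  Without a broadcast
   optimization, all 32 bytes may be loaded from the constant pool.  */
void
add_ones_full (v8sf *x)
{
  const v8sf ones = { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f };
  *x = *x + ones;
}

/* Same computation written as vector + scalar.  The vector extension
   converts the scalar into a vector with that value in every lane.  */
void
add_ones_splat (v8sf *x)
{
  *x = *x + 1.0f;
}
```

Both functions take a pointer so that no vector is passed by value,
which keeps the example independent of the vector calling convention.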

Benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vaddps/broadcast

Performance diff:

strategy    : cycles
memory      : 1046611188
memory      : 1255420817
memory      : 1044720793
memory      : 1253414145
average     : 1097868397

broadcast   : 1044430688
broadcast   : 1044477630
broadcast   : 1253554603
broadcast   : 1044561934
average     : 1096756213

However, the broadcast version has a larger text size.

Size diff:

size broadcast.o
   text	   data	    bss	    dec	    hex	filename
    137	      0	      0	    137	     89	broadcast.o

size memory.o
   text	   data	    bss	    dec	    hex	filename
    115	      0	      0	    115	     73	memory.o

gcc/ChangeLog:

	* config/i386/i386-expand.c
	(ix86_broadcast_from_integer_constant): Rename to ..
	(ix86_broadcast_from_constant): .. this, and extend it to
	handle float mode.
	(ix86_expand_vector_move): Extend to float mode.
	* config/i386/i386-features.c
	(replace_constant_pool_with_broadcast): Remove.
	(remove_partial_avx_dependency_gate): Ditto.
	(constant_pool_broadcast): Ditto.
	(class pass_constant_pool_broadcast): Ditto.
	(make_pass_constant_pool_broadcast): Ditto.
	(remove_partial_avx_dependency): Adjust gate.
	* config/i386/i386-passes.def: Remove pass_constant_pool_broadcast.
	* config/i386/i386-protos.h
	(make_pass_constant_pool_broadcast): Remove.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/fuse-caller-save-xmm.c: Adjust testcase.
5 files changed