[PATCH] PR rtl-optimization/46235: Improved use of bt for bit tests on x86_64.

This patch tackles PR46235 to improve the code generated for bit tests
on x86_64 by making more use of the bt instruction.  Currently, GCC emits
bt instructions when followed by condition jumps (thanks to Uros' splitters).
This patch adds splitters in i386.md, to catch the cases where bt is followed
by a conditional move (as in the original report), or by a setc/setnc (as in
comment 5 of the Bugzilla PR).

With this patch, the function in the original PR
int foo(int a, int x, int y) {
    if (a & (1 << x))
       return a;
   return 1;
}

which with -O2 on mainline generates:
foo:	movl    %edi, %eax
        movl    %esi, %ecx
        sarl    %cl, %eax
        testb   $1, %al
        movl    $1, %eax
        cmovne  %edi, %eax
        ret

now generates:
foo:	btl     %esi, %edi
        movl    $1, %eax
        cmovc   %edi, %eax
        ret

Likewise, IsBitSet1 and IsBitSet2 (from comment 5)
bool IsBitSet1(unsigned char byte, int index) {
    return (byte & (1<<index)) != 0;
}
bool IsBitSet2(unsigned char byte, int index) {
    return (byte >> index) & 1;
}

Before:
        movzbl  %dil, %eax
        movl    %esi, %ecx
        sarl    %cl, %eax
        andl    $1, %eax
        ret

After:
        movzbl  %dil, %edi
        btl     %esi, %edi
        setc    %al
        ret

According to Agner Fog, SAR/SHR r,cl takes 2 cycles on skylake,
where BT r,r takes only one, so the performance improvements on
recent hardware may be more significant than implied by just
the reduced number of instructions.  I've avoided transforming cases
(such as btsi_setcsi) where using bt sequences may not be a clear
win (over sarq/andl).

2010-06-15  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR rtl-optimization/46235
	* config/i386/i386.md: New define_split for bt followed by cmov.
	(*bt<mode>_setcqi): New define_insn_and_split for bt followed by setc.
	(*bt<mode>_setncqi): New define_insn_and_split for bt then setnc.
	(*bt<mode>_setnc<mode>): New define_insn_and_split for bt followed
	by setnc with zero extension.

gcc/testsuite/ChangeLog
	PR rtl-optimization/46235
	* gcc.target/i386/bt-5.c: New test.
	* gcc.target/i386/bt-6.c: New test.
	* gcc.target/i386/bt-7.c: New test.
4 files changed