gnu /
gcc /
a89ac9011e04cf8ebdf856b679bd91000ef70175 AArch64: Add SVE2 implementation for pow2 bitmask division
In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.
This patch adds an named function to allow us to emit an optimized sequence
when doing an unsigned division that is equivalent to:
x = y / (2 ^ (bitsize (y)/2)-1)
For SVE2 this means we generate for:
void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
for (int i = 0; i < (n & -16); i+=1)
pixel[i] = (pixel[i] * level) / 0xff;
}
the following:
mov z3.b, #1
.L3:
ld1b z0.h, p0/z, [x0, x3]
mul z0.h, p1/m, z0.h, z2.h
addhnb z1.b, z0.h, z3.h
addhnb z0.b, z0.h, z1.h
st1b z0.h, p0, [x0, x3]
inch x3
whilelo p0.h, w3, w2
b.any .L3
instead of:
.L3:
ld1b z0.h, p1/z, [x0, x3]
mul z0.h, p0/m, z0.h, z1.h
umulh z0.h, p0/m, z0.h, z2.h
lsr z0.h, z0.h, #7
st1b z0.h, p1, [x0, x3]
inch x3
whilelo p1.h, w3, w2
b.any .L3
Which results in significantly faster code.
gcc/ChangeLog:
* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
2 files changed