Opened 4 years ago
Last modified 4 years ago
#9149 new defect
bugs in Altivec code
Reported by: | Ilya Kurdyukov | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | swscale |
Version: | unspecified | Keywords: | ppc altivec |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description
Is the Altivec code still in use?
I am making a SIMD optimization patch for the Elbrus architecture (e2k). It's easier to port from existing code with intrinsic functions, so I'm rewriting the Altivec code to Elbrus.
When doing this, I noticed three places in swscale that are definitely wrong in Altivec/VSX (compared to the C reference code). I can't check that for sure because I don't have Altivec hardware.
I check swscale with:
$ libswscale/tests/swscale -cpuflags 0 2> /dev/null > ../swscale_ref.log $ libswscale/tests/swscale 2> /dev/null > ../swscale.log $ diff -u swscale_ref.log swscale.log > swscale.diff
It's pretty slow. After completing the tests, "swscale.diff" shows the problems.
Some yuv2rgb conversions are more accurate, it's okay. But check the lines with big SSD values, this is what is not working correctly.
libswscale/ppc/swscale_vsx.c:2110
Here's should be added "dstFormat != AV_PIX_FMT_P010LE && dstFormat != AV_PIX_FMT_P010BE", because this code doesn't handle these formats.
if (!(c->flags & (SWS_BITEXACT | SWS_FULL_CHR_H_INT)) && !c->needAlpha) { switch (c->dstBpc) { case 8: c->yuv2plane1 = yuv2plane1_8_vsx; break;
libswscale/ppc/swscale_vsx.c:1060
In yuv2rgb_full_1_vsx_template() there is a code:
const vec_s16 mul8 = vec_splat_s16(8); ... vu32_l = vec_mule(vu, mul8); vu32_r = vec_mulo(vu, mul8); vv32_l = vec_mule(vv, mul8); vv32_r = vec_mulo(vv, mul8);
But should be:
const vec_s16 mul2 = vec_splat_s16(2); ... tmp32 = vec_mule(vu, mul2); tmp32_2 = vec_mulo(vu, mul2); vu32_l = vec_mergeh(tmp32, tmp32_2); vu32_r = vec_mergel(tmp32, tmp32_2); tmp32 = vec_mule(vv, mul2); tmp32_2 = vec_mulo(vv, mul2); vv32_l = vec_mergeh(tmp32, tmp32_2); vv32_r = vec_mergel(tmp32, tmp32_2);
Actually, this code may be written a lot better, by multiplying coef multipliers before the for loop. And using vec_unpack instead of vec_mul/vec_merge.
libswscale/ppc/swscale_vsx.c:1513
In yuv2422_2_vsx_template() there is a wrong macro SETUP, that multiplies by (4096 - alpha) and (4096 - alpha), should be (4096 - alpha) and (alpha).
You can see yuv2rgb_full_2_vsx_template() for the correct code.
Change History (3)
comment:1 by , 4 years ago
comment:2 by , 4 years ago
Keywords: | ppc altivec added |
---|
If the issues you see are reproducible with current FFmpeg git head, please send your patches split into individual bug fixes and made with git format-patch
to the development mailing list.
comment:3 by , 4 years ago
"If the issues you see are reproducible with current FFmpeg git head"
As I said, I can't test this because I don't have Altivec hardware to test it, but I'm pretty sure this code is wrong as I ported this code to a different SIMD architecture and changed these parts with the correct code.
And I have no intention of fixing the Altivec code, when I told my colleagues that I found bugs in the Altivec code during porting, they recommended reporting this to the ffmpeg bug tracker, which I did.
Also I found strange code in "libswscale/ppc/yuv2rgb_altivec.c":
yuv2packedX_altivec() has some code after the loop that calculates the extra remaining bytes that are less than 16x RGB blocks. But this code will never be called, because of "i < dstW" conditional, it should be "i + 15 < dstW".
for (i = 0; i < dstW; i += 16) {
...
}
if (i < dstW) {
i -= 16;
...
memcpy(&((uint32_t *) dest)[i], scratch, (dstW - i) / 4);
}
And altivec_uyvy_rgb32() in the same source file - seems to be never called in the current version of ffmpeg.
ffmpeg version is 4.3.1 (not in list on tracker)