Opened 3 months ago

Last modified 3 months ago

#9149 new defect

bugs in Altivec code

Reported by: Ilya Kurdyukov Owned by:
Priority: normal Component: swscale
Version: unspecified Keywords: ppc altivec
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no


Is the Altivec code still in use?

I am making a SIMD optimization patch for the Elbrus architecture (e2k). It's easier to port from existing code with intrinsic functions, so I'm rewriting the Altivec code to Elbrus.

When doing this, I noticed three places in swscale that are definitely wrong in Altivec/VSX (compared to the C reference code). I can't check that for sure because I don't have Altivec hardware.

I check swscale with:

$ libswscale/tests/swscale -cpuflags 0 2> /dev/null > ../swscale_ref.log
$ libswscale/tests/swscale 2> /dev/null > ../swscale.log
$ diff -u swscale_ref.log swscale.log > swscale.diff

It's pretty slow. After completing the tests, "swscale.diff" shows the problems.
Some yuv2rgb conversions are more accurate, it's okay. But check the lines with big SSD values, this is what is not working correctly.


Here's should be added "dstFormat != AV_PIX_FMT_P010LE && dstFormat != AV_PIX_FMT_P010BE", because this code doesn't handle these formats.

    if (!(c->flags & (SWS_BITEXACT | SWS_FULL_CHR_H_INT)) && !c->needAlpha) {
        switch (c->dstBpc) {
        case 8:
            c->yuv2plane1 = yuv2plane1_8_vsx;


In yuv2rgb_full_1_vsx_template() there is a code:

    const vec_s16 mul8 = vec_splat_s16(8);
            vu32_l = vec_mule(vu, mul8);
            vu32_r = vec_mulo(vu, mul8);
            vv32_l = vec_mule(vv, mul8);
            vv32_r = vec_mulo(vv, mul8);

But should be:

    const vec_s16 mul2 = vec_splat_s16(2);
            tmp32 = vec_mule(vu, mul2);
            tmp32_2 = vec_mulo(vu, mul2);
            vu32_l = vec_mergeh(tmp32, tmp32_2);
            vu32_r = vec_mergel(tmp32, tmp32_2);
            tmp32 = vec_mule(vv, mul2);
            tmp32_2 = vec_mulo(vv, mul2);
            vv32_l = vec_mergeh(tmp32, tmp32_2);
            vv32_r = vec_mergel(tmp32, tmp32_2);

Actually, this code may be written a lot better, by multiplying coef multipliers before the for loop. And using vec_unpack instead of vec_mul/vec_merge.


In yuv2422_2_vsx_template() there is a wrong macro SETUP, that multiplies by (4096 - alpha) and (4096 - alpha), should be (4096 - alpha) and (alpha).

You can see yuv2rgb_full_2_vsx_template() for the correct code.

Change History (3)

comment:1 by Ilya Kurdyukov, 3 months ago

ffmpeg version is 4.3.1 (not in list on tracker)

comment:2 by Carl Eugen Hoyos, 3 months ago

Keywords: ppc altivec added

If the issues you see are reproducible with current FFmpeg git head, please send your patches split into individual bug fixes and made with git format-patch to the development mailing list.

comment:3 by Ilya Kurdyukov, 3 months ago

"If the issues you see are reproducible with current FFmpeg git head"

As I said, I can't test this because I don't have Altivec hardware to test it, but I'm pretty sure this code is wrong as I ported this code to a different SIMD architecture and changed these parts with the correct code.

And I have no intention of fixing the Altivec code, when I told my colleagues that I found bugs in the Altivec code during porting, they recommended reporting this to the ffmpeg bug tracker, which I did.

Also I found strange code in "libswscale/ppc/yuv2rgb_altivec.c":

yuv2packedX_altivec() has some code after the loop that calculates the extra remaining bytes that are less than 16x RGB blocks. But this code will never be called, because of "i < dstW" conditional, it should be "i + 15 < dstW".

for (i = 0; i < dstW; i += 16) {



if (i < dstW) {

i -= 16;
memcpy(&((uint32_t *) dest)[i], scratch, (dstW - i) / 4);


And altivec_uyvy_rgb32() in the same source file - seems to be never called in the current version of ffmpeg.

Note: See TracTickets for help on using tickets.