Opened 3 years ago

Closed 15 months ago

Last modified 15 months ago

#7124 closed defect (fixed)

Unexpected results from YUV-RGB color space conversion optimization on POWER8/LE

Reported by: sayed adel Owned by:
Priority: normal Component: swscale
Version: git-master Keywords: ppc altivec
Cc: dje.gcc@gmail.com Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: yes

Description

It seems current implantation of Altivec doesn't support little-endian.

FFmpeg was compiled as:
./configure --prefix=/usr --extra-libs="-lpthread -lm" --enable-gpl --enable-libvpx --enable-libx264 --enable-nonfree --enable-shared --enable-avresample

Run time options:
ffmpeg -i ./Sintel.mp4 -filter:v fps=fps=1/60 /tests/before/frames/%0d.png

A full report about it could be found on an exciting issue at https://github.com/opencv/opencv/issues/11034

Attachments (6)

libswscale-disable-ppc-yuv2rgb.diff (367 bytes ) - added by sayed adel 3 years ago.
after.png (372.5 KB ) - added by sayed adel 3 years ago.
after disable PPC/ YUV-RGB optimization
before.png (390.3 KB ) - added by sayed adel 3 years ago.
With PPC/YUV-RGB optimization enabled
Dockerfile (2.4 KB ) - added by sayed adel 3 years ago.
Used during build and test
ffmpeg_7124.patch (10.1 KB ) - added by Hong Bo Peng 3 years ago.
ffmpeg_altivec_yuv2rgb.patch (3.5 KB ) - added by Chip Kerchner 2 years ago.
Fixes compiler bug - replace vec_lvsl/vec_perm instructions with vec_xl

Download all attachments as: .zip

Change History (20)

comment:1 by David Edelsohn, 3 years ago

Cc: dje.gcc@gmail.com added

comment:2 by Carl Eugen Hoyos, 3 years ago

Keywords: ppc added; Altivec VSX ppc64le removed
Priority: importantnormal
Resolution: worksforme
Status: newclosed

This unfortunately has no similarities with a valid ticket.

In any case, altivec optimizations for yuv->rgb work here on Power8/LE 64bit and show a significant speed-up.

comment:3 by sayed adel, 3 years ago

I'm sorry for not being clear enough.

It's not about the performance but the sanity of conversion.

Please could you look into that link and check my comments on it?
or even try to grep frames from video file and encoded into png format just like I mentioned above?

in reply to:  3 comment:4 by Carl Eugen Hoyos, 3 years ago

Keywords: altivec added
Reproduced by developer: set
Resolution: worksforme
Status: closedreopened
Version: unspecifiedgit-master

Replying to seiko:

It's not about the performance but the sanity of conversion.

Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.

$ ffmpeg -lavfi testsrc2,format=yuv420p -pix_fmt rgb24 -vcodec png -t 10 out.nut
ffmpeg version N-90615-g53688b6 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 6.3.1 (GCC) 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065.
  configuration: --enable-gpl --cpu=power8 --cc=/opt/at10.0/bin/gcc
  libavutil      56. 13.100 / 56. 13.100
  libavcodec     58. 17.100 / 58. 17.100
  libavformat    58. 10.100 / 58. 10.100
  libavdevice    58.  2.100 / 58.  2.100
  libavfilter     7. 14.100 /  7. 14.100
  libswscale      5.  0.102 /  5.  0.102
  libswresample   3.  0.101 /  3.  0.101
  libpostproc    55.  0.100 / 55.  0.100
Stream mapping:
  format -> Stream #0:0 (png)
Press [q] to stop, [?] for help
[swscaler @ 0x10019995be0] ALTIVEC: Color Space RGB24
Output #0, nut, to 'out.nut':
  Metadata:
    encoder         : Lavf58.10.100
    Stream #0:0: Video: png (MPNG / 0x474E504D), rgb24, 320x240 [SAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 25 fps, 51200 tbn, 25 tbc (default)
    Metadata:
      encoder         : Lavc58.17.100 png
frame=  250 fps=0.0 q=-0.0 Lsize=    2289kB time=00:00:09.96 bitrate=1882.4kbits/s speed=56.6x
video:2286kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.130058%

The output file is clearly recognizable but looks broken.
I tested a few of the libavcodec altivec optimizations and they work fine on le (and are tested by fate), this is indeed a specific issue in libswscale.

comment:5 by Carl Eugen Hoyos, 3 years ago

Works fine with -cpuflags 0.

by sayed adel, 3 years ago

by sayed adel, 3 years ago

Attachment: after.png added

after disable PPC/ YUV-RGB optimization

by sayed adel, 3 years ago

Attachment: before.png added

With PPC/YUV-RGB optimization enabled

by sayed adel, 3 years ago

Attachment: Dockerfile added

Used during build and test

comment:6 by sayed adel, 3 years ago

Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.

Sorry again, I just rushed into it and I thought that provide an external URL for the original report will be just enough since I only investigated on it just to clear OpenCV side, Thanks goes to Haitao Yue who actually discover it.


this is indeed a specific issue in libswscale.

Yes, this how the frame looks like before disable optimization.
With PPC/YUV-RGB optimization enabled

And here's after I disable PPC/yuv2rgb by using the patch in attachment.

after disable PPC/ YUV-RGB optimization

I did a quick look into Altivec code on libswscale and I didn't realize any special cases for le or be and also you can easily realize many deprecated intrinsics on it, that why I said "current implementation of Altivec doesn't support little-endian".

Version 0, edited 3 years ago by sayed adel (next)

by Hong Bo Peng, 3 years ago

Attachment: ffmpeg_7124.patch added

comment:7 by Hong Bo Peng, 3 years ago

I did some investigate in this ticket. 2 issues were found.

Firstly, the code yuv2rgb_altivec.c uses vec_ld/vec_st to load and store data. But the compiler automatically do some changes for the perm masks for LE. So it is better use vec_xl (or vec_vsx_ld) for the data load and vec_xst for store.

One more important change is related to the YUV->RGB transform. I searched on web, there are 2 formulas. I checked all the coefficients of CRV, CBU, CGU, CGV in the code and they matches this formula

B = 1.164(Y - 16) + 2.018(U - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
R = 1.164(Y - 16) + 1.596(V - 128)

(http://www.mplayerhq.hu/DOCS/tech/colorspaces.txt)
OpenCV imgproc/src/color_yuv.cpp uses the same formula in YUV420 - RGB transform.

But as stated in the same page,

Julien (sorry, I can't recall his surname) suggests that there are problems with the above formula

So there is another usage in OpenCV YCrCb - RGB transform following Julien's proposal as:

R = Y + 1.403V'
G = Y - 0.344U' - 0.714V'
B = Y + 1.770U'

While in existing libswscale code, it uses formula as B = Y+2.018 (U-128), etc. I have to change CY and subtract by 16. After this, all the test passes.

I also found some other locations of vec_ld/vec_st. But I have no test data to get to those calling path. So I only made a patch for this ticket. Please review. Thanks!

by Chip Kerchner, 2 years ago

Fixes compiler bug - replace vec_lvsl/vec_perm instructions with vec_xl

comment:8 by Chip Kerchner, 2 years ago

Analyzed by developer: set

I spent some time looking at the cause of this bug. It looks like it is a bug with the gcc compiler for Power in versions 6.x and 7.x (verified with 6.3 and 7.4). It was fixed in version 8.x (verified with 8.3). I was using ffmpeg version 4.1 and a Power 9 ppc64le machine for building and testing.

It appears the compiler is generating the wrong code for little endian machines for the vec_lvsl/vec_perm instruction combos in some cases. If these instructions are replaced with vec_xl, the problem goes away for all versions of the compilers. It should be noted that the compiler is producing warnings for using vec_lvsl being deprecated for little endian.

I would recommend that the ffmpeg_altivec_yuv2rgb.patch be integrated into the project. The above previous patch does have the fix but also includes additional code.

comment:9 by Carl Eugen Hoyos, 2 years ago

Resolution: fixed
Status: reopenedclosed

Fixed by Chip Kerchner in 3a557c5d88b7b15b5954ba2743febb055549b536

comment:10 by marillat, 16 months ago

This patch break the build with gcc 9.3.0 (Debian 9.3.0-13) ffmpeg 4.3

Altivec is enabled with "--cpu=g4"

CC	libswscale/ppc/yuv2rgb_altivec.o
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_abgr’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: implicit declaration of function ‘vec_xl’; did you mean ‘vec_rl’? [-Werror=implicit-function-declaration]
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’
  437 | DEFCSP420_CVT(yuv2_abgr,  out_abgr)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’
  437 | DEFCSP420_CVT(yuv2_abgr,  out_abgr)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  340 |             y1 = vec_xl(0, y2i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’
  437 | DEFCSP420_CVT(yuv2_abgr,  out_abgr)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:437:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_bgra’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:438:1: note: in expansion of macro ‘DEFCSP420_CVT’
  438 | DEFCSP420_CVT(yuv2_bgra,  out_bgra)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  340 |             y1 = vec_xl(0, y2i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:438:1: note: in expansion of macro ‘DEFCSP420_CVT’
  438 | DEFCSP420_CVT(yuv2_bgra,  out_bgra)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:438:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:438:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_rgba’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:439:1: note: in expansion of macro ‘DEFCSP420_CVT’
  439 | DEFCSP420_CVT(yuv2_rgba,  out_rgba)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  340 |             y1 = vec_xl(0, y2i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:439:1: note: in expansion of macro ‘DEFCSP420_CVT’
  439 | DEFCSP420_CVT(yuv2_rgba,  out_rgba)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:439:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:439:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_argb’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:440:1: note: in expansion of macro ‘DEFCSP420_CVT’
  440 | DEFCSP420_CVT(yuv2_argb,  out_argb)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  340 |             y1 = vec_xl(0, y2i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:440:1: note: in expansion of macro ‘DEFCSP420_CVT’
  440 | DEFCSP420_CVT(yuv2_argb,  out_argb)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:440:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:440:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_rgb24’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:441:1: note: in expansion of macro ‘DEFCSP420_CVT’
  441 | DEFCSP420_CVT(yuv2_rgb24, out_rgb24)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  340 |             y1 = vec_xl(0, y2i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:441:1: note: in expansion of macro ‘DEFCSP420_CVT’
  441 | DEFCSP420_CVT(yuv2_rgb24, out_rgb24)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:441:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:441:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_bgr24’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  338 |             y0 = vec_xl(0, y1i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:442:1: note: in expansion of macro ‘DEFCSP420_CVT’
  442 | DEFCSP420_CVT(yuv2_bgr24, out_bgr24)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
  340 |             y1 = vec_xl(0, y2i);                                              \
      |                  ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:442:1: note: in expansion of macro ‘DEFCSP420_CVT’
  442 | DEFCSP420_CVT(yuv2_bgr24, out_bgr24)
      | ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:442:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:442:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
cc1: some warnings being treated as errors

comment:11 by marillat, 15 months ago

Resolution: fixed
Status: closedreopened

I reopen this bug has this bug introcuce another bug.

comment:12 by Carl Eugen Hoyos, 15 months ago

Resolution: fixed
Status: reopenedclosed

I am quite sure that the issue reported in this ticket has been fixed.

comment:13 by marillat, 15 months ago

did you read my comment ?

ffmpeg doesn't build with this patch

comment:14 by marillat, 15 months ago

I filled a new bug #8750

Note: See TracTickets for help on using tickets.