#7124 closed defect (fixed)
Unexpected results from YUV-RGB color space conversion optimization on POWER8/LE
| Reported by: | sayed adel | Owned by: | |
|---|---|---|---|
| Priority: | normal | Component: | swscale |
| Version: | git-master | Keywords: | ppc altivec |
| Cc: | dje.gcc@gmail.com | Blocked By: | |
| Blocking: | Reproduced by developer: | yes | |
| Analyzed by developer: | yes |
Description
It seems current implantation of Altivec doesn't support little-endian.
FFmpeg was compiled as:
./configure --prefix=/usr --extra-libs="-lpthread -lm" --enable-gpl --enable-libvpx --enable-libx264 --enable-nonfree --enable-shared --enable-avresample
Run time options:
ffmpeg -i ./Sintel.mp4 -filter:v fps=fps=1/60 /tests/before/frames/%0d.png
A full report about it could be found on an exciting issue at https://github.com/opencv/opencv/issues/11034
Attachments (6)
Change History (20)
comment:1 by , 8 years ago
| Cc: | added |
|---|
comment:2 by , 8 years ago
| Keywords: | ppc added; Altivec VSX ppc64le removed |
|---|---|
| Priority: | important → normal |
| Resolution: | → worksforme |
| Status: | new → closed |
follow-up: 4 comment:3 by , 8 years ago
I'm sorry for not being clear enough.
It's not about the performance but the sanity of conversion.
Please could you look into that link and check my comments on it?
or even try to grep frames from video file and encoded into png format just like I mentioned above?
comment:4 by , 8 years ago
| Keywords: | altivec added |
|---|---|
| Reproduced by developer: | set |
| Resolution: | worksforme |
| Status: | closed → reopened |
| Version: | unspecified → git-master |
Replying to seiko:
It's not about the performance but the sanity of conversion.
Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.
$ ffmpeg -lavfi testsrc2,format=yuv420p -pix_fmt rgb24 -vcodec png -t 10 out.nut
ffmpeg version N-90615-g53688b6 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 6.3.1 (GCC) 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065.
configuration: --enable-gpl --cpu=power8 --cc=/opt/at10.0/bin/gcc
libavutil 56. 13.100 / 56. 13.100
libavcodec 58. 17.100 / 58. 17.100
libavformat 58. 10.100 / 58. 10.100
libavdevice 58. 2.100 / 58. 2.100
libavfilter 7. 14.100 / 7. 14.100
libswscale 5. 0.102 / 5. 0.102
libswresample 3. 0.101 / 3. 0.101
libpostproc 55. 0.100 / 55. 0.100
Stream mapping:
format -> Stream #0:0 (png)
Press [q] to stop, [?] for help
[swscaler @ 0x10019995be0] ALTIVEC: Color Space RGB24
Output #0, nut, to 'out.nut':
Metadata:
encoder : Lavf58.10.100
Stream #0:0: Video: png (MPNG / 0x474E504D), rgb24, 320x240 [SAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 25 fps, 51200 tbn, 25 tbc (default)
Metadata:
encoder : Lavc58.17.100 png
frame= 250 fps=0.0 q=-0.0 Lsize= 2289kB time=00:00:09.96 bitrate=1882.4kbits/s speed=56.6x
video:2286kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.130058%
The output file is clearly recognizable but looks broken.
I tested a few of the libavcodec altivec optimizations and they work fine on le (and are tested by fate), this is indeed a specific issue in libswscale.
by , 8 years ago
| Attachment: | libswscale-disable-ppc-yuv2rgb.diff added |
|---|
comment:6 by , 8 years ago
Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.
Sorry again, I just rushed into it and I thought that provide an external URL for the original report will be just enough since I only investigated on it just to clear OpenCV side, Thanks goes to Haitao Yue who actually discover it.
this is indeed a specific issue in libswscale.
Yes, this how the frame looks like before disable optimization.

And here's after I disable PPC/yuv2rgb by using the patch in attachment.
I think I have to reconsider my review, It seems it has a support for little-endian.. I'm going to give it a try and I may sent a patch to fix it soon enough.
by , 7 years ago
| Attachment: | ffmpeg_7124.patch added |
|---|
comment:7 by , 7 years ago
I did some investigate in this ticket. 2 issues were found.
Firstly, the code yuv2rgb_altivec.c uses vec_ld/vec_st to load and store data. But the compiler automatically do some changes for the perm masks for LE. So it is better use vec_xl (or vec_vsx_ld) for the data load and vec_xst for store.
One more important change is related to the YUV->RGB transform. I searched on web, there are 2 formulas. I checked all the coefficients of CRV, CBU, CGU, CGV in the code and they matches this formula
B = 1.164(Y - 16) + 2.018(U - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
R = 1.164(Y - 16) + 1.596(V - 128)
(http://www.mplayerhq.hu/DOCS/tech/colorspaces.txt)
OpenCV imgproc/src/color_yuv.cpp uses the same formula in YUV420 - RGB transform.
But as stated in the same page,
Julien (sorry, I can't recall his surname) suggests that there are problems with the above formula
So there is another usage in OpenCV YCrCb - RGB transform following Julien's proposal as:
R = Y + 1.403V'
G = Y - 0.344U' - 0.714V'
B = Y + 1.770U'
While in existing libswscale code, it uses formula as B = Y+2.018 (U-128), etc. I have to change CY and subtract by 16. After this, all the test passes.
I also found some other locations of vec_ld/vec_st. But I have no test data to get to those calling path. So I only made a patch for this ticket. Please review. Thanks!
by , 7 years ago
| Attachment: | ffmpeg_altivec_yuv2rgb.patch added |
|---|
Fixes compiler bug - replace vec_lvsl/vec_perm instructions with vec_xl
comment:8 by , 7 years ago
| Analyzed by developer: | set |
|---|
I spent some time looking at the cause of this bug. It looks like it is a bug with the gcc compiler for Power in versions 6.x and 7.x (verified with 6.3 and 7.4). It was fixed in version 8.x (verified with 8.3). I was using ffmpeg version 4.1 and a Power 9 ppc64le machine for building and testing.
It appears the compiler is generating the wrong code for little endian machines for the vec_lvsl/vec_perm instruction combos in some cases. If these instructions are replaced with vec_xl, the problem goes away for all versions of the compilers. It should be noted that the compiler is producing warnings for using vec_lvsl being deprecated for little endian.
I would recommend that the ffmpeg_altivec_yuv2rgb.patch be integrated into the project. The above previous patch does have the fix but also includes additional code.
comment:9 by , 7 years ago
| Resolution: | → fixed |
|---|---|
| Status: | reopened → closed |
Fixed by Chip Kerchner in 3a557c5d88b7b15b5954ba2743febb055549b536
comment:10 by , 6 years ago
This patch break the build with gcc 9.3.0 (Debian 9.3.0-13) ffmpeg 4.3
Altivec is enabled with "--cpu=g4"
CC libswscale/ppc/yuv2rgb_altivec.o
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_abgr’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: implicit declaration of function ‘vec_xl’; did you mean ‘vec_rl’? [-Werror=implicit-function-declaration]
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’
437 | DEFCSP420_CVT(yuv2_abgr, out_abgr)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’
437 | DEFCSP420_CVT(yuv2_abgr, out_abgr)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
340 | y1 = vec_xl(0, y2i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’
437 | DEFCSP420_CVT(yuv2_abgr, out_abgr)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:437:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:437:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_bgra’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:438:1: note: in expansion of macro ‘DEFCSP420_CVT’
438 | DEFCSP420_CVT(yuv2_bgra, out_bgra)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
340 | y1 = vec_xl(0, y2i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:438:1: note: in expansion of macro ‘DEFCSP420_CVT’
438 | DEFCSP420_CVT(yuv2_bgra, out_bgra)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:438:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:438:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_rgba’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:439:1: note: in expansion of macro ‘DEFCSP420_CVT’
439 | DEFCSP420_CVT(yuv2_rgba, out_rgba)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
340 | y1 = vec_xl(0, y2i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:439:1: note: in expansion of macro ‘DEFCSP420_CVT’
439 | DEFCSP420_CVT(yuv2_rgba, out_rgba)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:439:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:439:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_argb’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:440:1: note: in expansion of macro ‘DEFCSP420_CVT’
440 | DEFCSP420_CVT(yuv2_argb, out_argb)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
340 | y1 = vec_xl(0, y2i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:440:1: note: in expansion of macro ‘DEFCSP420_CVT’
440 | DEFCSP420_CVT(yuv2_argb, out_argb)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:440:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:440:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_rgb24’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:441:1: note: in expansion of macro ‘DEFCSP420_CVT’
441 | DEFCSP420_CVT(yuv2_rgb24, out_rgb24)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
340 | y1 = vec_xl(0, y2i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:441:1: note: in expansion of macro ‘DEFCSP420_CVT’
441 | DEFCSP420_CVT(yuv2_rgb24, out_rgb24)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:441:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:441:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_bgr24’:
libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
338 | y0 = vec_xl(0, y1i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:442:1: note: in expansion of macro ‘DEFCSP420_CVT’
442 | DEFCSP420_CVT(yuv2_bgr24, out_bgr24)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’
340 | y1 = vec_xl(0, y2i); \
| ^~~~~~
libswscale/ppc/yuv2rgb_altivec.c:442:1: note: in expansion of macro ‘DEFCSP420_CVT’
442 | DEFCSP420_CVT(yuv2_bgr24, out_bgr24)
| ^~~~~~~~~~~~~
libswscale/ppc/yuv2rgb_altivec.c:442:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
libswscale/ppc/yuv2rgb_altivec.c:442:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size
cc1: some warnings being treated as errors
comment:11 by , 6 years ago
| Resolution: | fixed |
|---|---|
| Status: | closed → reopened |
I reopen this bug has this bug introcuce another bug.
comment:12 by , 6 years ago
| Resolution: | → fixed |
|---|---|
| Status: | reopened → closed |
I am quite sure that the issue reported in this ticket has been fixed.




This unfortunately has no similarities with a valid ticket.
In any case, altivec optimizations for yuv->rgb work here on Power8/LE 64bit and show a significant speed-up.