#7124 closed defect (fixed)
Unexpected results from YUV-RGB color space conversion optimization on POWER8/LE
Reported by: | sayed adel | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | swscale |
Version: | git-master | Keywords: | ppc altivec |
Cc: | dje.gcc@gmail.com | Blocked By: | |
Blocking: | Reproduced by developer: | yes | |
Analyzed by developer: | yes |
Description
It seems current implantation of Altivec doesn't support little-endian.
FFmpeg was compiled as:
./configure --prefix=/usr --extra-libs="-lpthread -lm" --enable-gpl --enable-libvpx --enable-libx264 --enable-nonfree --enable-shared --enable-avresample
Run time options:
ffmpeg -i ./Sintel.mp4 -filter:v fps=fps=1/60 /tests/before/frames/%0d.png
A full report about it could be found on an exciting issue at https://github.com/opencv/opencv/issues/11034
Attachments (6)
Change History (20)
comment:1 by , 7 years ago
Cc: | added |
---|
comment:2 by , 7 years ago
Keywords: | ppc added; Altivec VSX ppc64le removed |
---|---|
Priority: | important → normal |
Resolution: | → worksforme |
Status: | new → closed |
follow-up: 4 comment:3 by , 7 years ago
I'm sorry for not being clear enough.
It's not about the performance but the sanity of conversion.
Please could you look into that link and check my comments on it?
or even try to grep frames from video file and encoded into png format just like I mentioned above?
comment:4 by , 7 years ago
Keywords: | altivec added |
---|---|
Reproduced by developer: | set |
Resolution: | worksforme |
Status: | closed → reopened |
Version: | unspecified → git-master |
Replying to seiko:
It's not about the performance but the sanity of conversion.
Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.
$ ffmpeg -lavfi testsrc2,format=yuv420p -pix_fmt rgb24 -vcodec png -t 10 out.nut ffmpeg version N-90615-g53688b6 Copyright (c) 2000-2018 the FFmpeg developers built with gcc 6.3.1 (GCC) 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065. configuration: --enable-gpl --cpu=power8 --cc=/opt/at10.0/bin/gcc libavutil 56. 13.100 / 56. 13.100 libavcodec 58. 17.100 / 58. 17.100 libavformat 58. 10.100 / 58. 10.100 libavdevice 58. 2.100 / 58. 2.100 libavfilter 7. 14.100 / 7. 14.100 libswscale 5. 0.102 / 5. 0.102 libswresample 3. 0.101 / 3. 0.101 libpostproc 55. 0.100 / 55. 0.100 Stream mapping: format -> Stream #0:0 (png) Press [q] to stop, [?] for help [swscaler @ 0x10019995be0] ALTIVEC: Color Space RGB24 Output #0, nut, to 'out.nut': Metadata: encoder : Lavf58.10.100 Stream #0:0: Video: png (MPNG / 0x474E504D), rgb24, 320x240 [SAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 25 fps, 51200 tbn, 25 tbc (default) Metadata: encoder : Lavc58.17.100 png frame= 250 fps=0.0 q=-0.0 Lsize= 2289kB time=00:00:09.96 bitrate=1882.4kbits/s speed=56.6x video:2286kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.130058%
The output file is clearly recognizable but looks broken.
I tested a few of the libavcodec altivec optimizations and they work fine on le (and are tested by fate), this is indeed a specific issue in libswscale.
by , 7 years ago
Attachment: | libswscale-disable-ppc-yuv2rgb.diff added |
---|
comment:6 by , 7 years ago
Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.
Sorry again, I just rushed into it and I thought that provide an external URL for the original report will be just enough since I only investigated on it just to clear OpenCV side, Thanks goes to Haitao Yue who actually discover it.
this is indeed a specific issue in libswscale.
Yes, this how the frame looks like before disable optimization.
And here's after I disable PPC/yuv2rgb by using the patch in attachment.
I did a quick look into Altivec code on libswscale and I didn't realize any special cases for le or be and also you can easily realize many deprecated intrinsics on it, that why I said "current implementation of Altivec doesn't support little-endian".
by , 6 years ago
Attachment: | ffmpeg_7124.patch added |
---|
comment:7 by , 6 years ago
I did some investigate in this ticket. 2 issues were found.
Firstly, the code yuv2rgb_altivec.c uses vec_ld/vec_st to load and store data. But the compiler automatically do some changes for the perm masks for LE. So it is better use vec_xl (or vec_vsx_ld) for the data load and vec_xst for store.
One more important change is related to the YUV->RGB transform. I searched on web, there are 2 formulas. I checked all the coefficients of CRV, CBU, CGU, CGV in the code and they matches this formula
B = 1.164(Y - 16) + 2.018(U - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
R = 1.164(Y - 16) + 1.596(V - 128)
(http://www.mplayerhq.hu/DOCS/tech/colorspaces.txt)
OpenCV imgproc/src/color_yuv.cpp uses the same formula in YUV420 - RGB transform.
But as stated in the same page,
Julien (sorry, I can't recall his surname) suggests that there are problems with the above formula
So there is another usage in OpenCV YCrCb - RGB transform following Julien's proposal as:
R = Y + 1.403V'
G = Y - 0.344U' - 0.714V'
B = Y + 1.770U'
While in existing libswscale code, it uses formula as B = Y+2.018 (U-128), etc. I have to change CY and subtract by 16. After this, all the test passes.
I also found some other locations of vec_ld/vec_st. But I have no test data to get to those calling path. So I only made a patch for this ticket. Please review. Thanks!
by , 6 years ago
Attachment: | ffmpeg_altivec_yuv2rgb.patch added |
---|
Fixes compiler bug - replace vec_lvsl/vec_perm instructions with vec_xl
comment:8 by , 6 years ago
Analyzed by developer: | set |
---|
I spent some time looking at the cause of this bug. It looks like it is a bug with the gcc compiler for Power in versions 6.x and 7.x (verified with 6.3 and 7.4). It was fixed in version 8.x (verified with 8.3). I was using ffmpeg version 4.1 and a Power 9 ppc64le machine for building and testing.
It appears the compiler is generating the wrong code for little endian machines for the vec_lvsl/vec_perm instruction combos in some cases. If these instructions are replaced with vec_xl, the problem goes away for all versions of the compilers. It should be noted that the compiler is producing warnings for using vec_lvsl being deprecated for little endian.
I would recommend that the ffmpeg_altivec_yuv2rgb.patch be integrated into the project. The above previous patch does have the fix but also includes additional code.
comment:9 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Fixed by Chip Kerchner in 3a557c5d88b7b15b5954ba2743febb055549b536
comment:10 by , 5 years ago
This patch break the build with gcc 9.3.0 (Debian 9.3.0-13) ffmpeg 4.3
Altivec is enabled with "--cpu=g4"
CC libswscale/ppc/yuv2rgb_altivec.o libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_abgr’: libswscale/ppc/yuv2rgb_altivec.c:338:18: error: implicit declaration of function ‘vec_xl’; did you mean ‘vec_rl’? [-Werror=implicit-function-declaration] 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’ 437 | DEFCSP420_CVT(yuv2_abgr, out_abgr) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’ 437 | DEFCSP420_CVT(yuv2_abgr, out_abgr) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 340 | y1 = vec_xl(0, y2i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:437:1: note: in expansion of macro ‘DEFCSP420_CVT’ 437 | DEFCSP420_CVT(yuv2_abgr, out_abgr) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:437:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c:437:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_bgra’: libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:438:1: note: in expansion of macro ‘DEFCSP420_CVT’ 438 | DEFCSP420_CVT(yuv2_bgra, out_bgra) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 340 | y1 = vec_xl(0, y2i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:438:1: note: in expansion of macro ‘DEFCSP420_CVT’ 438 | DEFCSP420_CVT(yuv2_bgra, out_bgra) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:438:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c:438:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_rgba’: libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:439:1: note: in expansion of macro ‘DEFCSP420_CVT’ 439 | DEFCSP420_CVT(yuv2_rgba, out_rgba) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 340 | y1 = vec_xl(0, y2i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:439:1: note: in expansion of macro ‘DEFCSP420_CVT’ 439 | DEFCSP420_CVT(yuv2_rgba, out_rgba) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:439:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c:439:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_argb’: libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:440:1: note: in expansion of macro ‘DEFCSP420_CVT’ 440 | DEFCSP420_CVT(yuv2_argb, out_argb) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 340 | y1 = vec_xl(0, y2i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:440:1: note: in expansion of macro ‘DEFCSP420_CVT’ 440 | DEFCSP420_CVT(yuv2_argb, out_argb) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:440:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c:440:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_rgb24’: libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:441:1: note: in expansion of macro ‘DEFCSP420_CVT’ 441 | DEFCSP420_CVT(yuv2_rgb24, out_rgb24) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 340 | y1 = vec_xl(0, y2i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:441:1: note: in expansion of macro ‘DEFCSP420_CVT’ 441 | DEFCSP420_CVT(yuv2_rgb24, out_rgb24) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:441:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c:441:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c: In function ‘altivec_yuv2_bgr24’: libswscale/ppc/yuv2rgb_altivec.c:338:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 338 | y0 = vec_xl(0, y1i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:442:1: note: in expansion of macro ‘DEFCSP420_CVT’ 442 | DEFCSP420_CVT(yuv2_bgr24, out_bgr24) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:340:18: error: incompatible types when assigning to type ‘__vector unsigned char’ {aka ‘__vector(16) unsigned char’} from type ‘int’ 340 | y1 = vec_xl(0, y2i); \ | ^~~~~~ libswscale/ppc/yuv2rgb_altivec.c:442:1: note: in expansion of macro ‘DEFCSP420_CVT’ 442 | DEFCSP420_CVT(yuv2_bgr24, out_bgr24) | ^~~~~~~~~~~~~ libswscale/ppc/yuv2rgb_altivec.c:442:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size libswscale/ppc/yuv2rgb_altivec.c:442:1: error: can’t convert a value of type ‘int’ to vector type ‘__vector(16) signed char’ which has different size cc1: some warnings being treated as errors
comment:11 by , 5 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
I reopen this bug has this bug introcuce another bug.
comment:12 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
I am quite sure that the issue reported in this ticket has been fixed.
This unfortunately has no similarities with a valid ticket.
In any case, altivec optimizations for yuv->rgb work here on Power8/LE 64bit and show a significant speed-up.