Opened 15 months ago

Last modified 5 months ago

#7124 reopened defect

Unexpected results from YUV-RGB color space conversion optimization on POWER8/LE

Reported by: seiko Owned by:
Priority: normal Component: swscale
Version: git-master Keywords: ppc altivec
Cc: dje.gcc@gmail.com Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: no

Description

It seems current implantation of Altivec doesn't support little-endian.

FFmpeg was compiled as:
./configure --prefix=/usr --extra-libs="-lpthread -lm" --enable-gpl --enable-libvpx --enable-libx264 --enable-nonfree --enable-shared --enable-avresample

Run time options:
ffmpeg -i ./Sintel.mp4 -filter:v fps=fps=1/60 /tests/before/frames/%0d.png

A full report about it could be found on an exciting issue at https://github.com/opencv/opencv/issues/11034

Attachments (5)

libswscale-disable-ppc-yuv2rgb.diff (367 bytes) - added by seiko 15 months ago.
after.png (372.5 KB) - added by seiko 15 months ago.
after disable PPC/ YUV-RGB optimization
before.png (390.3 KB) - added by seiko 15 months ago.
With PPC/YUV-RGB optimization enabled
Dockerfile (2.4 KB) - added by seiko 15 months ago.
Used during build and test
ffmpeg_7124.patch (10.1 KB) - added by winice 5 months ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 15 months ago by edelsohn

  • Cc dje.gcc@gmail.com added

comment:2 Changed 15 months ago by cehoyos

  • Keywords ppc added; Altivec VSX ppc64le removed
  • Priority changed from important to normal
  • Resolution set to worksforme
  • Status changed from new to closed

This unfortunately has no similarities with a valid ticket.

In any case, altivec optimizations for yuv->rgb work here on Power8/LE 64bit and show a significant speed-up.

comment:3 follow-up: Changed 15 months ago by seiko

I'm sorry for not being clear enough.

It's not about the performance but the sanity of conversion.

Please could you look into that link and check my comments on it?
or even try to grep frames from video file and encoded into png format just like I mentioned above?

comment:4 in reply to: ↑ 3 Changed 15 months ago by cehoyos

  • Keywords altivec added
  • Reproduced by developer set
  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version changed from unspecified to git-master

Replying to seiko:

It's not about the performance but the sanity of conversion.

Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.

$ ffmpeg -lavfi testsrc2,format=yuv420p -pix_fmt rgb24 -vcodec png -t 10 out.nut
ffmpeg version N-90615-g53688b6 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 6.3.1 (GCC) 20170515 (Advance-Toolchain-at10.0) IBM AT 10 branch, based on subversion id 248065.
  configuration: --enable-gpl --cpu=power8 --cc=/opt/at10.0/bin/gcc
  libavutil      56. 13.100 / 56. 13.100
  libavcodec     58. 17.100 / 58. 17.100
  libavformat    58. 10.100 / 58. 10.100
  libavdevice    58.  2.100 / 58.  2.100
  libavfilter     7. 14.100 /  7. 14.100
  libswscale      5.  0.102 /  5.  0.102
  libswresample   3.  0.101 /  3.  0.101
  libpostproc    55.  0.100 / 55.  0.100
Stream mapping:
  format -> Stream #0:0 (png)
Press [q] to stop, [?] for help
[swscaler @ 0x10019995be0] ALTIVEC: Color Space RGB24
Output #0, nut, to 'out.nut':
  Metadata:
    encoder         : Lavf58.10.100
    Stream #0:0: Video: png (MPNG / 0x474E504D), rgb24, 320x240 [SAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 25 fps, 51200 tbn, 25 tbc (default)
    Metadata:
      encoder         : Lavc58.17.100 png
frame=  250 fps=0.0 q=-0.0 Lsize=    2289kB time=00:00:09.96 bitrate=1882.4kbits/s speed=56.6x
video:2286kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.130058%

The output file is clearly recognizable but looks broken.
I tested a few of the libavcodec altivec optimizations and they work fine on le (and are tested by fate), this is indeed a specific issue in libswscale.

comment:5 Changed 15 months ago by cehoyos

Works fine with -cpuflags 0.

Changed 15 months ago by seiko

Changed 15 months ago by seiko

after disable PPC/ YUV-RGB optimization

Changed 15 months ago by seiko

With PPC/YUV-RGB optimization enabled

Changed 15 months ago by seiko

Used during build and test

comment:6 Changed 15 months ago by seiko

Sorry, I realized that by now. Allow me to repeat that the original report was just invalid, post all necessary information including the command line you tested together with the complete, uncut console output here in the ticket, do not use external resources except for large source files.

Sorry again, I just rushed into it and I thought that provide an external URL for the original report will be just enough since I only investigated on it just to clear OpenCV side, Thanks goes to Haitao Yue who actually discover it.


this is indeed a specific issue in libswscale.

Yes, this how the frame looks like before disable optimization.
With PPC/YUV-RGB optimization enabled

And here's after I disable PPC/yuv2rgb by using the patch in attachment.

after disable PPC/ YUV-RGB optimization

I think I have to reconsider my review, It seems it has a support for little-endian.. I'm going to give it a try and I may sent a patch to fix it soon enough.

Last edited 15 months ago by seiko (previous) (diff)

Changed 5 months ago by winice

comment:7 Changed 5 months ago by winice

I did some investigate in this ticket. 2 issues were found.

Firstly, the code yuv2rgb_altivec.c uses vec_ld/vec_st to load and store data. But the compiler automatically do some changes for the perm masks for LE. So it is better use vec_xl (or vec_vsx_ld) for the data load and vec_xst for store.

One more important change is related to the YUV->RGB transform. I searched on web, there are 2 formulas. I checked all the coefficients of CRV, CBU, CGU, CGV in the code and they matches this formula

B = 1.164(Y - 16) + 2.018(U - 128)
G = 1.164(Y - 16) - 0.813(V - 128) - 0.391(U - 128)
R = 1.164(Y - 16) + 1.596(V - 128)

(http://www.mplayerhq.hu/DOCS/tech/colorspaces.txt)
OpenCV imgproc/src/color_yuv.cpp uses the same formula in YUV420 - RGB transform.

But as stated in the same page,

Julien (sorry, I can't recall his surname) suggests that there are problems with the above formula

So there is another usage in OpenCV YCrCb - RGB transform following Julien's proposal as:

R = Y + 1.403V'
G = Y - 0.344U' - 0.714V'
B = Y + 1.770U'

While in existing libswscale code, it uses formula as B = Y+2.018 (U-128), etc. I have to change CY and subtract by 16. After this, all the test passes.

I also found some other locations of vec_ld/vec_st. But I have no test data to get to those calling path. So I only made a patch for this ticket. Please review. Thanks!

Note: See TracTickets for help on using tickets.