Opened 8 years ago

Last modified 3 years ago

#5570 open enhancement

POWER8 VSX vectorization libswscale/input.c

Reported by: David Edelsohn Owned by:
Priority: wish Component: swscale
Version: git-master Keywords: bounty vsx
Cc: linuxer@xakep.ru, daniel@pocock.pro Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Optimize approximately 50 functions in libswscale/input.c for POWER8 VSX SIMD instructions on PPC64 Linux.

Change History (15)

comment:1 by David Edelsohn, 8 years ago

Keywords: bounty added
Version: unspecifiedgit-master

comment:3 by Mike Lieman, 8 years ago

commit 1df908f33f658979b32599489ca6f1a39821013c breaks build on not POWER8 VSX SIMD

make -C ffmpeg libswscale/libswscale.a
make[1]: Entering directory '/home/mike/software/mplayer/ffmpeg'
CC libswscale/swscale.o
libswscale/swscale.c:569:9: error: use of undeclared identifier 'HAVE_VSX'

if (HAVE_VSX && (!HAVE_BIGENDIAN)) {

common.mak:60: recipe for target 'libswscale/swscale.o' failed
make[1]: Leaving directory '/home/mike/software/mplayer/ffmpeg'
Makefile:740: recipe for target 'ffmpeg/libswscale/libswscale.a' failed

comment:4 by Carl Eugen Hoyos, 7 years ago

Keywords: vsx added

comment:5 by reverse_forever, 4 years ago

Resolution: fixed
Status: openclosed

comment:6 by Carl Eugen Hoyos, 4 years ago

Resolution: fixed
Status: closedreopened

comment:7 by reverse_forever, 4 years ago

Cc: linuxer@xakep.ru added

PPC input.c coverage now exceeds no vsx cpu. Speedups 2-12x, usually in line with no vsx cpu. Some functions got speed less than cpu, because it is impossible to vectorize them correctly(For example palToUV, palToY)I commented on them. Some functions got speed equivalent no vsx cpu(For example planar_rgb16_to_a)

Sorry for accidentally closing the ticket. As the reporter, you should close it.

comment:8 by Carl Eugen Hoyos, 4 years ago

Status: reopenedopen

You should remove the pal functions from your patch: They were not needed, even more so if they show no significant speed improvement.

On this bug tracker, tickets get closed once the patch gets committed.

comment:9 by reverse_forever, 4 years ago

Yes, pal functions was removed from patch.

comment:10 by reverse_forever, 4 years ago

Please, check my patch. I have been waiting for its commiting more than a year. I want to get my bounty from bountysource platform.

comment:11 by reverse_forever, 4 years ago

Please, check my patch. I have been waiting for its commiting more than a year. I want to get my bounty.

comment:12 by pocock, 3 years ago

Cc: daniel@pocock.pro added

I tried to add the patch to a build of 4.3.1 on Debian buster. It fails with some minor errors.

a) the timer.h includes and STOP_TIMER macros need to be removed, like this
https://github.com/FFmpeg/FFmpeg/commit/cc2a9509ce79793fcd00f91bb6ca3f4c53721e9e

b) some functions are commented out and attempts to use them cause an error

There are also some compiler warnings, it would be nice to tidy them up.

gcc -I. -Isrc/ -Wdate-time -D_FORTIFY_SOURCE=2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DZLIB_CONST -DHAVE_AV_CONFIG_H -DBUILDING_swscale -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-overflow -fstack-protector-all -fPIE   -std=c11 -fomit-frame-pointer -maltivec -mabi=altivec -mvsx -pthread  -I/usr/include/p11-kit-1  -I/usr/include/lilv-0 -I/usr/include/sratom-0 -I/usr/include/sord-0 -I/usr/include/serd-0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/powerpc64le-linux-gnu/glib-2.0/include -I/usr/include/uuid -I/usr/include/fribidi -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libxml2 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/bs2b    -I/usr/include/libdrm -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/fribidi  -I/usr/include/openjpeg-2.3  -I/usr/include/opus -I/usr/include/opus -D_REENTRANT  -pthread -I/usr/include/librsvg-2.0 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib/powerpc64le-linux-gnu/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16                -isystem /usr/include/mit-krb5 -I/usr/include/pgm-5.2  -I/usr/include/libxml2  -I/usr/include/sphinxbase -I/usr/include/pocketsphinx -I/usr/include/powerpc64le-linux-gnu -I/usr/include/powerpc64le-linux-gnu/sphinxbase -I/usr/include/alsa          -g -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wno-pointer-to-int-cast -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign -Wno-unused-const-variable -Wno-bool-operation -Wno-char-subscripts -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat -fdiagnostics-color=auto -Wno-maybe-uninitialized -D_REENTRANT -I/usr/include/SDL2  -MMD -MF libswscale/ppc/swscale_altivec.d -MT libswscale/ppc/swscale_altivec.o -c -o libswscale/ppc/swscale_altivec.o src/libswscale/ppc/swscale_altivec.c
src/libswscale/ppc/input_vsx.c: In function ‘rgb64ToUV_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:192:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj, is_BE ;
     ^~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb48ToUV_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:483:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj, is_BE ;
     ^~~
In file included from src/libavutil/internal.h:42,
                 from src/libavutil/common.h:533,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libswscale/ppc/input_vsx.c: In function ‘rgb48ToUV_half_c_template’:
src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’?
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("3.1")
     ^~~~~~~~~~
src/libavutil/timer.h:134:5: note: each undeclared identifier is reported only once for each function it appears in
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("3.1")
     ^~~~~~~~~~
In file included from src/libavutil/common.h:106,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’?
     TIMER_REPORT(id, tend - tstart)
                             ^~~~~~
src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’
 #   define ff_log2(x) (31 - __builtin_clz((x)|1))
                                            ^
src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’
     TIMER_REPORT(id, tend - tstart)
     ^~~~~~~~~~~~
src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("3.1")
     ^~~~~~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToY_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:710:51: warning: unused variable ‘v_val’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px,v_sign,v_val;
                                                   ^~~~~
src/libswscale/ppc/input_vsx.c:710:44: warning: unused variable ‘v_sign’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px,v_sign,v_val;
                                            ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToUV_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:859:53: warning: unused variable ‘v_val’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px, v_sign, v_val;
                                                     ^~~~~
src/libswscale/ppc/input_vsx.c:859:45: warning: unused variable ‘v_sign’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px, v_sign, v_val;
                                             ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToUV_half_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:1035:47: warning: unused variable ‘v_val’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_sign, v_val;
                                               ^~~~~
src/libswscale/ppc/input_vsx.c:1035:39: warning: unused variable ‘v_sign’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_sign, v_val;
                                       ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘uyvyToY_c_vsx’:
src/libswscale/ppc/input_vsx.c:2349:35: warning: unused variable ‘sample2’ [-Wunused-variable]
     vector unsigned char sample1, sample2;
                                   ^~~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘p010LEToY_c_vsx’:
src/libswscale/ppc/input_vsx.c:2486:26: warning: unused variable ‘sample’ [-Wunused-variable]
     vector unsigned char sample;
                          ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb24ToUV_c_vsx’:
src/libswscale/ppc/input_vsx.c:3131:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj;
     ^~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb24ToUV_half_c_vsx’:
src/libswscale/ppc/input_vsx.c:3241:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj;
     ^~~
src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_y_vsx’:
src/libswscale/ppc/input_vsx.c:3624:15: warning: unused variable ‘src_addr’ [-Wunused-variable]
     uintptr_t src_addr = (uintptr_t)src;
               ^~~~~~~~
src/libswscale/ppc/input_vsx.c:3614:41: warning: unused variable ‘v_rd2’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                         ^~~~~
src/libswscale/ppc/input_vsx.c:3614:34: warning: unused variable ‘v_rd1’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                  ^~~~~
src/libswscale/ppc/input_vsx.c:3614:27: warning: unused variable ‘v_rd0’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                           ^~~~~
src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_a_vsx’:
src/libswscale/ppc/input_vsx.c:3704:34: warning: unused variable ‘v_a’ [-Wunused-variable]
     vector unsigned short v_rd0, v_a, v_dst, shift;
                                  ^~~
src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_uv_vsx’:
src/libswscale/ppc/input_vsx.c:3742:41: warning: unused variable ‘v_rd2’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                         ^~~~~
src/libswscale/ppc/input_vsx.c:3742:34: warning: unused variable ‘v_rd1’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                  ^~~~~
src/libswscale/ppc/input_vsx.c:3742:27: warning: unused variable ‘v_rd0’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                           ^~~~~
In file included from src/libavutil/internal.h:42,
                 from src/libavutil/common.h:533,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libswscale/ppc/input_vsx.c: In function ‘grayf32ToY16_c_vsx’:
src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’?
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:3864:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("47")
     ^~~~~~~~~~
In file included from src/libavutil/common.h:106,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’?
     TIMER_REPORT(id, tend - tstart)
                             ^~~~~~
src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’
 #   define ff_log2(x) (31 - __builtin_clz((x)|1))
                                            ^
src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’
     TIMER_REPORT(id, tend - tstart)
     ^~~~~~~~~~~~
src/libswscale/ppc/input_vsx.c:3864:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("47")
     ^~~~~~~~~~
In file included from src/libavutil/internal.h:42,
                 from src/libavutil/common.h:533,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libswscale/ppc/input_vsx.c: In function ‘grayf32ToY16_bswap_c_vsx’:
src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’?
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:3878:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("48")
     ^~~~~~~~~~
In file included from src/libavutil/common.h:106,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’?
     TIMER_REPORT(id, tend - tstart)
                             ^~~~~~
src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’
 #   define ff_log2(x) (31 - __builtin_clz((x)|1))
                                            ^
src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’
     TIMER_REPORT(id, tend - tstart)
     ^~~~~~~~~~~~
src/libswscale/ppc/input_vsx.c:3878:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("48")
     ^~~~~~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘ff_sws_init_input_funcs_vsx’:
src/libswscale/ppc/input_vsx.c:4029:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     enum AVPixelFormat srcFormat = c->srcFormat;
     ^~~~
src/libswscale/ppc/input_vsx.c:4055:24: error: ‘palToUV_c_vsx’ undeclared (first use in this function); did you mean ‘palToA_c_vsx’?
         c->chrToYV12 = palToUV_c_vsx;
                        ^~~~~~~~~~~~~
                        palToA_c_vsx
src/libswscale/ppc/input_vsx.c:4468:24: error: ‘palToY_c_vsx’ undeclared (first use in this function); did you mean ‘palToA_c_vsx’?
         c->lumToYV12 = palToY_c_vsx;
                        ^~~~~~~~~~~~
                        palToA_c_vsx
make[2]: *** [/<<PKGBUILDDIR>>/ffbuild/common.mak:59: libswscale/ppc/input_vsx.o] Error 1

comment:13 by reverse_forever, 3 years ago

Hi! Functions that were commented out did not well speed. The actual patch with little fixes and without START_TIME/STOP_TIME macroses can be taken from https://patchwork.ffmpeg.org/project/ffmpeg/patch/20201230120822.8891-1-pestov.vyach@yandex.ru/

comment:14 by pocock, 3 years ago

Could this patch be related to #9077 ?

comment:15 by pocock, 3 years ago

I copied the patches from this bug #5570 and also #9077 into the build tree for the Debian package version 7:4.3.2-0+deb11u2. I put everything in this branch based on the Debian packaging:
https://gitlab.com/debify/ffmpeg/-/tree/pocock/bullseye-ppc64el

I ran the build on a Debian 11 (bullseye) system

dpkg-buildpackage -rfakeroot -i.git -j65 -b --no-sign

Building with both patches some of the unit tests fail.

Building without the patch here for #5077 the unit tests pass.

I've copied the test output below

--- BUILD_DIR/tests/ref/fate/filter-pixfmts-scale       2021-09-19 17:23:40.129078035 +0200
+++ tests/data/fate/filter-pixfmts-scale        2021-09-21 07:43:08.567738778 +0200
@@ -9,12 +9,12 @@
 bgr444le            01be36a28ebca1a11eb4d192986cd4e9
 bgr48be             3ae02769c69d2512eaa26fff65763acb
 bgr48le             a6ce2344f07b77438258b6787fe5c24c
-bgr4_byte           01efea74088e5e3343c19ee053b95f31
+bgr4_byte           191a304d3ab99e7f3e7c0e00b353d928
 bgr555be            ab353278d103d379e1ec86e5cabb645f
 bgr555le            16ccbf59297e4b9ab25fd8af5a84a95d
 bgr565be            3477e19fc11f95285836f30fdff26c1d
 bgr565le            82a81e7c9d4e0431fa22f4df9694afdc
-bgr8                2c57e76ccf04d51de6acafcf35d6fa70
+bgr8                2b8447aa9ec0a1c19efb03d5f98fe8d7
 bgra                d8316272bc3a360ef9dff3ecc84520a3
 bgra64be            4e6a1b9f9c18b881c27d76611d45f737
 bgra64le            efeee0abcc658ebcff049d5e74d74943
@@ -63,19 +63,19 @@
 p010le              4b316f2b9e18972299beb73511278fa8
 p016be              31e204018cbb53f8988c4e1174ea8ce9
 p016le              d5afe557f492a09317e525d7cb782f5b
-pal8                29e10892009b2cfe431815ec3052ed3b
+pal8                0c9a2544b27b62e3d1fd0809bae67bbd
 rgb0                fbd27e98154efb7535826afed41e9bb0
 rgb24               e022e741451e81f2ecce1c7240b93e87
 rgb444be            db52b9ecdf98479b693e3f4bd9e77bac
 rgb444le            63288425c05f146cde5c82b85bb126e0
 rgb48be             45b25016f10d54cf36eef3479afd8249
 rgb48le             40577b147620ecfb115717473d000697
-rgb4_byte           9e540a2e7193ebcbf1c7f85d192a0c4e
+rgb4_byte           03192eea530b12d078edba6446a4d233
 rgb555be            cb5407a0d40f3d0120155daeaaa9a222
 rgb555le            c15540d1fc887882c35860634009c439
 rgb565be            c69fa7d6e458509de65e911d147629a8
 rgb565le            a4a6ef89cdc10282b428cb1392f2a353
-rgb8                bcdc033b4ef0979d060dbc8893d4db58
+rgb8                092fb66393af3d56713d49f652405de4
 rgba                85bb5d03cea1c6e8002ced3373904336
 rgba64be            ee73e57923af984b31cc7795d13929da
 rgba64le            783d2779adfafe3548bdb671ec0de69e
 BUILD_DIR/debian/standard/ffmpeg -nostdin -nostats -cpuflags all -flags +bitexact -idct simple -sws_flags +accurate_rnd+bitexact -fflags +bitexact -threads 1 -f image2 -vcodec pgmyuv -hwaccel none -threads 1 -thread_type frame+slice -i BUILD_DIR/debian/standard/tests/vsynth1/%02d.pgm -flags +bitexact -sws_flags +accurate_rnd+bitexact -fflags +bitexact -flags +bitexact -idct simple -sws_flags +accurate_rnd+bitexact -fflags +bitexact -threads 1 -dct fastint -vf format=yvyu422,vflip= -vcodec rawvideo -frames:v 5 -pix_fmt yvyu422 -frames:v 1 -f nut md5:
Test filter-pixfmts-scale failed. Look at tests/data/fate/filter-pixfmts-scale.err for details.
Note: See TracTickets for help on using tickets.