Opened 5 years ago

Last modified 4 months ago

#5570 open enhancement

POWER8 VSX vectorization libswscale/input.c

Reported by: David Edelsohn
Owned by:
Priority: wish
Component: swscale
Version: git-master
Keywords: bounty vsx
Cc: linuxer@xakep.ru, daniel@pocock.pro
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Optimize approximately 50 functions in libswscale/input.c for POWER8 VSX SIMD instructions on PPC64 Linux.

Change History (14)

comment:1 by David Edelsohn, 5 years ago

Keywords: bounty added
Version: unspecified → git-master

comment:3 by Mike Lieman, 5 years ago

Commit 1df908f33f658979b32599489ca6f1a39821013c breaks the build on systems without POWER8 VSX SIMD:

make -C ffmpeg libswscale/libswscale.a
make[1]: Entering directory '/home/mike/software/mplayer/ffmpeg'
CC libswscale/swscale.o
libswscale/swscale.c:569:9: error: use of undeclared identifier 'HAVE_VSX'

if (HAVE_VSX && (!HAVE_BIGENDIAN)) {

common.mak:60: recipe for target 'libswscale/swscale.o' failed
make[1]: Leaving directory '/home/mike/software/mplayer/ffmpeg'
Makefile:740: recipe for target 'ffmpeg/libswscale/libswscale.a' failed
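
The log above suggests that the configuration header used for this build never defined HAVE_VSX. A minimal sketch of a defensive fallback, assuming a normal FFmpeg configure run defines both macros as 0 or 1 (the fallback defines and the helper name use_vsx_le_path are hypothetical, not part of FFmpeg):

```c
/* Assumption: a regular FFmpeg config.h defines HAVE_VSX and
 * HAVE_BIGENDIAN as 0 or 1. These fallbacks are hypothetical and only
 * cover a build whose generated configuration lacks them. */
#ifndef HAVE_VSX
#define HAVE_VSX 0
#endif
#ifndef HAVE_BIGENDIAN
#define HAVE_BIGENDIAN 0
#endif

/* Returns 1 when the little-endian VSX path would be selected,
 * using the same condition as the failing line in swscale.c. */
static int use_vsx_le_path(void)
{
    if (HAVE_VSX && (!HAVE_BIGENDIAN))
        return 1;
    return 0;
}
```

With the fallbacks in place the condition is a compile-time constant, so the compiler can drop the VSX branch entirely on other targets.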

comment:4 by Carl Eugen Hoyos, 5 years ago

Keywords: vsx added

comment:5 by reverse_forever, 13 months ago

Resolution: fixed
Status: open → closed

comment:6 by Carl Eugen Hoyos, 13 months ago

Resolution: fixed
Status: closed → reopened

comment:7 by reverse_forever, 13 months ago

Cc: linuxer@xakep.ru added

The PPC VSX versions of the input.c functions now outperform the non-VSX CPU code. Speedups are 2-12x, usually in line with the non-VSX CPU. Some functions ended up slower than the plain CPU code because it is impossible to vectorize them correctly (for example palToUV and palToY); I commented them out. Some functions run at the same speed as the non-VSX CPU code (for example planar_rgb16_to_a).
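
A minimal sketch of why palette functions such as palToY resist vectorization, assuming the conversion is a per-pixel table lookup and that the target has no vector gather load (the function name and signature below are illustrative, not the libswscale ones):

```c
#include <stdint.h>

/* Simplified scalar sketch modelled on a palette-to-luma conversion.
 * Illustrative only: this is not the libswscale function. */
static void pal_to_y_sketch(int16_t *dst, const uint8_t *src,
                            const uint32_t *pal, int width)
{
    int i;
    for (i = 0; i < width; i++) {
        /* The load address depends on the pixel value itself, so
         * neighbouring pixels read unrelated table entries. */
        dst[i] = (int16_t)((pal[src[i]] & 0xFF) << 6);
    }
}
```

Each output needs its own data-dependent load, so a SIMD version would still fetch the table entries one at a time and could only vectorize the mask and shift.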

Sorry for accidentally closing the ticket. As the reporter, you should close it.

comment:8 by Carl Eugen Hoyos, 13 months ago

Status: reopened → open

You should remove the pal functions from your patch: They were not needed, even more so if they show no significant speed improvement.

On this bug tracker, tickets get closed once the patch gets committed.

comment:9 by reverse_forever, 13 months ago

Yes, the pal functions were removed from the patch.

comment:10 by reverse_forever, 13 months ago

Please check my patch. I have been waiting more than a year for it to be committed. I want to get my bounty from the Bountysource platform.

comment:11 by reverse_forever, 10 months ago

Please check my patch. I have been waiting more than a year for it to be committed. I want to get my bounty.

comment:12 by pocock, 4 months ago

Cc: daniel@pocock.pro added

I tried to add the patch to a build of 4.3.1 on Debian buster. It fails with some minor errors.

a) the timer.h includes and STOP_TIMER macros need to be removed, like this
https://github.com/FFmpeg/FFmpeg/commit/cc2a9509ce79793fcd00f91bb6ca3f4c53721e9e

b) some functions are commented out and attempts to use them cause an error
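
To illustrate item a): the errors in the log show STOP_TIMER reading tend and tstart, which are expected to be declared by a matching START_TIMER earlier in the same function. A minimal sketch with hypothetical stand-in macros (START_TIMER_SKETCH and STOP_TIMER_SKETCH are not the real libavutil/timer.h macros, and they use clock() instead of AV_READ_TIME):

```c
#include <time.h>

/* Hypothetical stand-ins for the timer macros. The start macro
 * declares the variables that the stop macro later reads. */
#define START_TIMER_SKETCH \
    clock_t tend;          \
    clock_t tstart = clock();

#define STOP_TIMER_SKETCH(elapsed_out) \
    tend = clock();                    \
    *(elapsed_out) = (long)(tend - tstart);

/* Sums n values and reports the elapsed clock ticks in *elapsed. */
static long timed_sum(const int *v, int n, long *elapsed)
{
    long sum = 0;
    int i;
    START_TIMER_SKETCH
    for (i = 0; i < n; i++)
        sum += v[i];
    STOP_TIMER_SKETCH(elapsed)
    return sum;
}
```

Removing only the start macro from timed_sum reproduces the "undeclared" errors for tend and tstart, which is why both macros have to go together.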

There are also some compiler warnings; it would be nice to tidy them up.

gcc -I. -Isrc/ -Wdate-time -D_FORTIFY_SOURCE=2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200112 -D_XOPEN_SOURCE=600 -DZLIB_CONST -DHAVE_AV_CONFIG_H -DBUILDING_swscale -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -fno-strict-overflow -fstack-protector-all -fPIE   -std=c11 -fomit-frame-pointer -maltivec -mabi=altivec -mvsx -pthread  -I/usr/include/p11-kit-1  -I/usr/include/lilv-0 -I/usr/include/sratom-0 -I/usr/include/sord-0 -I/usr/include/serd-0 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/powerpc64le-linux-gnu/glib-2.0/include -I/usr/include/uuid -I/usr/include/fribidi -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libxml2 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/bs2b    -I/usr/include/libdrm -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/fribidi  -I/usr/include/openjpeg-2.3  -I/usr/include/opus -I/usr/include/opus -D_REENTRANT  -pthread -I/usr/include/librsvg-2.0 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib/powerpc64le-linux-gnu/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16                -isystem /usr/include/mit-krb5 -I/usr/include/pgm-5.2  -I/usr/include/libxml2  -I/usr/include/sphinxbase -I/usr/include/pocketsphinx -I/usr/include/powerpc64le-linux-gnu -I/usr/include/powerpc64le-linux-gnu/sphinxbase -I/usr/include/alsa          -g -Wdeclaration-after-statement -Wall -Wdisabled-optimization -Wpointer-arith -Wredundant-decls -Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes -Wno-pointer-to-int-cast -Wstrict-prototypes -Wempty-body -Wno-parentheses -Wno-switch -Wno-format-zero-length -Wno-pointer-sign 
-Wno-unused-const-variable -Wno-bool-operation -Wno-char-subscripts -O3 -fno-math-errno -fno-signed-zeros -fno-tree-vectorize -Werror=format-security -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -Werror=vla -Wformat -fdiagnostics-color=auto -Wno-maybe-uninitialized -D_REENTRANT -I/usr/include/SDL2  -MMD -MF libswscale/ppc/swscale_altivec.d -MT libswscale/ppc/swscale_altivec.o -c -o libswscale/ppc/swscale_altivec.o src/libswscale/ppc/swscale_altivec.c
src/libswscale/ppc/input_vsx.c: In function ‘rgb64ToUV_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:192:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj, is_BE ;
     ^~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb48ToUV_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:483:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj, is_BE ;
     ^~~
In file included from src/libavutil/internal.h:42,
                 from src/libavutil/common.h:533,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libswscale/ppc/input_vsx.c: In function ‘rgb48ToUV_half_c_template’:
src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’?
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("3.1")
     ^~~~~~~~~~
src/libavutil/timer.h:134:5: note: each undeclared identifier is reported only once for each function it appears in
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("3.1")
     ^~~~~~~~~~
In file included from src/libavutil/common.h:106,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’?
     TIMER_REPORT(id, tend - tstart)
                             ^~~~~~
src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’
 #   define ff_log2(x) (31 - __builtin_clz((x)|1))
                                            ^
src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’
     TIMER_REPORT(id, tend - tstart)
     ^~~~~~~~~~~~
src/libswscale/ppc/input_vsx.c:630:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("3.1")
     ^~~~~~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToY_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:710:51: warning: unused variable ‘v_val’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px,v_sign,v_val;
                                                   ^~~~~
src/libswscale/ppc/input_vsx.c:710:44: warning: unused variable ‘v_sign’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px,v_sign,v_val;
                                            ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToUV_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:859:53: warning: unused variable ‘v_val’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px, v_sign, v_val;
                                                     ^~~~~
src/libswscale/ppc/input_vsx.c:859:45: warning: unused variable ‘v_sign’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_px, v_sign, v_val;
                                             ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb16_32ToUV_half_c_template_vsx’:
src/libswscale/ppc/input_vsx.c:1035:47: warning: unused variable ‘v_val’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_sign, v_val;
                                               ^~~~~
src/libswscale/ppc/input_vsx.c:1035:39: warning: unused variable ‘v_sign’ [-Wunused-variable]
     vector signed short v_rd0, v_rd1, v_sign, v_val;
                                       ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘uyvyToY_c_vsx’:
src/libswscale/ppc/input_vsx.c:2349:35: warning: unused variable ‘sample2’ [-Wunused-variable]
     vector unsigned char sample1, sample2;
                                   ^~~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘p010LEToY_c_vsx’:
src/libswscale/ppc/input_vsx.c:2486:26: warning: unused variable ‘sample’ [-Wunused-variable]
     vector unsigned char sample;
                          ^~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb24ToUV_c_vsx’:
src/libswscale/ppc/input_vsx.c:3131:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj;
     ^~~
src/libswscale/ppc/input_vsx.c: In function ‘rgb24ToUV_half_c_vsx’:
src/libswscale/ppc/input_vsx.c:3241:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     int i, width_adj;
     ^~~
src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_y_vsx’:
src/libswscale/ppc/input_vsx.c:3624:15: warning: unused variable ‘src_addr’ [-Wunused-variable]
     uintptr_t src_addr = (uintptr_t)src;
               ^~~~~~~~
src/libswscale/ppc/input_vsx.c:3614:41: warning: unused variable ‘v_rd2’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                         ^~~~~
src/libswscale/ppc/input_vsx.c:3614:34: warning: unused variable ‘v_rd1’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                  ^~~~~
src/libswscale/ppc/input_vsx.c:3614:27: warning: unused variable ‘v_rd0’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                           ^~~~~
src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_a_vsx’:
src/libswscale/ppc/input_vsx.c:3704:34: warning: unused variable ‘v_a’ [-Wunused-variable]
     vector unsigned short v_rd0, v_a, v_dst, shift;
                                  ^~~
src/libswscale/ppc/input_vsx.c: In function ‘planar_rgb16_to_uv_vsx’:
src/libswscale/ppc/input_vsx.c:3742:41: warning: unused variable ‘v_rd2’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                         ^~~~~
src/libswscale/ppc/input_vsx.c:3742:34: warning: unused variable ‘v_rd1’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                                  ^~~~~
src/libswscale/ppc/input_vsx.c:3742:27: warning: unused variable ‘v_rd0’ [-Wunused-variable]
     vector unsigned short v_rd0, v_rd1, v_rd2, v_g, v_b, v_r, v_g1, v_b1, v_r1;
                           ^~~~~
In file included from src/libavutil/internal.h:42,
                 from src/libavutil/common.h:533,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libswscale/ppc/input_vsx.c: In function ‘grayf32ToY16_c_vsx’:
src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’?
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:3864:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("47")
     ^~~~~~~~~~
In file included from src/libavutil/common.h:106,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’?
     TIMER_REPORT(id, tend - tstart)
                             ^~~~~~
src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’
 #   define ff_log2(x) (31 - __builtin_clz((x)|1))
                                            ^
src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’
     TIMER_REPORT(id, tend - tstart)
     ^~~~~~~~~~~~
src/libswscale/ppc/input_vsx.c:3864:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("47")
     ^~~~~~~~~~
In file included from src/libavutil/internal.h:42,
                 from src/libavutil/common.h:533,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libswscale/ppc/input_vsx.c: In function ‘grayf32ToY16_bswap_c_vsx’:
src/libavutil/timer.h:134:5: error: ‘tend’ undeclared (first use in this function); did you mean ‘rand’?
     tend = AV_READ_TIME();                                                \
     ^~~~
src/libswscale/ppc/input_vsx.c:3878:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("48")
     ^~~~~~~~~~
In file included from src/libavutil/common.h:106,
                 from src/libavutil/avutil.h:296,
                 from src/libswscale/ppc/input_vsx.c:33:
src/libavutil/timer.h:135:29: error: ‘tstart’ undeclared (first use in this function); did you mean ‘qsort’?
     TIMER_REPORT(id, tend - tstart)
                             ^~~~~~
src/libavutil/intmath.h:39:44: note: in definition of macro ‘ff_log2’
 #   define ff_log2(x) (31 - __builtin_clz((x)|1))
                                            ^
src/libavutil/timer.h:135:5: note: in expansion of macro ‘TIMER_REPORT’
     TIMER_REPORT(id, tend - tstart)
     ^~~~~~~~~~~~
src/libswscale/ppc/input_vsx.c:3878:5: note: in expansion of macro ‘STOP_TIMER’
     STOP_TIMER("48")
     ^~~~~~~~~~
src/libswscale/ppc/input_vsx.c: In function ‘ff_sws_init_input_funcs_vsx’:
src/libswscale/ppc/input_vsx.c:4029:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
     enum AVPixelFormat srcFormat = c->srcFormat;
     ^~~~
src/libswscale/ppc/input_vsx.c:4055:24: error: ‘palToUV_c_vsx’ undeclared (first use in this function); did you mean ‘palToA_c_vsx’?
         c->chrToYV12 = palToUV_c_vsx;
                        ^~~~~~~~~~~~~
                        palToA_c_vsx
src/libswscale/ppc/input_vsx.c:4468:24: error: ‘palToY_c_vsx’ undeclared (first use in this function); did you mean ‘palToA_c_vsx’?
         c->lumToYV12 = palToY_c_vsx;
                        ^~~~~~~~~~~~
                        palToA_c_vsx
make[2]: *** [/<<PKGBUILDDIR>>/ffbuild/common.mak:59: libswscale/ppc/input_vsx.o] Error 1
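
The -Wdeclaration-after-statement warnings in the log can usually be silenced by moving declarations to the top of the block. A minimal sketch of that layout (the function sum_adjusted is illustrative and not taken from input_vsx.c):

```c
/* Illustrative only: not taken from input_vsx.c. */
static int sum_adjusted(const int *src, int width)
{
    /* All declarations come before the first statement, so
     * -Wdeclaration-after-statement has nothing to report. */
    int i, width_adj;
    int sum = 0;

    width_adj = width & ~7; /* round down to a multiple of 8 */
    for (i = 0; i < width_adj; i++)
        sum += src[i];
    return sum;
}
```

The unused-variable warnings are simpler still: deleting the names listed in the log from their declarations should be enough, assuming nothing else refers to them.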

comment:13 by reverse_forever, 4 months ago

Hi! The functions that were commented out did not show a good speedup. The current patch, with small fixes and without the START_TIMER/STOP_TIMER macros, can be taken from https://patchwork.ffmpeg.org/project/ffmpeg/patch/20201230120822.8891-1-pestov.vyach@yandex.ru/

comment:14 by pocock, 4 months ago

Could this patch be related to #9077?
