Opened 10 years ago
Closed 10 years ago
#4148 closed defect (fixed)
Crash in ff_add_bytes_l2_sse2 when decoding attached APNG file
Reported by: | Benoit Fouet | Owned by: | |
---|---|---|---|
Priority: | important | Component: | avcodec |
Version: | git-master | Keywords: | png crash SIGSEGV regression |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | yes | |
Analyzed by developer: | no |
Description
How to reproduce:
$ gdb --args ./ffmpeg_g -i dolske/whee.png -f null /dev/null GNU gdb (GDB) 7.6.1-ubuntu Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /home/hack/ffmpeg/videolan.org/ffmpeg_g...done. (gdb) r Starting program: /home/hack/ffmpeg/videolan.org/./ffmpeg_g -i dolske/whee.png -f null /dev/null [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". ffmpeg version N-68130-gb50e003 Copyright (c) 2000-2014 the FFmpeg developers built on Dec 1 2014 14:20:21 with gcc 4.8 (Ubuntu/Linaro 4.8.1-10ubuntu9) configuration: --enable-gpl --enable-libmp3lame --enable-libx264 --samples=/home/hack/ffmpeg/videolan.org/fate-suite libavutil 54. 15.100 / 54. 15.100 libavcodec 56. 13.100 / 56. 13.100 libavformat 56. 15.101 / 56. 15.101 libavdevice 56. 3.100 / 56. 3.100 libavfilter 5. 2.103 / 5. 2.103 libswscale 3. 1.101 / 3. 1.101 libswresample 1. 1.100 / 1. 1.100 libpostproc 53. 3.100 / 53. 3.100 Input #0, apng, from 'dolske/whee.png': Duration: N/A, start: 0.000000, bitrate: N/A Stream #0:0: Video: apng, rgba, 230x200, 12.08 fps, 12.05 tbr, 100k tbn, 100k tbc [New Thread 0x7ffff2d87700 (LWP 9343)] [New Thread 0x7ffff2586700 (LWP 9344)] [New Thread 0x7ffff1d85700 (LWP 9345)] [New Thread 0x7ffff1584700 (LWP 9346)] [New Thread 0x7ffff0d83700 (LWP 9347)] [New Thread 0x7ffff0582700 (LWP 9348)] [New Thread 0x7fffefd81700 (LWP 9349)] [New Thread 0x7fffef580700 (LWP 9350)] [New Thread 0x7fffeed7f700 (LWP 9351)] [New Thread 0x7fffee57e700 (LWP 9352)] [New Thread 0x7fffedd7d700 (LWP 9353)] [New Thread 0x7fffed57c700 (LWP 9354)] [New Thread 0x7fffecd7b700 (LWP 9355)] [New Thread 0x7fffec57a700 (LWP 9356)] [New Thread 0x7fffebd79700 (LWP 9357)] [New Thread 0x7fffeb578700 (LWP 9358)] [New Thread 0x7fffead77700 (LWP 9359)] [New Thread 0x7fffea576700 (LWP 9360)] Output #0, null, to '/dev/null': Metadata: encoder : Lavf56.15.101 Stream #0:0: Video: rawvideo (RGBA / 0x41424752), rgba, 230x200, q=2-31, 200 kb/s, 12.05 fps, 12.05 tbn, 12.05 tbc Metadata: encoder : Lavc56.13.100 rawvideo Stream mapping: Stream #0:0 -> #0:0 (apng (native) -> rawvideo (native)) Press [q] to stop, [?] for help [null @ 0x1ae2260] Encoder did not produce proper pts, making some up. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffead77700 (LWP 9359)] ff_add_bytes_l2_sse2.loop_v () at libavcodec/x86/pngdsp.asm:90 90 ADD_BYTES_FN 2 (gdb) bt #0 ff_add_bytes_l2_sse2.loop_v () at libavcodec/x86/pngdsp.asm:90 #1 0x0000000000918e76 in png_filter_row (dsp=0x1aa3000, dst=0x7fffc400e754 "", filter_type=2, src=0x7fffc40300b0 "\002\002\002", last=0x7fffc400e3b4 "\034_\274\377 c\300\377\036e\277\377", size=352, bpp=4) at libavcodec/pngdec.c:255 #2 0x00000000009197a9 in png_handle_row (s=0x1aa3000) at libavcodec/pngdec.c:313 #3 0x0000000000919de0 in png_decode_idat (s=0x1aa3000, length=6125) at libavcodec/pngdec.c:398 #4 0x000000000091b373 in decode_idat_chunk (avctx=0x1aa2780, s=0x1aa3000, length=6125, p=0x1aa35c0) at libavcodec/pngdec.c:679 #5 0x000000000091d4c1 in decode_frame_common (avctx=0x1aa2780, s=0x1aa3000, p=0x1aa35c0, avpkt=0x1ad0478) at libavcodec/pngdec.c:1014 #6 0x000000000091dfd5 in decode_frame_apng (avctx=0x1aa2780, data=0x1aa2c60, got_frame=0x1ad04e0, avpkt=0x1ad0478) at libavcodec/pngdec.c:1171 #7 0x000000000092955d in frame_worker_thread (arg=0x1ad0378) at libavcodec/pthread_frame.c:158 #8 0x00007ffff741ff6e in start_thread (arg=0x7fffead77700) at pthread_create.c:311 #9 0x00007ffff657a9cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 (gdb) disassemble $pc-32,$pc+32 Dump of assembler code from 0xc4d3cc to 0xc4d40c: 0x0000000000c4d3cc <ff_diff_pixels_sse2.loop+74>: add %al,(%rax) 0x0000000000c4d3ce <ff_diff_pixels_sse2.loop+76>: add %al,(%rax) 0x0000000000c4d3d0 <ff_add_bytes_l2_sse2+0>: movslq %ecx,%rcx 0x0000000000c4d3d3 <ff_add_bytes_l2_sse2+3>: xor %r9,%r9 0x0000000000c4d3d6 <ff_add_bytes_l2_sse2+6>: mov %rcx,%r8 0x0000000000c4d3d9 <ff_add_bytes_l2_sse2+9>: and $0xffffffffffffffe0,%rcx 0x0000000000c4d3dd <ff_add_bytes_l2_sse2+13>: jmp 0xc4d40a <ff_add_bytes_l2_sse2.end_v> 0x0000000000c4d3df <ff_add_bytes_l2_sse2.loop_v+0>: movdqa (%rsi,%r9,1),%xmm0 0x0000000000c4d3e5 <ff_add_bytes_l2_sse2.loop_v+6>: movdqa 0x10(%rsi,%r9,1),%xmm1 => 0x0000000000c4d3ec <ff_add_bytes_l2_sse2.loop_v+13>: paddb (%rdx,%r9,1),%xmm0 0x0000000000c4d3f2 <ff_add_bytes_l2_sse2.loop_v+19>: paddb 0x10(%rdx,%r9,1),%xmm1 0x0000000000c4d3f9 <ff_add_bytes_l2_sse2.loop_v+26>: movdqa %xmm0,(%rdi,%r9,1) 0x0000000000c4d3ff <ff_add_bytes_l2_sse2.loop_v+32>: movdqa %xmm1,0x10(%rdi,%r9,1) 0x0000000000c4d406 <ff_add_bytes_l2_sse2.loop_v+39>: add $0x20,%r9 0x0000000000c4d40a <ff_add_bytes_l2_sse2.end_v+0>: cmp %rcx,%r9 End of assembler dump. (gdb) info registers all rax 0xc4d3d0 12899280 rbx 0x1aa2780 27928448 rcx 0x160 352 rdx 0x7fffc400e3b4 140736481780660 rsi 0x7fffc40300b0 140736481919152 rdi 0x7fffc400e754 140736481781588 rbp 0x1ad0448 0x1ad0448 rsp 0x7fffead76b88 0x7fffead76b88 r8 0x160 352 r9 0x0 0 r10 0x5865b 362075 r11 0x1aa34f8 27931896 r12 0x13b2060 20652128 r13 0x1acf740 28112704 r14 0x1ad0378 28115832 r15 0x1ad0478 28116088 rip 0xc4d3ec 0xc4d3ec <ff_add_bytes_l2_sse2.loop_v+13> eflags 0x10287 [ CF PF SF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 st0 -inf (raw 0xffff0000000000000000) st1 -inf (raw 0xffff0000000000000000) st2 -inf (raw 0xffff0000000000000000) st3 -inf (raw 0xffff0000000000000000) st4 -inf (raw 0xffff0000000000000000) st5 -inf (raw 0xffff0000000000000000) st6 -inf (raw 0xffff0000000000000000) st7 -inf (raw 0xffff0000000000000000) fctrl 0x37f 895 fstat 0x0 0 ftag 0xaaaa 43690 fiseg 0x0 0 fioff 0x0 0 foseg 0x0 0 fooff 0x0 0 fop 0x0 0 mxcsr 0x1fa8 [ OE PE IM DM ZM OM UM PM ] ymm0 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {0x2, 0x2, 0x2, 0x0, 0x3, 0x3, 0x3, 0x0, 0x2, 0x2, 0x2, 0x0, 0x21, 0x68, 0xc2, 0xff, 0x0 <repeats 16 times>}, v16_int16 = {0x202, 0x2, 0x303, 0x3, 0x202, 0x2, 0x6821, 0xffc2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x20202, 0x30303, 0x20202, 0xffc26821, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x3030300020202, 0xffc2682100020202, 0x0, 0x0}, v2_int128 = {0xffc26821000202020003030300020202, 0x00000000000000000000000000000000}} ymm1 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x1f, 0x69, 0xc4, 0xff, 0xff, 0xff, 0xff, 0x0, 0xfe, 0xfe, 0xfe, 0x0 <repeats 21 times>}, v16_int16 = {0x691f, 0xffc4, 0xffff, 0xff, 0xfefe, 0xfe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffc4691f, 0xffffff, 0xfefefe, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffc4691f, 0xfefefe, 0x0, 0x0}, v2_int128 = { 0x0000000000fefefe00ffffffffc4691f, 0x00000000000000000000000000000000}} ymm2 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}} ymm3 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}} ymm4 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}} ymm5 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {0x0 <repeats 14 times>, 0xfe, 0xfe, 0x0 <repeats 16 times>}, v16_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xfefe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x0, 0x0, 0x0, 0xfefe0000, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0xfefe000000000000, 0x0, 0x0}, v2_int128 = {0xfefe0000000000000000000000000000, 0x00000000000000000000000000000000}} ymm6 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xfe, 0x0, 0xff, 0xff, 0xff, 0x0 <repeats 27 times>}, v16_int16 = {0xfe, 0xffff, 0xff, 0x0 <repeats 13 times>}, v8_int32 = {0xffff00fe, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffff00fe, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000000000ffffff00fe, 0x00000000000000000000000000000000}} ---Type <return> to continue, or q <return> to quit--- ymm7 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}} ymm8 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000000000000000000000, 0x00000000000000000000000000000000}} ymm9 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 14 times>, 0x3, 0x3, 0x0 <repeats 16 times>}, v16_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x303, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x0, 0x0, 0x0, 0x3030000, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x303000000000000, 0x0, 0x0}, v2_int128 = {0x03030000000000000000000000000000, 0x00000000000000000000000000000000}} ymm10 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x46, 0x84, 0x24, 0x59, 0xd6, 0x3e, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x8446, 0x5924, 0x3ed6, 0x0 <repeats 12 times>}, v8_int32 = {0x84460000, 0x3ed65924, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x3ed6592484460000, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000003ed6592484460000, 0x00000000000000000000000000000000}} ymm11 {v8_float = {0x9689a800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x6a, 0xa2, 0x65, 0x50, 0xf2, 0xea, 0x8f, 0xbd, 0x0 <repeats 24 times>}, v16_int16 = {0xa26a, 0x5065, 0xeaf2, 0xbd8f, 0x0 <repeats 12 times>}, v8_int32 = {0x5065a26a, 0xbd8feaf2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xbd8feaf25065a26a, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000bd8feaf25065a26a, 0x00000000000000000000000000000000}} ymm12 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc4, 0x3c, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x0, 0x3cc4, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0x3cc40000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x3cc4000000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000003cc4000000000000, 0x00000000000000000000000000000000}} ymm13 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80, 0x59, 0xbc, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x8000, 0xbc59, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0xbc598000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xbc59800000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x0000000000000000bc59800000000000, 0x00000000000000000000000000000000}} ymm14 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x8e, 0x85, 0x83, 0xe8, 0xf0, 0x24, 0x53, 0x3c, 0x0 <repeats 24 times>}, v16_int16 = {0x858e, 0xe883, 0x24f0, 0x3c53, 0x0 <repeats 12 times>}, v8_int32 = {0xe883858e, 0x3c5324f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x3c5324f0e883858e, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000003c5324f0e883858e, 0x00000000000000000000000000000000}} ymm15 {v8_float = {0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x2d, 0x0, 0x0, 0x0}, v32_int8 = {0xc0, 0x9, 0xf2, 0x16, 0xb5, 0xdf, 0x46, 0x40, 0x0 <repeats 24 times>}, v16_int16 = {0x9c0, 0x16f2, 0xdfb5, 0x4046, 0x0 <repeats 12 times>}, v8_int32 = {0x16f209c0, 0x4046dfb5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x4046dfb516f209c0, 0x0, 0x0, 0x0}, v2_int128 = {0x00000000000000004046dfb516f209c0, 0x00000000000000000000000000000000}}
Note that I tried to disable the first loop in add_bytes_l2, and the output seems correct:
diff --git a/libavcodec/x86/pngdsp.asm b/libavcodec/x86/pngdsp.asm index 8e23ccf..d4be8ad 100644 --- a/libavcodec/x86/pngdsp.asm +++ b/libavcodec/x86/pngdsp.asm @@ -50,8 +50,8 @@ cglobal add_bytes_l2, 4, 6, %1, dst, src1, src2, wa, w, i mova [dstq+iq+mmsize], m1 add iq, mmsize*2 .end_v: - cmp iq, waq - jl .loop_v +; cmp iq, waq +; jl .loop_v %if mmsize == 16 ; vector loop
Attachments (1)
Change History (10)
by , 10 years ago
Attachment: | whee_trnc.png added |
---|
comment:1 by , 10 years ago
Keywords: | png crash SIGSEGV added; pngdsp removed |
---|---|
Priority: | normal → important |
Status: | new → open |
comment:2 by , 10 years ago
Keywords: | regression added |
---|
comment:3 by , 10 years ago
Given this is triggered by the APNG demuxer/decoder combo which has just been added, I don't think this really qualify as a regression.
comment:4 by , 10 years ago
void (*add_bytes_l2)(uint8_t *dst /* align 16 */,
uint8_t *src1 /* align 16 */,
uint8_t *src2 /* align 16 */, int w);
rdx 0x7fffc400e3b4
rdi 0x7fffc400e754
dst/src2 are no longer abiding per the alignment constrain. Either the DSP must be modified (most probably that), a specific unaligned version be written, or the caller modified to use aligned addresses.
comment:5 by , 10 years ago
Well, there is already the scalar loop in the DSP. What would you think of just adding a test (on dst or src2, as they have the same alignment), like this (note that I don't speak yasm ;-):
diff --git a/libavcodec/x86/pngdsp.asm b/libavcodec/x86/pngdsp.asm index 8e23ccf..b43fb18 100644 --- a/libavcodec/x86/pngdsp.asm +++ b/libavcodec/x86/pngdsp.asm @@ -36,9 +36,16 @@ cglobal add_bytes_l2, 4, 6, %1, dst, src1, src2, wa, w, i movsxd waq, wad %endif xor iq, iq + mov wq, waq + + ; if dst buffer is unaligned, use scalar loop + mov waq, dstq + and waq, 15 + test waq, waq + jnz .end_s ; vector loop - mov wq, waq + mov waq, wq and waq, ~(mmsize*2-1) jmp .end_v .loop_v:
comment:6 by , 10 years ago
Crap, my comment was not posted, so I'll make it briefer:
0) Are you sure src1 is always aligned? I think not
1) Can the caller be fixed to use aligned addresses without copying around data? If yes, that should be done to have as few unaligned loads as possible
2) Speed generally matters, so deciding in case of unaligned addresses to use the slow, scalar version depends on how often and how large the areas it applies to are
3) If speed matters, a SIMD handling unaligned addresses should be used
4) Compare its speed to the aligned version for aligned addresses. Larger areas, older PCs should show larger differences.
5) If the speed difference is large enough, add a branch for unaligned addresses, otherwise just replace the aligned code
6) If you're right about src1:
test dstq, 15 jnz .handle_unaligned
If you're not:
mov iq, dstq or iq, src1q test iq, 15 jnz .handle_unaligned
then xor iq on each path.
comment:7 by , 10 years ago
Note: that code should not present for mmx.
15 should be replaced by mmsize-1 (although the probability of an AVX2 version is low).
comment:8 by , 10 years ago
0) yes, I am sure, at least it is advertised as being so (see libavcodec/pngdec.c:674)
1) no, it cannot, it's working on (arbitrary) subsets of a buffer, and the alignment cannot be guaranteed
2) the only thing I can tell is that it is only affecting certain APNG files, but their number is something that can vary
3) I thought of something like a "prologue" for the function, just like the end of the buffer is handled. My yasm skills are void, though, so I think a first approach would be to branch as I proposed (using your first version in 6) above)
I'll send a patch to do that shortly.
Thank you.
comment:9 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | open → closed |
Fixed by Christophe Gisquet in 9fa056ba
Could be considered a regression.