Opened 12 months ago

Last modified 5 weeks ago

#10852 new defect

sws_scale overflows buffer for some resolutions using ssse3 instructions

Reported by: Jerome Owned by:
Priority: important Component: swscale
Version: git-master Keywords: memory scale
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description (last modified by Jerome)

Summary of the bug:
This bug occurs when using the sws_scale function to convert the color space of an image from yuv420 to bgr24 (it might occur for other color spaces, I haven't verified).
It started happening with version 5.0 and is still happening with the latest master. It does not systematically happen, only for some image resolution. When writing the output image, sws_scale goes beyond the image buffer by 8 bytes. This memory violation often triggers a segfault.

Valgrind output:

==1050090== Memcheck, a memory error detector
==1050090== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1050090== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==1050090== Command: ./test_memcheck
==1050090== 
[swscaler @ 0x62cb680] deprecated pixel format used, make sure you did set range correctly
Convert unscaled 3240
==1050090== Invalid write of size 8
==1050090==    at 0x48EEAF6: ??? (libavutil/x86/x86inc.asm:1274)
==1050090==    by 0x48ED6F2: yuv420_bgr24_ssse3 (yuv2rgb_template.c:193)
==1050090==    by 0x48B5B5F: scale_internal (swscale.c:1042)
==1050090==    by 0x48B6668: sws_scale (swscale.c:1214)
==1050090==    by 0x109393: main (main_memcheck.cpp:40)
==1050090==  Address 0x69aac40 is 0 bytes after a block of size 6,220,800 alloc'd
==1050090==    at 0x4847203: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1050090==    by 0x10931A: main (main_memcheck.cpp:23)
==1050090== 
==1050090== Invalid write of size 8
==1050090==    at 0x48EEAFB: ??? (libavutil/x86/x86inc.asm:1274)
==1050090==    by 0x48ED6F2: yuv420_bgr24_ssse3 (yuv2rgb_template.c:193)
==1050090==    by 0x48B5B5F: scale_internal (swscale.c:1042)
==1050090==    by 0x48B6668: sws_scale (swscale.c:1214)
==1050090==    by 0x109393: main (main_memcheck.cpp:40)
==1050090==  Address 0x69aac48 is 8 bytes after a block of size 6,220,800 alloc'd
==1050090==    at 0x4847203: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1050090==    by 0x10931A: main (main_memcheck.cpp:23)

How to reproduce:
I'm attaching a small C++ program that illustrates how to reproduce the bug.

Environment:
Ubuntu 22.04, but was also reproduced on Rocky9.

Build:
The ffmpeg library was built with the following flags:

 --enable-libx265 \
 --enable-libx264 \
 --enable-shared \
 --enable-gnutls \
 --enable-libfreetype \
 --enable-libfontconfig \
 --enable-gpl \
 --enable-nonfree \
 --enable-nvdec \
 --disable-static \
 --enable-debug=3 --disable-stripping \
 --nvccflags="-gencode arch=compute_86,code=sm_86 -O2"

Attachments (3)

main.cpp (1.5 KB ) - added by Jerome 12 months ago.
source file to reproduce the bug
Makefile (308 bytes ) - added by Jerome 12 months ago.
makefile
main_memcheck.cpp (1.5 KB ) - added by Jerome 12 months ago.
source file to get the valgrind output

Download all attachments as: .zip

Change History (16)

by Jerome, 12 months ago

Attachment: main.cpp added

source file to reproduce the bug

by Jerome, 12 months ago

Attachment: Makefile added

makefile

by Jerome, 12 months ago

Attachment: main_memcheck.cpp added

source file to get the valgrind output

comment:1 by Balling, 12 months ago

See bug #979, you need to test with -vf scale=flags=accurate_rnd

in reply to:  1 comment:2 by Jerome, 11 months ago

Replying to Balling:

See bug #979, you need to test with -vf scale=flags=accurate_rnd

Thank you for the comment. I don't believe this is related, though. In my case, I don't use the ffmpeg binary, but instead the libswscale shared library. And the issue is not about the precision of the color conversion, but the fact that it runs over the output buffer by 8 bytes for some resolutions.

comment:3 by Jerome, 11 months ago

Description: modified (diff)

in reply to:  1 ; comment:5 by Balling, 11 months ago

What I mean is that the default color convertion is buggy, no one should use that. And it only affects yuv420p to bgr, not to rgb.

Last edited 11 months ago by Balling (previous) (diff)

in reply to:  5 comment:6 by Jerome, 11 months ago

Replying to Balling:

What I mean is that the default color convertion is buggy, no one should use that. And it only affects yuv420p to bgr, not to rgb.

Thank you for your reply. For this bug, I'm using the function sws_scale directly. Do you mean that this function is buggy? If this is the case, can you point me to an alternative?

in reply to:  5 comment:7 by Jerome, 11 months ago

Replying to Balling:

What I mean is that the default color convertion is buggy, no one should use that. And it only affects yuv420p to bgr, not to rgb.

Ok, I found the flag SWS_ACCURATE_RND. With it, the memory issue is gone. The issue is that it is at least 3x slower than the default color conversion.

comment:8 by Balling, 11 months ago

The issue is that it is at least 3x slower than the default color conversion.

Default color conversion is so wrong, it is useless really.

Happy to hear that with the API flag the bug is gone.

comment:9 by Balling, 4 months ago

Does this still happen? Check please

comment:10 by Balling, 4 months ago

Keywords: scale added

comment:11 by Jerome, 4 months ago

Tested with the latest master and the bug is still there.

comment:12 by Niklas Haas, 5 weeks ago

Similar to https://trac.ffmpeg.org/ticket/6763#comment:4

It seems that the three ways of fixing this would be to either

  1. Simply document the extra padding requirements on sws_scale().
  2. Extend the asm logic to make sure to clamp the tail (or handle the last line using the C function)
  3. Use a scratch buffer for the last line's output and memcpy it into the destination image instead

I'm not sure what is preferable here.

Last edited 5 weeks ago by Niklas Haas (previous) (diff)

comment:13 by Balling, 5 weeks ago

extra padding requirements

No one is gonna do that, also from fuzzing perceptive it is still a bug.

Obviously C code is better (assuming it fixes Memcheck and this is not internal bug in inline assmebly)

Note: See TracTickets for help on using tickets.