Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#8747 closed defect (fixed)

libswscale 4.3 crash if output buffer is not 16 bytes aligned for yuv2rgb conversion

Reported by: melanconj Owned by:
Priority: important Component: swscale
Version: git-master Keywords: regression
Cc: melanconj@gmail.com Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
With the 4.3 release, swscale now crashes if provided with an output buffer that is not 16 bytes aligned for yuv2rgb conversions. It used to work in previous releases.

How to reproduce:

#include <stdint.h>
#include "libswscale\swscale.h"

void main()
{
	uint8_t* src = malloc(640*480);
	uint8_t* dst = malloc(640*480*4);
	if (((int64_t)dst & ~0xF) == (int64_t)dst) {
		dst += 8; // Ensure we are unaligned. Comment out to see it work
	}

	const uint8_t* srcSlice [3] = { src, src, src };
	int srcStride[3] = {640, 320, 320};
	int dstStride = 640 * 3;

	void *context = sws_getCachedContext(NULL, 640, 480, AV_PIX_FMT_YUV420P, 640, 480, AV_PIX_FMT_RGB24, SWS_POINT, NULL, NULL, NULL);
	sws_scale(context, srcSlice, srcStride, 0, 480, &dst, &dstStride);
}

It seems to have been introduced by the SSSE3 codepath that was added in commit fc6a5883d6af8cae0e96af84dda0ad74b360a084.
I also found ticket https://trac.ffmpeg.org/ticket/8532 that found the same issue as mine but with the input buffer, which was fixed.

Change History (13)

comment:1 by melanconj, 4 years ago

Cc: melanconj@gmail.com added

comment:2 by Carl Eugen Hoyos, 4 years ago

Component: undeterminedswscale
Keywords: regression added
Priority: normalimportant
Version: unspecifiedgit-master

Is there a system where the crash can be reproduced without meddling with the heap pointer?

Last edited 4 years ago by Carl Eugen Hoyos (previous) (diff)

comment:3 by melanconj, 4 years ago

Apologies, I should have mentioned our real world use case in the report. We've noticed this regression when trying to integrate 4.3 in our application. Our test infrastructure has been crashing consistently with the new version. In our case, the application is a C# program doing interop into swscale. Arrays in C# are 8 bytes aligned so we crash about half the time we call into swscale. I've put the alignment check in the sample to ensure it reproduces systematically, although I'm testing a bit more now and it seems C allocations are consistently 16 bytes aligned for this buffer size. This small PoC for a smaller buffer shows unaligned buffers about half the time in 32 bits:

#include <stdint.h>
#include <stdio.h>

int main()
{
    for (int i = 0; i < 8; i++) {
        void* dst = malloc(320*240*3);
        if (((int64_t)dst & ~0xF) == (int64_t)dst) {
            printf("aligned\n");
        } else {
            printf("unaligned\n");
        }
    }
}

I took a look into other languages to see if C# was the odd one here, it seems few offer 16 bytes guarantees for their arrays. For example both Rust and Go are only providing a type-size alignment guarantee for their arrays. C and C++ appear to give 16 bytes in 64 bits, but only 8 for 32 bits programs.

comment:4 by darbyjohnston, 4 years ago

Hi, this might be related, I was trying to upgrade my software to FFmpeg 4.3 and am also running into a number of crashes in libswscale with movies files that worked fine with 4.2.2. I've made this code snippet to reproduce:

#include <libswscale/swscale.h>
#include <libavutil/frame.h>
#include <libavutil/imgutils.h>
#include <stdio.h>

void scale(int w, int h)
{
    const enum AVPixelFormat inFmt = AV_PIX_FMT_YUV420P;
    const enum AVPixelFormat outFmt = AV_PIX_FMT_RGBA;
    struct AVFrame* inFrame = NULL;
    struct AVFrame* outFrame = NULL;
    uint8_t* outData = NULL;
    struct SwsContext* context = NULL;
    
    printf("Scale: %dx%d\n", w, h);
    
    inFrame = av_frame_alloc();
    inFrame->width = w;
    inFrame->height = h;
    inFrame->format = inFmt;
    av_frame_get_buffer(inFrame, 0);
    
    outFrame = av_frame_alloc();
    outData = (uint8_t*)malloc(w * h * 4);
    av_image_fill_arrays(outFrame->data, outFrame->linesize, outData, outFmt, w, h, 1);

    context = sws_getContext(w, h, inFmt, w, h, outFmt, SWS_BILINEAR, NULL, NULL, NULL);
    sws_scale(context, (uint8_t const* const*)inFrame->data, inFrame->linesize, 0, h,
        outFrame->data, outFrame->linesize);
    
    av_frame_free(&inFrame);
    av_frame_free(&outFrame);
    free(outData);
}

int main(int argc, char** argv)
{
    scale(640, 480);
    scale(1920, 1080);
    scale(716, 574); // Valgrind: Invalid write of size 8
    scale(702, 478); // Valgrind: Process terminating with default action of signal 11 (SIGSEGV)
    return 0;
}

Running this with Valgrind on an Ubuntu 20.04 x64 system gives the following output:

==167716== Memcheck, a memory error detector
==167716== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==167716== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==167716== Command: ./a.out
==167716== 
Scale: 640x480
Scale: 1920x1080
Scale: 716x574
==167716== Invalid write of size 8
==167716==    at 0x4B73C68: ??? (in /home/darby/dev/DJV-install-Debug/lib/libswscale.so.5.7.100)
==167716==    by 0x518C1DF: ???
==167716==    by 0x51A71FF: ???
==167716==  Address 0x60995e0 is 0 bytes after a block of size 1,643,936 alloc'd
==167716==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==167716==    by 0x109325: scale (in /home/darby/dev/DJV-Debug/a.out)
==167716==    by 0x109451: main (in /home/darby/dev/DJV-Debug/a.out)
==167716== 
Scale: 702x478
==167716== 
==167716== Process terminating with default action of signal 11 (SIGSEGV)
==167716==  General Protection Fault
==167716==    at 0x4B73C5A: ??? (in /home/darby/dev/DJV-install-Debug/lib/libswscale.so.5.7.100)
==167716==    by 0x5233FDF: ???
==167716==    by 0x52489FF: ???
==167716== 
==167716== HEAP SUMMARY:
==167716==     in use at exit: 2,349,684 bytes in 66 blocks
==167716==   total heap usage: 133 allocs, 67 frees, 18,277,268 bytes allocated
==167716== 
==167716== LEAK SUMMARY:
==167716==    definitely lost: 159,024 bytes in 3 blocks
==167716==    indirectly lost: 229,026 bytes in 42 blocks
==167716==      possibly lost: 0 bytes in 0 blocks
==167716==    still reachable: 1,961,634 bytes in 21 blocks
==167716==         suppressed: 0 bytes in 0 blocks
==167716== Rerun with --leak-check=full to see details of leaked memory
==167716== 
==167716== For lists of detected and suppressed errors, rerun with: -s
==167716== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

in reply to:  4 ; comment:5 by Carl Eugen Hoyos, 4 years ago

Replying to darbyjohnston:

Hi, this might be related

Why do you think your issue is related, what makes the issue described above similar to yours?

in reply to:  3 ; comment:6 by Carl Eugen Hoyos, 4 years ago

Replying to melanconj:

Apologies, I should have mentioned our real world use case in the report. We've noticed this regression when trying to integrate 4.3 in our application. Our test infrastructure has been crashing consistently with the new version. In our case, the application is a C# program doing interop into swscale.

I suspect the problem is that FFmpeg offers C libraries and that there may be no system with SSSE3 support where your crash can be reproduced using libc's allocation functions. Since you are not using C, you may have to compile FFmpeg with --disable-ssse3.
Can't you use FFmpeg's allocation functions?

There is something else that I would like to understand: Why didn't your application crash before using SSE2-optimized functions that also need 16-byte alignment?

in reply to:  5 ; comment:7 by darbyjohnston, 4 years ago

Replying to cehoyos:

Replying to darbyjohnston:

Hi, this might be related

Why do you think your issue is related, what makes the issue described above similar to yours?

Just that we were both getting crashes with the 4.3 release in the same function; should I open a separate issue for the crash I am seeing?

in reply to:  7 comment:8 by Carl Eugen Hoyos, 4 years ago

Replying to darbyjohnston:

Replying to cehoyos:

Replying to darbyjohnston:

Hi, this might be related

Why do you think your issue is related, what makes the issue described above similar to yours?

Just that we were both getting crashes with the 4.3 release in the same function;

(There is no release support on this bug tracker)
Is your crash related to insufficient alignment?

comment:9 by darbyjohnston, 4 years ago

I don't know what is causing the crash; I'll verify with the development branch and if the crash still exists open a separate issue.

in reply to:  6 ; comment:10 by melanconj, 4 years ago

Replying to cehoyos:

Replying to melanconj:

Apologies, I should have mentioned our real world use case in the report. We've noticed this regression when trying to integrate 4.3 in our application. Our test infrastructure has been crashing consistently with the new version. In our case, the application is a C# program doing interop into swscale.

I suspect the problem is that FFmpeg offers C libraries and that there may be no system with SSSE3 support where your crash can be reproduced using libc's allocation functions. Since you are not using C, you may have to compile FFmpeg with --disable-ssse3.

I'm not sure I understand. The second sample I provided shows a small enough buffer will not be aligned. The buffer is also large enough for a small but common-enough video resolution in our usecases. So a C program can hit the issue on its own.

The issue #8532 I linked initially also shows the same crash for the input buffer. The fix seems to have changed the ssse3 assembly to use an unaligned mov for the input buffer. I'm believe the output buffer should be treated the same way.

Can't you use FFmpeg's allocation functions?

We integrate with a few other libraries directly and so far have not needed to rely on the libraries allocations for frame buffers. We manage all that from our own code. This would be a non-trivial change to our architecture.

There is something else that I would like to understand: Why didn't your application crash before using SSE2-optimized functions that also need 16-byte alignment?

I'm not very familiar with the ffmpeg codebase, nor with assembly and SSE. From looking at libswscale/x86/yuv_2_rgb.asm I see two implementations, the MMX one and the new SSSE3. The MMX, which I assume is the older one and is the SSE2 your are mentioning, seems to move from the input/output buffers using movq, which does not have an alignment requirement.

The new SSSE3 uses mova, which I understand to be an aligned-move. The fix for the input-buffer in issue #8532 in commit 828f7db5d9fdf9052bb4b6d1b528009fece5bb10 seems to have simply replaced them with movu, which does not require 16-byte alignment. I believe that since the output buffer may also be user-controlled, then alignment should not be assumed and it should use movu too.

in reply to:  4 comment:11 by James, 4 years ago

Resolution: fixed
Status: newclosed

Should be fixed in ba3e771a42c29ee02c34e7769cfc1b2dbc5c760a.

Replying to darbyjohnston:
Can you open a new ticket for this, if you can still reproduce it on master?

in reply to:  10 ; comment:12 by Carl Eugen Hoyos, 4 years ago

Replying to melanconj:

Replying to cehoyos:

Replying to melanconj:

Apologies, I should have mentioned our real world use case in the report. We've noticed this regression when trying to integrate 4.3 in our application. Our test infrastructure has been crashing consistently with the new version. In our case, the application is a C# program doing interop into swscale.

I suspect the problem is that FFmpeg offers C libraries and that there may be no system with SSSE3 support where your crash can be reproduced using libc's allocation functions. Since you are not using C, you may have to compile FFmpeg with --disable-ssse3.

I'm not sure I understand. The second sample I provided shows a small enough buffer will not be aligned.

I was unable to find a 64bit system where a non-aligned buffer would be allocated.

in reply to:  12 comment:13 by James, 4 years ago

Replying to cehoyos:

I was unable to find a 64bit system where a non-aligned buffer would be allocated.

I can reproduce it easily on an x86_32 target. malloc() will not necessarily return 16 byte aligned buffers.

Note: See TracTickets for help on using tickets.