Opened 12 years ago

Closed 12 years ago

#764 closed defect (worksforme)

Increasing robustness of runlength decoding for scantable access in mpeg12.c

Reported by: erik Owned by:
Priority: normal Component: avcodec
Version: unspecified Keywords: mpeg robustness
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

When decoding australian MPEG2 video broadcast with many stream errors the decoder causes a memory error crash because the index into the scan table is far outside the array.
The simple dirty fix is to mask the index into the scan table to a value between 0 and 63 to prevent memory access error.
The attached diff files documents the required patch.
As a result the robustness of the decoder has increased.

Attachments (2)

diff.txt (6.7 KB ) - added by erik 12 years ago.
diff file documenting a suggested robustness patch to mpeg12.c
(2011-08-31 19-58) GEM.ts (2.0 MB ) - added by erik 12 years ago.

Change History (18)

by erik, 12 years ago

Attachment: diff.txt added

diff file documenting a suggested robustness patch to mpeg12.c

comment:1 by Carl Eugen Hoyos, 12 years ago

Can you provide a sample that crashes?

Please consider sending patches to ffmpeg-devel, they typically get more attention there.

comment:2 by erik, 12 years ago

Yes, but it is somewhere in a 4 GByte recording. Crashing depends on OS and memory allocation so I had great difficulty to get a reproducable crash, any change of code or code ordering can make it disappear and come back randomly in other corrupted recordings.
Libmpeg2 has a similar robustness precaution. There it is better implemented. They explicitly check on values <0 or >= 64 and have a fallback in case the runlength decoding is corrupt. My proposal is rude but simple.

On posting on ffmpeg-devel. I hesitate to claim the status to be able to contribute there.

comment:3 by reimar, 12 years ago

I had great difficulty to get a reproducable crash

Even if the crash is not reproducible a valgrind error usually is.
And you _should_ be able to extract a small sample from the place it crashes and it should still crash (though it might cost some time to find the right spot).
Concerning the patch: The maintainer will have to say, but it might make more sense to just move the existing i > 63 check up so it is done before the access instead of after.
Though maybe even better just extend the ScanTable struct so we can always read some more data - if "run" can e.g. never become more that 64 an extra 64 bytes at the end would fix it. Has the advantage of possibly helping for other codecs, too, should they have similar issues.
And FFmpeg has a "fallback" for such corruption, it is called error concealment and it is run afterwards and I don't remember anything that would indicate libmpeg2 is any better at it (which doesn't mean too much though).

comment:4 by erik, 12 years ago

I will put some range check debug message writing code before the scan table access to quickly detect where the error is so I do not need to replicate a crash.
The wrong index was -31023 or something in that range so adding a bit to the table won't help.

by erik, 12 years ago

Attachment: (2011-08-31 19-58) GEM.ts added

comment:5 by erik, 12 years ago

OK, I have a 2 MByte .ts file that demonstrates the array bound error on the scan table. I have attached it to this ticket.
It does not crash as the index reaches a maximum value of around 700.

comment:6 by Carl Eugen Hoyos, 12 years ago

Your sample does not show any invalid memory access with valgrind here.

You can upload larger samples to http://www.datafilehost.com/ or ftp://upload.ffmpeg.org/incoming

comment:7 by erik, 12 years ago

I instrumented the code to log the illegal values.
I use avformat for the demuxing and avcodec for decoding.
Will run the patched SW on a week worth of recording on 11 channels.
If there is no more crashing (and there used to be a lot) then at least I am happy
But there is of course the possibility that this index error is a side effect of another memory corruption error, then I am back to square 1

comment:8 by erik, 12 years ago

No more crashing after a couple of Terrabyte video processing.
I am happy.

comment:9 by erik, 12 years ago

One more question.
Can valgrind find errors in pointer calculation?
Part of the scantable access is done by type array indexing but the last use case is done by pointer arithmic

Code fragment from mpeg12.c:

uint8_t * scantable = s->intra_scantable.permutated;

....

if (level == 127) {

break;

} else if (level != 0) {

scantable += run;
j = *scantable;

comment:10 by reimar, 12 years ago

Part of the scantable access is done by type array indexing but the last use case is done by pointer arithmic

Valgrind runs the executable on an emulated CPU. What the source code looks like has no relevance. However valgind can sometimes miss a bad access if there is valid memory, but it is rather unlikely.

The wrong index was -31023 or something in that range so adding a bit to the table won't help.

The run comes out of a VLC table. The VLC table should never contain an entry for a run length outside the 0 - 63 range or something like that.
Thus I can't see how you should be able to get that value except by corrupt VLC tables.
Haven't yet tested the sample.

comment:11 by reimar, 12 years ago

It does not crash as the index reaches a maximum value of around 700.

Where did you get this from? The maximum I get with that sample is 77:
[mpeg2video @ 0x2c5db40] 3ac-tex damaged at 64 2 67 5
[mpeg2video @ 0x2c5db40] 3ac-tex damaged at 64 2 67 5
[mpeg2video @ 0x2c5db40] 3ac-tex damaged at 67 7 77 65
[mpeg2video @ 0x2c5db40] 3ac-tex damaged at 67 19 66 65
[mpeg2video @ 0x2c5db40] 2ac-tex damaged at 48 37 65 65

Output generated by this patch:

--- a/libavcodec/mpeg12.c
+++ b/libavcodec/mpeg12.c
@@ -140,7 +140,7 @@ static inline int mpeg1_decode_block_intra(MpegEncContext *s, DCTELEM *block, in
                 }
             }
             if (i > 63) {
-                av_log(s->avctx, AV_LOG_ERROR, "ac-tex damaged at %d %d\n", s->mb_x, s->mb_y);
+                av_log(s->avctx, AV_LOG_ERROR, "0ac-tex damaged at %d %d %d %d\n", s->mb_x, s->mb_y, i, run);
                 return -1;
             }
 
@@ -215,7 +215,7 @@ static inline int mpeg1_decode_block_inter(MpegEncContext *s, DCTELEM *block, in
                 }
             }
             if (i > 63) {
-                av_log(s->avctx, AV_LOG_ERROR, "ac-tex damaged at %d %d\n", s->mb_x, s->mb_y);
+                av_log(s->avctx, AV_LOG_ERROR, "1ac-tex damaged at %d %d %d %d\n", s->mb_x, s->mb_y, i, run);
                 return -1;
             }
 
@@ -363,7 +363,7 @@ static inline int mpeg2_decode_block_non_intra(MpegEncContext *s, DCTELEM *block
                 }
             }
             if (i > 63) {
-                av_log(s->avctx, AV_LOG_ERROR, "ac-tex damaged at %d %d\n", s->mb_x, s->mb_y);
+                av_log(s->avctx, AV_LOG_ERROR, "2ac-tex damaged at %d %d %d %d\n", s->mb_x, s->mb_y, i, run);
                 return -1;
             }
 
@@ -508,7 +508,7 @@ static inline int mpeg2_decode_block_intra(MpegEncContext *s, DCTELEM *block, in
                 }
             }
             if (i > 63) {
-                av_log(s->avctx, AV_LOG_ERROR, "ac-tex damaged at %d %d\n", s->mb_x, s->mb_y);
+                av_log(s->avctx, AV_LOG_ERROR, "3ac-tex damaged at %d %d %d %d\n", s->mb_x, s->mb_y, i, run);
                 return -1;
             }
 
Last edited 12 years ago by reimar (previous) (diff)

comment:12 by erik, 12 years ago

The ac-tex check is missing in
mpeg1_fast_decode_block_inter
mpeg2_fast_decode_block_non_intra

When I put

 if (i < 0 || i >= 64 ) av_log(s->avctx, AV_LOG_ERROR,  "scantable error, i = %d\n", i);

just before the scan table access I get many of these error messages
example:
[mpeg2video @ 003d69c0] scantable error, i = 330

comment:13 by reimar, 12 years ago

Don't you think it would have helped a lot if you had mentioned you used the "fast" flag?
While its effect for decoding may not be fully documented the point if "fast" is to decode as fast as possible. It is not intended to be used for corrupted streams. As long as it is not exploitable crashing is perfectly acceptable if it allows for faster code, and error concealment is unlikely to work.

comment:14 by erik, 12 years ago

Your remark clearly proves I should NOT post on the developers mailing list.
Do accept my apologies.
As the users of Comskip want fastest possible processing I always have "fast" set. Also lowres is either 1 or 2. But if possible I would like to prevent crashing on off-air recordings so I am perfectly happy with the small check I added as it stopped all crashing on many terrabytes of DVB-T recordings.

No more requests from my side then.

comment:15 by reimar, 12 years ago

Do you have any numbers on how much "fast" helps? MPEG-2 decoding should be really cheap by today's standards and I wouldn't expect it to really help much.
Worse, with errors as in your case it will even process a large amount of data for no purpose.
My recommendation would be
1) try without fast flag. You get error concealment and it might not have a relevant cost. (note: only for decoding, if you are also encoding you should keep it for that part).
2) Should it be relevantly slower, try leaving the fast flag in but use the non-fast functions for this case.
3) Add the > 63 check as in the other cases instead. It should be only little to not at all slower than your change for the normal decoding case and it should be _much_ faster for the error case (if you reached values like 7000 that would indicate that you looped around that loop about 100 times more than necessary).

And I don't want you to apologize, and I don't want you to give up because I sometimes get a bit annoyed. But it is a really good idea for you to be aware of any non-default configuration you have chosen and to mention it in bug reports. And also to make sure you have a solid justification for it, since they might cause you a significant maintenance cost in the long term. Apart from that there clearly is a lack of documentation we are to blame for.

comment:16 by reimar, 12 years ago

Resolution: worksforme
Status: newclosed

I hope you found an acceptable solution, I am closing as worksforme since current behaviour seems sensible to me.

Note: See TracTickets for help on using tickets.