#1163 closed defect (fixed)
ffprobe can produce invalid XML
Reported by: | Ian Ashley | Owned by: | stefano |
---|---|---|---|
Priority: | normal | Component: | ffprobe |
Version: | git-master | Keywords: | utf8 |
Cc: | n@ndj7.com, eml+ffmpeg@tupil.com | Blocked By: | |
Blocking: | Reproduced by developer: | yes | |
Analyzed by developer: | yes |
Description
ffprobe can output invalid XML as xml_escape_str only handles < > ' " and &. For example most escape characters below 32 are invalid UTF-8.
This replacement version of the function replaces any invalid UTF-8 characters by the inverted question mark.
static const char *xml_escape_str(char **dst, size_t *dst_size, const char *src, void *log_ctx) { // the unknown character (inverted question mark) const unsigned char BAD_CHARACTER_1 = 194, BAD_CHARACTER_2 = 191; const char *p; char *q; int copyAll = 1; size_t size = 1; /* precompute size */ for (p = src; *p;) { int badChar = 0; unsigned char byte; ESCAPE_CHECK_SIZE(src, size, SIZE_MAX-10); byte = (unsigned char)*p; if (byte < 32 && byte != 9 && byte != 10 && byte != 13) { badChar = 1; ++p; } else if (byte < 128) { switch (byte) { case '&' : size += 5; /* & */ copyAll = 0; break; case '<' : size += 4; /* < */ copyAll = 0; break; case '>' : size += 4; /* > */ copyAll = 0; break; case '\"': size += 6; /* " */ copyAll = 0; break; case '\'': size += 6; /* ' */ copyAll = 0; break; default: size++; } ++p; ++size; } else if (byte < 0xC0) { badChar = 1; ++p; } else { int extra; copyAll = 0; if (byte < 0xe0) extra = 1; else if (byte < 0xf0) extra = 2; else if (byte < 0xf8) extra = 3; else badChar = 1; if (badChar) ++p; else { ++p; for (int i = 0; i < extra && *p; ++i, ++p) { byte = (unsigned char)*p; if ((byte & 0xc0) != 0x80) badChar = 1; } if (!badChar) size += extra; } } if (badChar) { size += 2; copyAll = 0; } } ESCAPE_REALLOC_BUF(dst_size, dst, src, size); #define COPY_STR(str) { \ const char *s = str; \ while (*s) \ *q++ = *s++; \ } p = src; q = *dst; if (copyAll) COPY_STR(p) else { while (*p) { int badChar = 0; unsigned char byte; byte = (unsigned char)*p; if (byte < 32 && byte != 9 && byte != 10 && byte != 13) { badChar = 1; ++p; } else if (byte < 128) { switch (byte) { case '&' : COPY_STR("&"); break; case '<' : COPY_STR("<"); break; case '>' : COPY_STR(">"); break; case '\"': COPY_STR("""); break; case '\'': COPY_STR("'"); break; default: *q++ = *p; } ++p; ++size; } else if (byte < 0xC0) { badChar = 1; ++p; } else { int extra; copyAll = 0; if (byte < 0xe0) extra = 1; else if (byte < 0xf0) extra = 2; else if (byte < 0xf8) extra = 3; else badChar = 1; if (badChar) ++p; else { const char *startChar = p; int i; ++p; for (i = 0; i < extra && *p; ++i, ++p) { byte = (unsigned char)*p; if ((byte & 0xc0) != 0x80) badChar = 1; } if (!badChar) { for (i = 0; i < extra;) *q++ = *startChar++; } } } if (badChar) { *q++ = BAD_CHARACTER_1; *q++ = BAD_CHARACTER_2; } } } *q = 0; return *dst; }
Attachments (7)
Change History (23)
comment:1 by , 13 years ago
comment:2 by , 12 years ago
Hi, I can reproduce it with ffprobe version N-42704-g85761ef built on Jul 20 2012 20:39:19
Test file can be downloaded here: http://pangea.rferl.org/videoroot/Pangeavideo/2012/08/2/28/28efb444-f6bc-430c-ae8c-50d613f7fde9.mp4
follow-ups: 4 5 comment:3 by , 12 years ago
Cc: | added |
---|
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
This is currently blocking our integration of ffmpeg thumbnailing within http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.
comment:4 by , 12 years ago
Replying to nathanaeljones:
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
First step would be to make this a reproducible ticket by providing a failing command line together with complete, uncut console output (current git head).
comment:5 by , 12 years ago
Replying to nathanaeljones:
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
This is currently blocking our integration of ffmpeg thumbnailing within http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.
I'm interested in this, but I need more samples. Also the problem seems more related to invalid input data and/or encoding, so it looks more like a libavformat problem. Data should be probably validated and sanitized at the demuxing level, before rendering is performed at the application level.
comment:6 by , 11 years ago
comment:7 by , 11 years ago
Cc: | added |
---|
by , 11 years ago
Attachment: | 0001-ffprobe-check-for-errors-and-abort-immediately.patch added |
---|
by , 11 years ago
Attachment: | 0002-lavu-avstring-add-av_get_utf8-function.patch added |
---|
by , 11 years ago
Attachment: | 0003-ffprobe-implement-string-validation-policy-setting.patch added |
---|
comment:8 by , 11 years ago
Analyzed by developer: | set |
---|---|
Keywords: | utf8 added |
Reproduced by developer: | set |
Status: | new → open |
Version: | 0.10.2 → unspecified |
Please test the patches in attachment and report.
follow-up: 11 comment:10 by , 11 years ago
Hm, not so good as I thought. On corrupted files I got segmentation fault.
by , 11 years ago
Attachment: | 0001-lavu-avstring-add-av_utf8_decode-function.patch added |
---|
by , 11 years ago
Attachment: | 0002-ffprobe-check-for-errors-and-abort-immediately.patch added |
---|
by , 11 years ago
Attachment: | 0003-ffprobe-implement-string-validation-policy-setting.2.patch added |
---|
comment:11 by , 11 years ago
Replying to mente:
Hm, not so good as I thought. On corrupted files I got segmentation fault.
Check new patchset, in case of error please provide the sample.
comment:12 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | open → closed |
I believed this should be fixed in:
commit cbba331aa02f29870581ff0b7ded7477b279ae2c Author: Stefano Sabatini <stefasab@gmail.com> Date: Wed Oct 2 16:22:17 2013 +0200 ffprobe: implement string validation setting This should fix trac tickets #1163, #2502.
Feel free to test and reopen the ticket in case of issues.
follow-up: 14 comment:13 by , 10 years ago
Analyzed by developer: | unset |
---|---|
Reproduced by developer: | unset |
Resolution: | fixed |
Status: | closed → reopened |
Version: | unspecified → git-master |
Uploaded 1163-utf8bug.rm to incoming.
ffprobe -print_format xml -show_format 1163-utf8bug.rm > 1163-utf8bug.xml ffprobe version N-63665-g921d5ae Copyright (c) 2007-2014 the FFmpeg developers built on Jun 1 2014 12:14:19 with Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn) configuration: --prefix=/Users/eelco/Projects/Beamer/FFmpeg/build --disable-shared --enable-static --disable-ffserver --disable-doc --enable-gpl --extra-cflags='-DMACOSX_DEPLOYMENT_TARGET=10.7 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk -mmacosx-version-min=10.7 -I/Users/eelco/Projects/Beamer/FFmpeg/build/include' --extra-ldflags='-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk -mmacosx-version-min=10.7 -L/Users/eelco/Projects/Beamer/FFmpeg/build/lib' libavutil 52. 88.100 / 52. 88.100 libavcodec 55. 66.100 / 55. 66.100 libavformat 55. 42.100 / 55. 42.100 libavdevice 55. 13.101 / 55. 13.101 libavfilter 4. 5.100 / 4. 5.100 libswscale 2. 6.100 / 2. 6.100 libswresample 0. 19.100 / 0. 19.100 libpostproc 52. 3.100 / 52. 3.100 [rm @ 0x7f9ae1015800] Failed to fully read block Input #0, rm, from '/Users/eelco/Desktop/1163-utf8bug.rm': Metadata: Audiences : BatchRealProducer 's Audience; audioMode : music Creation Date : 2/14/2013 1:40:55 Description : This RealMedia File Compressed by Batch Real Producer Generated By : Helix Producer SDK 10.0 for Windows, Build 10.0.0.335 Modification Date: 2/14/2013 1:40:55 videoMode : sharp Keywords : File ID : 31e4c7ec-29b7-4ffd-8ea5-40f11991751a title : tsks author : ????????TSKS??2014?? copyright : ??????ʹTSKS???????tskscn.com?? comment : Duration: 00:59:01.08, start: 0.000000, bitrate: 0 kb/s Stream #0:0: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s Stream #0:1: Video: rv40 (RV40 / 0x30345652), yuv420p, 624x352, 536 kb/s, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc [xml @ 0x7f9ae1012e00] 10 invalid UTF-8 sequence(s) found in string '????????TSKS??2014??', replaced with '�' [xml @ 0x7f9ae1012e00] 18 invalid UTF-8 sequence(s) found in string '??????ʹTSKS???????tskscn.com??', replaced with '�'
xmllint 1163-utf8bug.xml 1163-utf8bug.xml:14: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xC0 0xA1 0xEF 0xBF <tag key="author" value="�����??�TSKS��2014��"/> ^
by , 10 years ago
Attachment: | 1163-utf8bug.rm added |
---|
follow-up: 15 comment:14 by , 10 years ago
Replying to eelco:
Uploaded 1163-utf8bug.rm to incoming.
[...]
xmllint 1163-utf8bug.xml 1163-utf8bug.xml:14: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xC0 0xA1 0xEF 0xBF <tag key="author" value="�����??�TSKS��2014��"/> ^
C0A1
is an overlong encoding for character !
(decimal number 33), using two bytes in place of one. Since this is not allowed we should probably fix the decoder to disallow it.
comment:15 by , 10 years ago
Analyzed by developer: | set |
---|---|
Component: | ffprobe → avutil |
Reproduced by developer: | set |
Resolution: | → fixed |
Status: | reopened → closed |
Replying to saste:
[...]
C0A1
is an overlong encoding for character!
(decimal number 33), using two bytes in place of one. Since this is not allowed we should probably fix the decoder to disallow it.
Should be fixed in:
commit d4ec07dfe7dbc86e8f6403781c511b9463a526d2 Author: Stefano Sabatini <stefasab@gmail.com> Date: Thu Aug 28 17:37:27 2014 +0200 lavu/avstring: check for overlong encodings in av_utf8_decode() Fix reopened trac ticket #1163.
comment:16 by , 10 years ago
Component: | avutil → ffprobe |
---|
Is the problem also reproducible with current git head?
Please send unified patches to ffmpeg-devel and please provide an example (including console) on how to produce the invalid xml.