#1163 closed defect (fixed)
ffprobe can produce invalid XML
| Reported by: | Ian Ashley | Owned by: | stefano |
|---|---|---|---|
| Priority: | normal | Component: | ffprobe |
| Version: | git-master | Keywords: | utf8 |
| Cc: | n@ndj7.com, eml+ffmpeg@tupil.com | Blocked By: | |
| Blocking: | Reproduced by developer: | yes | |
| Analyzed by developer: | yes |
Description
ffprobe can output invalid XML as xml_escape_str only handles < > ' " and &. For example most escape characters below 32 are invalid UTF-8.
This replacement version of the function replaces any invalid UTF-8 characters by the inverted question mark.
static const char *xml_escape_str(char **dst, size_t *dst_size, const char *src, void *log_ctx) { // the unknown character (inverted question mark) const unsigned char BAD_CHARACTER_1 = 194, BAD_CHARACTER_2 = 191; const char *p; char *q; int copyAll = 1; size_t size = 1; /* precompute size */ for (p = src; *p;) { int badChar = 0; unsigned char byte; ESCAPE_CHECK_SIZE(src, size, SIZE_MAX-10); byte = (unsigned char)*p; if (byte < 32 && byte != 9 && byte != 10 && byte != 13) { badChar = 1; ++p; } else if (byte < 128) { switch (byte) { case '&' : size += 5; /* & */ copyAll = 0; break; case '<' : size += 4; /* < */ copyAll = 0; break; case '>' : size += 4; /* > */ copyAll = 0; break; case '\"': size += 6; /* " */ copyAll = 0; break; case '\'': size += 6; /* ' */ copyAll = 0; break; default: size++; } ++p; ++size; } else if (byte < 0xC0) { badChar = 1; ++p; } else { int extra; copyAll = 0; if (byte < 0xe0) extra = 1; else if (byte < 0xf0) extra = 2; else if (byte < 0xf8) extra = 3; else badChar = 1; if (badChar) ++p; else { ++p; for (int i = 0; i < extra && *p; ++i, ++p) { byte = (unsigned char)*p; if ((byte & 0xc0) != 0x80) badChar = 1; } if (!badChar) size += extra; } } if (badChar) { size += 2; copyAll = 0; } } ESCAPE_REALLOC_BUF(dst_size, dst, src, size); #define COPY_STR(str) { \ const char *s = str; \ while (*s) \ *q++ = *s++; \ } p = src; q = *dst; if (copyAll) COPY_STR(p) else { while (*p) { int badChar = 0; unsigned char byte; byte = (unsigned char)*p; if (byte < 32 && byte != 9 && byte != 10 && byte != 13) { badChar = 1; ++p; } else if (byte < 128) { switch (byte) { case '&' : COPY_STR("&"); break; case '<' : COPY_STR("<"); break; case '>' : COPY_STR(">"); break; case '\"': COPY_STR("""); break; case '\'': COPY_STR("'"); break; default: *q++ = *p; } ++p; ++size; } else if (byte < 0xC0) { badChar = 1; ++p; } else { int extra; copyAll = 0; if (byte < 0xe0) extra = 1; else if (byte < 0xf0) extra = 2; else if (byte < 0xf8) extra = 3; else badChar = 1; if (badChar) ++p; else { const char *startChar = p; int i; ++p; for (i = 0; i < extra && *p; ++i, ++p) { byte = (unsigned char)*p; if ((byte & 0xc0) != 0x80) badChar = 1; } if (!badChar) { for (i = 0; i < extra;) *q++ = *startChar++; } } } if (badChar) { *q++ = BAD_CHARACTER_1; *q++ = BAD_CHARACTER_2; } } } *q = 0; return *dst; }
Attachments (7)
Change History (23)
comment:1 by , 14 years ago
comment:2 by , 13 years ago
Hi, I can reproduce it with ffprobe version N-42704-g85761ef built on Jul 20 2012 20:39:19
Test file can be downloaded here: http://pangea.rferl.org/videoroot/Pangeavideo/2012/08/2/28/28efb444-f6bc-430c-ae8c-50d613f7fde9.mp4
follow-ups: 4 5 comment:3 by , 13 years ago
| Cc: | added |
|---|
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
This is currently blocking our integration of ffmpeg thumbnailing within http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.
comment:4 by , 13 years ago
Replying to nathanaeljones:
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
First step would be to make this a reproducible ticket by providing a failing command line together with complete, uncut console output (current git head).
comment:5 by , 13 years ago
Replying to nathanaeljones:
I would be willing to fund development of a patch if it can be completed/integrated soon. Anyone interested?
This is currently blocking our integration of ffmpeg thumbnailing within http://imageresizing.net. A large percentage (~40%) of videos cause ffprobe to produce invalid xml.
I'm interested in this, but I need more samples. Also the problem seems more related to invalid input data and/or encoding, so it looks more like a libavformat problem. Data should be probably validated and sanitized at the demuxing level, before rendering is performed at the application level.
comment:6 by , 12 years ago
comment:7 by , 12 years ago
| Cc: | added |
|---|
by , 12 years ago
| Attachment: | 0001-ffprobe-check-for-errors-and-abort-immediately.patch added |
|---|
by , 12 years ago
| Attachment: | 0002-lavu-avstring-add-av_get_utf8-function.patch added |
|---|
by , 12 years ago
| Attachment: | 0003-ffprobe-implement-string-validation-policy-setting.patch added |
|---|
comment:8 by , 12 years ago
| Analyzed by developer: | set |
|---|---|
| Keywords: | utf8 added |
| Reproduced by developer: | set |
| Status: | new → open |
| Version: | 0.10.2 → unspecified |
Please test the patches in attachment and report.
follow-up: 11 comment:10 by , 12 years ago
Hm, not so good as I thought. On corrupted files I got segmentation fault.
by , 12 years ago
| Attachment: | 0001-lavu-avstring-add-av_utf8_decode-function.patch added |
|---|
by , 12 years ago
| Attachment: | 0002-ffprobe-check-for-errors-and-abort-immediately.patch added |
|---|
by , 12 years ago
| Attachment: | 0003-ffprobe-implement-string-validation-policy-setting.2.patch added |
|---|
comment:11 by , 12 years ago
Replying to mente:
Hm, not so good as I thought. On corrupted files I got segmentation fault.
Check new patchset, in case of error please provide the sample.
comment:12 by , 12 years ago
| Resolution: | → fixed |
|---|---|
| Status: | open → closed |
I believed this should be fixed in:
commit cbba331aa02f29870581ff0b7ded7477b279ae2c
Author: Stefano Sabatini <stefasab@gmail.com>
Date: Wed Oct 2 16:22:17 2013 +0200
ffprobe: implement string validation setting
This should fix trac tickets #1163, #2502.
Feel free to test and reopen the ticket in case of issues.
follow-up: 14 comment:13 by , 12 years ago
| Analyzed by developer: | unset |
|---|---|
| Reproduced by developer: | unset |
| Resolution: | fixed |
| Status: | closed → reopened |
| Version: | unspecified → git-master |
Uploaded 1163-utf8bug.rm to incoming.
ffprobe -print_format xml -show_format 1163-utf8bug.rm > 1163-utf8bug.xml
ffprobe version N-63665-g921d5ae Copyright (c) 2007-2014 the FFmpeg developers
built on Jun 1 2014 12:14:19 with Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
configuration: --prefix=/Users/eelco/Projects/Beamer/FFmpeg/build --disable-shared --enable-static --disable-ffserver --disable-doc --enable-gpl --extra-cflags='-DMACOSX_DEPLOYMENT_TARGET=10.7 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk -mmacosx-version-min=10.7 -I/Users/eelco/Projects/Beamer/FFmpeg/build/include' --extra-ldflags='-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk -mmacosx-version-min=10.7 -L/Users/eelco/Projects/Beamer/FFmpeg/build/lib'
libavutil 52. 88.100 / 52. 88.100
libavcodec 55. 66.100 / 55. 66.100
libavformat 55. 42.100 / 55. 42.100
libavdevice 55. 13.101 / 55. 13.101
libavfilter 4. 5.100 / 4. 5.100
libswscale 2. 6.100 / 2. 6.100
libswresample 0. 19.100 / 0. 19.100
libpostproc 52. 3.100 / 52. 3.100
[rm @ 0x7f9ae1015800] Failed to fully read block
Input #0, rm, from '/Users/eelco/Desktop/1163-utf8bug.rm':
Metadata:
Audiences : BatchRealProducer 's Audience;
audioMode : music
Creation Date : 2/14/2013 1:40:55
Description : This RealMedia File Compressed by Batch Real Producer
Generated By : Helix Producer SDK 10.0 for Windows, Build 10.0.0.335
Modification Date: 2/14/2013 1:40:55
videoMode : sharp
Keywords :
File ID : 31e4c7ec-29b7-4ffd-8ea5-40f11991751a
title : tsks
author : ????????TSKS??2014??
copyright : ??????ʹTSKS???????tskscn.com??
comment :
Duration: 00:59:01.08, start: 0.000000, bitrate: 0 kb/s
Stream #0:0: Audio: cook (cook / 0x6B6F6F63), 44100 Hz, stereo, fltp, 44 kb/s
Stream #0:1: Video: rv40 (RV40 / 0x30345652), yuv420p, 624x352, 536 kb/s, 29.97 fps, 29.97 tbr, 1k tbn, 1k tbc
[xml @ 0x7f9ae1012e00] 10 invalid UTF-8 sequence(s) found in string '????????TSKS??2014??', replaced with '�'
[xml @ 0x7f9ae1012e00] 18 invalid UTF-8 sequence(s) found in string '??????ʹTSKS???????tskscn.com??', replaced with '�'
xmllint 1163-utf8bug.xml
1163-utf8bug.xml:14: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC0 0xA1 0xEF 0xBF
<tag key="author" value="�����??�TSKS��2014��"/>
^
by , 12 years ago
| Attachment: | 1163-utf8bug.rm added |
|---|
follow-up: 15 comment:14 by , 11 years ago
Replying to eelco:
Uploaded 1163-utf8bug.rm to incoming.
[...]
xmllint 1163-utf8bug.xml
1163-utf8bug.xml:14: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xC0 0xA1 0xEF 0xBF
<tag key="author" value="�����??�TSKS��2014��"/>
^
C0A1 is an overlong encoding for character ! (decimal number 33), using two bytes in place of one. Since this is not allowed we should probably fix the decoder to disallow it.
comment:15 by , 11 years ago
| Analyzed by developer: | set |
|---|---|
| Component: | ffprobe → avutil |
| Reproduced by developer: | set |
| Resolution: | → fixed |
| Status: | reopened → closed |
Replying to saste:
[...]
C0A1is an overlong encoding for character!(decimal number 33), using two bytes in place of one. Since this is not allowed we should probably fix the decoder to disallow it.
Should be fixed in:
commit d4ec07dfe7dbc86e8f6403781c511b9463a526d2
Author: Stefano Sabatini <stefasab@gmail.com>
Date: Thu Aug 28 17:37:27 2014 +0200
lavu/avstring: check for overlong encodings in av_utf8_decode()
Fix reopened trac ticket #1163.
comment:16 by , 11 years ago
| Component: | avutil → ffprobe |
|---|



Is the problem also reproducible with current git head?
Please send unified patches to ffmpeg-devel and please provide an example (including console) on how to produce the invalid xml.