Opened 10 years ago

Last modified 7 years ago

#3718 open enhancement

ffmpeg does not correctly read input text file.

Reported by: Maxwell175 Owned by:
Priority: wish Component: avformat
Version: git-master Keywords: concat
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

> ffmpeg -f concat -i tmp.txt -c copy output.wav
ffmpeg version N-60592-gfd982f2 Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 13 2014 22:01:02 with gcc 4.8.2 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      52. 63.101 / 52. 63.101
  libavcodec     55. 52.101 / 55. 52.101
  libavformat    55. 32.101 / 55. 32.101
  libavdevice    55.  9.100 / 55.  9.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
[concat @ 003b36e0] Line 1: unknown keyword 'file'
tmp.txt: Invalid data found when processing input

This is the Windows Zeranoe Build downloaded from here: http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-20140612-git-3a1c895-win32-static.7z

The file is written from a self-made Visual Basic program using the method described here: http://msdn.microsoft.com/en-us/library/ms128035(v=vs.110).aspx. As you can see under the Remarks section, it uses the UTF-8 encoding.

It turns out that this method also writes 3 extra bytes at the start of the file: ef bb bf. This seems to throw off FFmpeg, which then gives the error above.
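
For reference, ef bb bf is the UTF-8 encoding of U+FEFF, the byte order mark. One workaround is to remove those bytes from a list file that has already been written. Below is a minimal Python sketch of that, assuming the file is small enough to read into memory; the helper name `strip_utf8_bom` is made up for illustration and is not part of FFmpeg.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM (EF BB BF) from the file at `path`, in place.

    Returns True if a BOM was found and removed, False if the file was
    left untouched. `strip_utf8_bom` is a hypothetical helper name.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"
        p.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Calling `strip_utf8_bom("tmp.txt")` before running the ffmpeg command above should leave `file` as the first bytes of line 1.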

Attachments (1)

tmp.txt (537 bytes ) - added by Maxwell175 10 years ago.
File used in the command


Change History (10)

by Maxwell175, 10 years ago

Attachment: tmp.txt added

File used in the command

comment:1 by Carl Eugen Hoyos, 10 years ago

Component: undetermined → avformat
Keywords: concat added
Resolution: worksforme
Status: new → closed
Version: unspecified → git-master

Sounds as if FFmpeg behaves as expected.
Or is there anything in the documentation that implies that you may put random bytes in front of the file keyword?

Last edited 10 years ago by Carl Eugen Hoyos

comment:2 by Maxwell175, 10 years ago

I am NOT putting in random bytes!
http://www.pcreview.co.uk/forums/extra-characters-beginning-file-ef-bb-bf-t3902307.html
The 2nd post there clearly states that is a "byte-order mark (BOM)" and I think that this SHOULD be supported.
Also see this page: http://www.unicode.org/faq/utf_bom.html#bom1. As you can see there, it is an official spec.

Last edited 10 years ago by Maxwell175

comment:3 by Maxwell175, 10 years ago

Resolution: worksforme
Status: closed → reopened

comment:4 by Cigaes, 10 years ago

Priority: normal → wish
Type: defect → enhancement

A byte order mark is an invisible neutral character as human-readable text goes, but it is nonetheless a character, and therefore included in a computer-readable text. Supporting it would be possible, but very low on my priority list, and only if it can be done in a generic way that does not require changing all parts of the code that read text files.

As a side note, you will get the same problem from a lot of other programs, so you should definitely try to learn how to produce files with just what you want in them and not what any random API decides to add.

in reply to:  4 comment:5 by Maxwell175, 10 years ago

Replying to Cigaes:

A byte order mark is an invisible neutral character as human-readable text goes, but it is nonetheless a character, and therefore included in a computer-readable text. Supporting it would be possible, but very low on my priority list, and only if it can be done in a generic way that does not require changing all parts of the code that read text files.

Since I am also a programmer, though not a C programmer, I looked around a bit and found some code samples: https://workspaces.codeproject.com/user-8645021/reading-utf-8-with-c-streams. Also, shouldn't there be a separate function that gets called from all the places that read text files, so you would not have the same code repeated many times?



As a side note, you will get the same problem from a lot of other programs, so you should definitely try to learn how to produce files with just what you want in them and not what any random API decides to add.

This is NOT some "random" API. Many people use this method, especially beginners. Since I know other methods, I can use them, but still...
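
The "other methods" come down to writing the list as UTF-8 without a BOM. Here is a minimal sketch of that in Python rather than the reporter's Visual Basic. The function name `write_concat_list` is hypothetical, and the `'\''` escaping of single quotes reflects my reading of the concat list quoting rules, so treat it as an assumption.

```python
def write_concat_list(path, media_files):
    """Write a concat list with one "file '<name>'" line per entry.

    Python's "utf-8" codec never emits a BOM; "utf-8-sig" is the variant
    that would prepend EF BB BF. `write_concat_list` is a hypothetical
    helper, not an FFmpeg or library API.
    """
    # newline="\n" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for name in media_files:
            # Assumption: a single quote inside a quoted name is written as '\''
            escaped = name.replace("'", "'\\''")
            f.write(f"file '{escaped}'\n")
```

For example, `write_concat_list("tmp.txt", ["a.wav", "b.wav"])` produces a file whose first byte is the `f` of `file`.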

Last edited 10 years ago by Carl Eugen Hoyos

comment:6 by Carl Eugen Hoyos, 10 years ago

Status: reopened → open

comment:7 by gjdfgh, 10 years ago

If ffmpeg refuses to deal with broken Microsoft bullshit (because that's what the BOM is), so be it.

Though it would be only 1 line of code or so to skip the Microsoft bullshit.

in reply to:  7 comment:8 by Cigaes, 10 years ago

Replying to gjdfgh:

Though it would be only 1 line of code or so to skip the Microsoft bullshit.

Send the patch.
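
For illustration only, and in Python rather than FFmpeg's C: skipping the mark on read can be done in one shared reader that tolerates an optional leading BOM before any parsing happens. This is not a patch for the concat demuxer, and `read_list_lines` is a made-up name.

```python
def read_list_lines(path):
    """Return the lines of a UTF-8 text file, ignoring a leading BOM if present.

    The "utf-8-sig" codec decodes plain UTF-8 and also drops EF BB BF at
    the start of the data, so callers never see the mark.
    `read_list_lines` is a hypothetical helper used only for this sketch.
    """
    with open(path, encoding="utf-8-sig") as f:
        return f.read().splitlines()
```

With such a reader, a list file parses the same way whether or not the writing program added the mark.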

comment:9 by xamarin, 7 years ago

For me it shows this error:
Line 1: unknown keyword ' ■f'
input.txt: Invalid data found when processing input

Use Notepad++ to change the mp.txt encoding from UCS-2 Little Endian to ANSI or UTF-8.
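
The same conversion can be scripted instead of done by hand in Notepad++. Below is a minimal Python sketch, assuming the list file is UTF-16 with a byte order mark (which is what Notepad++ calls "UCS-2 Little Endian" when the mark is FF FE); the function name `convert_utf16_to_utf8` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def convert_utf16_to_utf8(src, dst):
    """Re-encode a UTF-16 text file as UTF-8 without a BOM.

    The "utf-16" decoder reads the leading byte order mark (FF FE for
    little endian) to pick the byte order and drops it from the text.
    Without a mark it falls back to the platform's native byte order.
    Encoding with "utf-8" then writes no BOM of its own.
    """
    text = Path(src).read_bytes().decode("utf-16")
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_utf16_to_utf8("input.txt", "input_utf8.txt")` writes a copy that starts directly with the first character of line 1.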

Last edited 7 years ago by xamarin