Opened 4 years ago

Last modified 6 months ago

#3718 open enhancement

ffmpeg does not correctly read input text file.

Reported by: Maxwell175
Owned by:
Priority: wish
Component: avformat
Version: git-master
Keywords: concat
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

> ffmpeg -f concat -i tmp.txt -c copy output.wav
ffmpeg version N-60592-gfd982f2 Copyright (c) 2000-2014 the FFmpeg developers
  built on Feb 13 2014 22:01:02 with gcc 4.8.2 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetype --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libx264 --enable-libxavs --enable-libxvid --enable-zlib
  libavutil      52. 63.101 / 52. 63.101
  libavcodec     55. 52.101 / 55. 52.101
  libavformat    55. 32.101 / 55. 32.101
  libavdevice    55.  9.100 / 55.  9.100
  libavfilter     4.  1.102 /  4.  1.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
[concat @ 003b36e0] Line 1: unknown keyword 'file'
tmp.txt: Invalid data found when processing input

This is the Windows Zeranoe Build downloaded from here: http://ffmpeg.zeranoe.com/builds/win32/static/ffmpeg-20140612-git-3a1c895-win32-static.7z

The file is written from a self-made Visual Basic program using the method described here: http://msdn.microsoft.com/en-us/library/ms128035(v=vs.110).aspx. As you can see under the Remarks section, it uses the UTF-8 encoding.

It turns out that this method also writes three extra bytes, ef bb bf, at the start of the file. This seems to throw off FFmpeg, which then gives the error above.

Attachments (1)

tmp.txt (537 bytes) - added by Maxwell175 4 years ago.
File used in the command


Change History (10)

Changed 4 years ago by Maxwell175

File used in the command

comment:1 Changed 4 years ago by cehoyos

  • Component changed from undetermined to avformat
  • Keywords concat added
  • Resolution set to worksforme
  • Status changed from new to closed
  • Version changed from unspecified to git-master

Sounds as if FFmpeg behaves as expected.
Or is there anything in the documentation that implies that you may put random bytes in front of the file keyword?


comment:2 Changed 4 years ago by Maxwell175

I am NOT putting in random bytes!
http://www.pcreview.co.uk/forums/extra-characters-beginning-file-ef-bb-bf-t3902307.html
The 2nd post there clearly states that is a "byte-order mark (BOM)" and I think that this SHOULD be supported.
Also see this page: http://www.unicode.org/faq/utf_bom.html#bom1. As you can see there, it is an official spec.


comment:3 Changed 4 years ago by Maxwell175

  • Resolution worksforme deleted
  • Status changed from closed to reopened

comment:4 follow-up: Changed 4 years ago by Cigaes

  • Priority changed from normal to wish
  • Type changed from defect to enhancement

A byte order mark is an invisible neutral character as human-readable text goes, but it is nonetheless a character, and therefore included in a computer-readable text. Supporting it would be possible, but very low in my priority list, and only if it can be done in a generic way that does not require changing all parts of the code that read text files.

As a side note, you will get the same problem from a lot of other programs, so you should definitely try to learn how to produce files with just what you want in them and not what any random API decides to add.

comment:5 in reply to: ↑ 4 Changed 4 years ago by Maxwell175

Replying to Cigaes:

A byte order mark is an invisible neutral character as human-readable text goes, but it is nonetheless a character, and therefore included in a computer-readable text. Supporting it would be possible, but very low in my priority list, and only if it can be done in a generic way that does not require changing all parts of the code that read text files.

Since I am also a programmer, though not a C programmer, I looked around a bit and found some code samples: https://workspaces.codeproject.com/user-8645021/reading-utf-8-with-c-streams. Also, shouldn't there be a separate function that gets called from all the places that read text files, so that the same code is not repeated many times?



As a side note, you will get the same problem from a lot of other programs, so you should definitely try to learn how to produce files with just what you want in them and not what any random API decides to add.

This is NOT some "random" API. Many people use this method, especially beginners. Since I know other methods, I can use them, but still...


comment:6 Changed 4 years ago by cehoyos

  • Status changed from reopened to open

comment:7 follow-up: Changed 4 years ago by gjdfgh

If ffmpeg refuses to deal with broken Microsoft bullshit (because that's what the BOM is), so be it.

Though it would be only 1 line of code or so to skip the Microsoft bullshit.

comment:8 in reply to: ↑ 7 Changed 4 years ago by Cigaes

Replying to gjdfgh:

Though it would be only 1 line of code or so to skip the Microsoft bullshit.

Send the patch.
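
For illustration only, the kind of check being discussed could look roughly like the sketch below. It is not a patch against FFmpeg and does not use any FFmpeg API: `skip_utf8_bom` is a hypothetical helper name, and the sketch assumes the start of the text file has already been read into a memory buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg.
 * Returns a pointer just past a leading UTF-8 byte order mark
 * (the bytes EF BB BF) if one is present, otherwise returns buf unchanged. */
static const char *skip_utf8_bom(const char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    if (len >= sizeof(bom) && memcmp(buf, bom, sizeof(bom)) == 0)
        return buf + sizeof(bom);
    return buf;
}
```

A caller would pass the first line of the list file through such a helper before looking for the `file` keyword. Whether this can be done generically for every place that reads text files, as asked for in comment:4, is a separate question that this sketch does not address.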

comment:9 Changed 6 months ago by xamarin

For me it shows the error:
Line 1: unknown keyword ' ■f'
input.txt: Invalid data found when processing input

Use Notepad++ to change the mp.txt encoding from UCS-2 Little Endian to ANSI or UTF-8.
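
The ' ■f' in the message above would be consistent with a UTF-16 little-endian byte order mark (the bytes FF FE) being printed by a console that uses an OEM code page, although that is an assumption about the setup rather than something shown in the log. A minimal sketch of telling the common byte order marks apart follows; `detect_bom` and the `text_bom` enum are hypothetical names, not FFmpeg API.

```c
#include <stddef.h>

/* Hypothetical classification of the byte order mark at the start of a file. */
enum text_bom { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/* Inspects the first bytes of buf and reports which BOM, if any, is present.
 * Note: a UTF-32 LE file (FF FE 00 00) would be reported as UTF-16 LE here,
 * since this sketch does not look beyond the first two or three bytes. */
static enum text_bom detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

Unlike the UTF-8 case from the original report, a UTF-16 file cannot be fixed by skipping the mark alone, because every character is stored in two bytes. That is why re-saving the file as ANSI or UTF-8 is needed here.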
