Opened 11 years ago

Closed 10 years ago

#3327 closed defect (fixed)

libavformat fails to detect some mp3 files reliably

Reported by: gjdfgh Owned by:
Priority: normal Component: avformat
Version: git-master Keywords: mp3
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: no

Description

Summary of the bug:

For a small number of mp3 files, libavformat detects them with a probe score of only 24. This is below the "reliable" and recommended minimum score (AFAIK AVPROBE_SCORE_MAX/4 + 1, which is 26), and causes problems for applications. Applications want to respect the recommended minimum score to avoid playing random non-media files as media files, which would result in rather crappy behavior compared to just rejecting the file. Thus, for reasonably non-broken mp3 files, this definitely should return a higher probe score.

How to reproduce:

$ build_libs/bin/ffprobe /tmp/sample.mp3 
ffprobe version N-59969-g5295735 Copyright (c) 2007-2014 the FFmpeg developers
  built on Jan 19 2014 12:58:57 with gcc 4.8 (Debian 4.8.2-12)
  configuration: --prefix=/tmp/upstream_mpv/build_libs --enable-static --disable-shared --enable-gpl --enable-avresample --disable-debug --disable-doc
  libavutil      52. 62.100 / 52. 62.100
  libavcodec     55. 48.101 / 55. 48.101
  libavformat    55. 24.100 / 55. 24.100
  libavdevice    55.  5.102 / 55.  5.102
  libavfilter     4.  1.100 /  4.  1.100
  libavresample   1.  1.  0 /  1.  1.  0
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
[mp3 @ 0xa8b8880] Format mp3 detected only with low score of 24, misdetection possible!
[mp3 @ 0xa8b8880] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/tmp/sample.mp3':
  Metadata:
    track           : 01
    genre           : Oldies
  Duration: 00:01:36.00, start: 0.000000, bitrate: 256 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 256 kb/s

Patches should be submitted to the ffmpeg-devel mailing list and not this bug tracker.

Attachments (1)

issue3327-libc-2.17.so (1.5 MB ) - added by Carl Eugen Hoyos 11 years ago.

Download all attachments as: .zip

Change History (10)

comment:1 by gjdfgh, 11 years ago

Sample uploaded as "sample.mp3" on the incoming ftp.

comment:2 by Carl Eugen Hoyos, 11 years ago

Keywords: mp3 added

comment:3 by gjdfgh, 11 years ago

The first test file I sent isn't very good: it has a broken id3 tag (not recognized by the id3v2 tool) followed by large zero padding.

Uploaded another file (issue3327_2.mp3) which has a large (this time valid) ID3 tag at the beginning.

One could argue that these files are hard to detect, because the start of the files don't even contain a single mp3 frame, and possibly go over the normal maximum probe size, but even "ffprobe -probesize 10485760 issue3327_2.mp3" returns a score of 24.

Note that I think it should return at least AVPROBE_SCORE_EXTENSION if the file extension matches (which it currently doesn't).

in reply to:  3 comment:4 by Carl Eugen Hoyos, 11 years ago

Replying to gjdfgh:

Note that I think it should return at least AVPROBE_SCORE_EXTENSION if the file extension matches (which it currently doesn't).

Are you sure that if a format is currently only detected by extension AVPROBE_SCORE_EXTENSION is returned? And do you think it would be good if AVPROBE_SCORE_EXTENSION would be returned although no indication except the extension existed? I suspect you are just being fooled by a bad name of a preprocessor definition (and the bad probing of avconv that fails for several user-reported samples like this one, making it necessary for them to return 25 just for matching extensions).

$ rm test.mp3
$ touch test.mp3
$ ffmpeg -i test.mp3

comment:5 by gjdfgh, 11 years ago

Are you sure that if a format is currently only detected by extension AVPROBE_SCORE_EXTENSION is returned?

I don't know it's what I would expect.

avformat.h says: #define AVPROBE_SCORE_EXTENSION 50 ///< score for file extension

So yes, I would expect that if the extension matches, that value is returned. Maybe I'm wrong; it does look like the score is a little bit too high for that, as it's much higher than the "safe" score AVPROBE_SCORE_RETRY+1, which is 26.

As for whether to return 25 or 26 on a "good guess", I think it would be good to return 26 is there are signs that this might be a good mp3 file, probing uses a good amount of data (e.g. 1 MB), and there are no other formats with better score.

Anyway, currently mp3 probing is pretty bad. For example, libavformat recognizes a random ELF binary as mp3 with probe score 50 after only 64KB of data! And that while legitimate mp3s are not recognized reliably, even when passing the whole file to the probe function! Using the file extension instead of the sophisticated probing in mp3_read_probe would be more reliable.

avconv

Didn't perform better on this sample, AFAIR.

in reply to:  5 comment:6 by Carl Eugen Hoyos, 11 years ago

Replying to gjdfgh:

Are you sure that if a format is currently only detected by extension AVPROBE_SCORE_EXTENSION is returned?

I don't know it's what I would expect.

avformat.h says: #define AVPROBE_SCORE_EXTENSION 50 ///< score for file extension

I completely forgot: Another good example on how great the "cosmetics" are, thank you for pointing that out!

So yes, I would expect that if the extension matches, that value is returned.

It is just a wrong commit (that nobody so far complained about but you fortunately know where to direct your flames).

Maybe I'm wrong; it does look like the score is a little bit too high for that, as it's much higher than the "safe" score AVPROBE_SCORE_RETRY+1, which is 26.

As for whether to return 25 or 26 on a "good guess", I think it would be good to return 26 is there are signs that this might be a good mp3 file, probing uses a good amount of data (e.g. 1 MB), and there are no other formats with better score.

So you agree that what FFmpeg reports is correct? After all, it does report the correct file type or do I miss something?

Maybe unrelated: Note that you cannot simply raise a score because mp3, mpegvideo, mpeg-ps, mpeg-ts etc. fight for the "best" score for many real-world samples.

Anyway, currently mp3 probing is pretty bad. For example, libavformat recognizes a random ELF binary as mp3 with probe score 50 after only 64KB of data!

*Please* provide such samples!

And that while legitimate mp3s are not recognized reliably, even when passing the whole file to the probe function!

Define "reliably": Afaict, for the file in question, FFmpeg succeeds with auto-detection (which is apparently not trivial, see "other" applications), I don't find "I am not 100% sure" so bad for mp3 files with large attachments.

Using the file extension instead of the sophisticated probing in mp3_read_probe would be more reliable.

This opinion seems to be in strong contrast to what FFmpeg stands for (since forever) and most people always saw that as one of the largest advantages over WMP etc.
And it was always considered a bug that there is no auto-detection for image types.

avconv

Didn't perform better on this sample, AFAIR.

It performs extremely bad on this (and some related) sample.

Note that I am not saying there is no issue: If the extension and the auto-detection agree, the score should probably be raised.

comment:7 by gjdfgh, 11 years ago

So you agree that what FFmpeg reports is correct? After all, it does report the correct file type or do I miss something?

I'm doing this for a video player. I do not want libavformat to "recognize" random binary files as media files (which it does often with low probescore/probesize), OTOH I sure want it do detect simple mp3 files. Usually this works great by requiring at least AVPROBE_SCORE_RETRY+1, except for some mp3 files.

So yes, I see a problem here.

Define "reliably": Afaict, for the file in question, FFmpeg succeeds with auto-detection (which is apparently not trivial, see "other" applications), I don't find "I am not 100% sure" so bad for mp3 files with large attachments.

I wouldn't mind feeding a few megabytes of mp3 data to the probe function in the case of large attachments, but even then it doesn't "work".

No, I do not want to play back ELF files as mp3, which is why I think some work is needed here.

*Please* provide such samples!

Uploaded issue3327-libc-2.17.so. It's recognized as mp3, and even decodes some audio artifacts.

Using the file extension instead of the sophisticated probing in mp3_read_probe would be more reliable.

This opinion seems to be in strong contrast to what FFmpeg stands for (since forever) and most people always saw that as one of the largest advantages over WMP etc.

I generally agree that it's better to determine the file format by content and not by extension. However, although this is quite a pathological case, the bad mp3 detection does lead to user complaints. And lowering the probescore for file detection isn't a solution either, because you get too many false positives.

(For now I've special cased mp3 in my player, so strictly speaking this is not an issue for me anymore.)

by Carl Eugen Hoyos, 11 years ago

Attachment: issue3327-libc-2.17.so added

comment:8 by Carl Eugen Hoyos, 11 years ago

Not a regression.

$ file issue3327-libc-2.17.so
issue3327-libc-2.17.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), BuildID[sha1]=0xa788c2ee6393d50dc870d6c1c77f3bf50063f774, for GNU/Linux 2.6.32, stripped
$ ffmpeg -i issue3327-libc-2.17.so
ffmpeg version N-60202-g6369766 Copyright (c) 2000-2014 the FFmpeg developers
  built on Jan 27 2014 11:19:29 with gcc 4.7 (SUSE Linux)
  configuration: --enable-gpl
  libavutil      52. 63.100 / 52. 63.100
  libavcodec     55. 49.100 / 55. 49.100
  libavformat    55. 26.100 / 55. 26.100
  libavdevice    55.  5.102 / 55.  5.102
  libavfilter     4.  1.101 /  4.  1.101
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
  libpostproc    52.  3.100 / 52.  3.100
[mp3 @ 0x331c100] Format mp3 detected only with low score of 25, misdetection possible!
[mp3 @ 0x331cb40] Header missing
    Last message repeated 88 times
[mp3 @ 0x331c100] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from 'issue3327-libc-2.17.so':
  Duration: 00:00:47.54, start: 0.000000, bitrate: 256 kb/s
    Stream #0:0: Audio: mp1, 32000 Hz, stereo, s16p, 256 kb/s
At least one output file must be specified

comment:9 by Michael Niedermayer, 10 years ago

Reproduced by developer: set
Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.