Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#2877 closed defect (invalid)

Unable to rewind/seek a mp3

Reported by: jpo38 Owned by:
Priority: normal Component: undetermined
Version: unspecified Keywords: seek
Cc: Blocked By:
Blocking: Reproduced by developer: yes
Analyzed by developer: yes

Description

Summary of the bug:
I'm working on a program to simply read a mp3. For some reason (optimization), I want to read the file by blocks (possibly non consecutive and I often move forward and backward). But I am unable to safely navigate through the file, I call av_seek_frame to move within the file, but then, extracted data are incorrect.

I could isolate the problem in a very basic sample program. This program opens a file, reads the audio stream (saves it to a vector of doubles), then calls av_seek_frame to move back to the beginning of the file and finally try to read it a second time (saving it to a new vector of doubles). The resulting two vectors of double are slighlty different.....

How to reproduce:

extern "C" {
#include "libavformat/avformat.h"
#include "libavutil/samplefmt.h"
#include <libswresample/swresample.h>
#include <libavutil/opt.h>
}

#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>

AVFormatContext* container = NULL;
AVCodecContext *ctx = NULL;
int audio_stream = 0;

bool open_audio_stream( char* file )
{
    bool bRes = false;

    av_register_all();
    
    container = avformat_alloc_context();
    AVCodec *codec = NULL;
    if ( avformat_open_input(&container,file,NULL,NULL) < 0 ||
         av_find_stream_info(container) < 0 ||
         (audio_stream = av_find_best_stream(container, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0)) < 0 ||
         (ctx = container->streams[audio_stream]->codec) == NULL ||
         (codec = avcodec_find_decoder(ctx->codec_id)) == NULL ||
         avcodec_open2( ctx, codec, NULL ) < 0 )
    {
        std::cout << "Failed to open audio stream" << std::endl;
        return false;
    }
    else
    {
        return true;
    }
}

bool read_audio_stream( std::vector<std::vector<double>>& extracted )
{
    bool bRes = false;

    extracted.resize( ctx->channels ); // prepare container

    // read to read the file!
    AVPacket packet;
    av_init_packet( &packet );
    packet.pos = 0;
    packet.data = NULL;
    packet.size = 0;

    AVFrame *frame = avcodec_alloc_frame();

    int tempdatabuffer_size = 0;
    double* tempdatabuffer = NULL;
    
    struct SwrContext *swr_ctx;
    swr_ctx = swr_alloc();
    if ( swr_ctx )
    {
        // prepare object used for data conversion
        av_opt_set_int(swr_ctx, "in_channel_count", ctx->channels, 0);
        av_opt_set_int(swr_ctx, "in_channel_layout", ctx->channel_layout, 0);
        av_opt_set_int(swr_ctx, "in_sample_rate", ctx->sample_rate, 0);
        av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", ctx->sample_fmt, 0);
        av_opt_set_int(swr_ctx, "out_channel_layout", ctx->channel_layout, 0);
        av_opt_set_int(swr_ctx, "out_channel_count", ctx->channels, 0);
        av_opt_set_int(swr_ctx, "out_sample_rate", ctx->sample_rate, 0);
        av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", AV_SAMPLE_FMT_DBL, 0);
        if ( swr_init(swr_ctx) >= 0 )
        {
            int tempdatabuffer_size = 0;
            double* tempdatabuffer = NULL;

            bRes = true;

            int frameFinished = 0;
            int sample = 0;
            while ( av_read_frame( container, &packet ) >= 0 &&
                    packet.stream_index == audio_stream )
            {
                if ( avcodec_decode_audio4( ctx, frame, &frameFinished, &packet ) >= 0 )
                {
                    if ( frame->nb_samples*ctx->channels > tempdatabuffer_size )
                    {
                        // update temp buffer, it's actually too small!!
                        tempdatabuffer_size = frame->nb_samples*ctx->channels;
                        if ( tempdatabuffer )
                            delete [] tempdatabuffer;
                        tempdatabuffer = new double[tempdatabuffer_size];
                    }

                    if ( swr_convert(swr_ctx, (uint8_t **)&tempdatabuffer, tempdatabuffer_size, (const uint8_t **)&(frame->data[0]), frame->nb_samples) )
                    {
                        for ( sample = 0; sample != frame->nb_samples; ++sample )
                        {
                            for ( int chan = 0; chan != ctx->channels; ++chan )
                            {
                                extracted[chan].push_back( tempdatabuffer[sample*ctx->channels + chan] );
                            }
                        }
                    }
                }
            }

            if ( tempdatabuffer ) delete [] tempdatabuffer;
        }

    }
    swr_free(&swr_ctx);
    avcodec_free_frame( &frame );

    return bRes;
}

int main(int argc, char **argv)
{
    if ( argc == 2 )
    {
        if ( open_audio_stream( argv[1] ) )
        {
            std::vector<std::vector<double>> first_read;
            std::vector<std::vector<double>> second_read;
            if ( read_audio_stream( first_read ) )
            {
                // rewind!
                av_seek_frame( container, audio_stream, 0, AVSEEK_FLAG_FRAME );

                if ( read_audio_stream( second_read ) )
                {
                    if ( first_read.size() == second_read.size() )
                    {
                        std::cout << "Comparing " << first_read.size() << " channels" << std::endl;
                        for ( unsigned int chan = 0; chan != first_read.size(); ++chan )
                        {
                            if ( first_read[chan].size() == second_read[chan].size() )
                            {
                                std::cout << "Comparing " << first_read[chan].size() << " samples for channel " << chan << std::endl;
                                for ( unsigned int sample = 0; sample != first_read[chan].size(); ++sample )
                                {
                                    if ( first_read[chan][sample] != second_read[chan][sample] )
                                    {
                                        std::cout << "Different data found for sample " << sample << " of channel " << chan << std::endl;
                                        return 1;
                                    }
                                }
                                std::cout << "Results are equivalent!!!" << std::endl;
                                return 0;
                            }
                            else
                            {
                                std::cout << "Did not find the same number of samples" << std::endl;
                            }
                        }
                    }
                    else
                    {
                        std::cout << "Did not find the same number of channels" << std::endl;
                    }
                }

                av_close_input_file(container);
            }
        }

        return 1;
    }
    else
    {
        printf("usage: %s file\n",
               argv[0]);
        return 1;
    }
}

Change History (13)

comment:1 Changed 4 years ago by cehoyos

  • Component changed from avutil to undetermined
  • Keywords seek added; av_seek_frame removed

Is the problem reproducible with ffmpeg (the application)?
If it is, please provide the failing command line together with the complete, uncut console output and provide the input sample.

comment:2 Changed 4 years ago by jpo38

Hi cehoyos,

I'm afraid it's not reproductible with ffmeg application. To reproduce you must read the file once and then try to rewind in order to read it a second time. I don't see how such behaviour could be accessible using the ffmpeg program command line.

Jean

comment:3 Changed 4 years ago by cehoyos

What about ffplay?

comment:4 follow-up: Changed 4 years ago by jpo38

Hello again,

Unfortunately, I'm not familiar enough with ffplay to tell if and how this bug could be reproduced here. But reproducing from the code I posted is very easy.

Thanks

Jean

Last edited 4 years ago by jpo38 (previous) (diff)

comment:5 in reply to: ↑ 4 Changed 4 years ago by cehoyos

Replying to jpo38:

Unfortunately, I'm not familiar enough with ffplay to tell if and how this bug could be reproduced here.

Could you elaborate?
Maybe we have to improve the documentation for ffplay.

comment:6 Changed 4 years ago by jpo38

I just never used ffplay and did not have it compiled. Now I downloaded it and had no problem using it. I tested and I can move within the file without any problem. Looks like file can be seek without problem from this tool.

But ffplay.c does not use av_seek_frame (my sample does), it uses avformat_seek_file.

Jean

comment:7 Changed 4 years ago by cehoyos

Is a specific sample required or is the problem reproducible with every mp3 file?

comment:8 follow-up: Changed 4 years ago by jpo38

I could reproduce it with all mp3 file I tried (3 or 4, part of my test suite). I even see it when reading audio from avi files.

comment:9 in reply to: ↑ 8 Changed 4 years ago by saste

Replying to jpo38:

I could reproduce it with all mp3 file I tried (3 or 4, part of my test suite). I even see it when reading audio from avi files.

What happens if you use avformat_seek_file() instead? Also are you sure that your files start with timestamp 0?

Today I added an option to ffprobe which may help to test this, try this command and inspect the output:

ffprobe IN.mp3 -read_intervals "%2,0%2" -of compact -show_packets 

comment:10 follow-up: Changed 4 years ago by jpo38

Thank you for investigating this problem.

When I read my mp3 files, AVFormatContext::start_time is 0, so I suspect my file starts with timestamp 0, right?

I replaced
av_seek_frame( container, audio_stream, 0, AVSEEK_FLAG_FRAME );
by
avformat_seek_file( container, audio_stream, INT64_MIN, container->start_time, INT64_MAX, 0 )

No change. I can still see the bug.

I ran ffprobe IN.mp3 -read_intervals "%2,0%2" -of compact -show_packets

Here is the output (truncated):

packet|codec_type=audio|stream_index=0|pts=0|pts_time=0.000000|dts=0|dts_time=0.000000|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=417|pos=461|flags=K
packet|codec_type=audio|stream_index=0|pts=368640|pts_time=0.026122|dts=368640|dts_time=0.026122|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=878|flags=K
packet|codec_type=audio|stream_index=0|pts=737280|pts_time=0.052245|dts=737280|dts_time=0.052245|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=1296|flags=K
packet|codec_type=audio|stream_index=0|pts=1105920|pts_time=0.078367|dts=1105920|dts_time=0.078367|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=1714|flags=K
packet|codec_type=audio|stream_index=0|pts=1474560|pts_time=0.104490|dts=1474560|dts_time=0.104490|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=2132|flags=K
packet|codec_type=audio|stream_index=0|pts=1843200|pts_time=0.130612|dts=1843200|dts_time=0.130612|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=2550|flags=K
packet|codec_type=audio|stream_index=0|pts=2211840|pts_time=0.156735|dts=2211840|dts_time=0.156735|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=2968|flags=K
packet|codec_type=audio|stream_index=0|pts=2580480|pts_time=0.182857|dts=2580480|dts_time=0.182857|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=3386|flags=K
packet|codec_type=audio|stream_index=0|pts=2949120|pts_time=0.208980|dts=2949120|dts_time=0.208980|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=3804|flags=K
packet|codec_type=audio|stream_index=0|pts=3317760|pts_time=0.235102|dts=3317760|dts_time=0.235102|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=4222|flags=K
packet|codec_type=audio|stream_index=0|pts=3686400|pts_time=0.261224|dts=3686400|dts_time=0.261224|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=4640|flags=K
packet|codec_type=audio|stream_index=0|pts=4055040|pts_time=0.287347|dts=4055040|dts_time=0.287347|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=5058|flags=K
packet|codec_type=audio|stream_index=0|pts=4423680|pts_time=0.313469|dts=4423680|dts_time=0.313469|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=5476|flags=K
packet|codec_type=audio|stream_index=0|pts=4792320|pts_time=0.339592|dts=4792320|dts_time=0.339592|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=5894|flags=K
packet|codec_type=audio|stream_index=0|pts=5160960|pts_time=0.365714|dts=5160960|dts_time=0.365714|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=6312|flags=K
packet|codec_type=audio|stream_index=0|pts=5529600|pts_time=0.391837|dts=5529600|dts_time=0.391837|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=6730|flags=K
packet|codec_type=audio|stream_index=0|pts=5898240|pts_time=0.417959|dts=5898240|dts_time=0.417959|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=7148|flags=K

...

packet|codec_type=audio|stream_index=0|pts=26542080|pts_time=1.880816|dts=26542080|dts_time=1.880816|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=30554|flags=K
packet|codec_type=audio|stream_index=0|pts=26910720|pts_time=1.906939|dts=26910720|dts_time=1.906939|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=30972|flags=K
packet|codec_type=audio|stream_index=0|pts=27279360|pts_time=1.933061|dts=27279360|dts_time=1.933061|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=417|pos=31390|flags=K
packet|codec_type=audio|stream_index=0|pts=27648000|pts_time=1.959184|dts=27648000|dts_time=1.959184|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=31807|flags=K
packet|codec_type=audio|stream_index=0|pts=28016640|pts_time=1.985306|dts=28016640|dts_time=1.985306|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=32225|flags=K
packet|codec_type=audio|stream_index=0|pts=0|pts_time=0.000000|dts=0|dts_time=0.000000|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=417|pos=461|flags=K
packet|codec_type=audio|stream_index=0|pts=368640|pts_time=0.026122|dts=368640|dts_time=0.026122|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=878|flags=K
packet|codec_type=audio|stream_index=0|pts=737280|pts_time=0.052245|dts=737280|dts_time=0.052245|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=1296|flags=K
packet|codec_type=audio|stream_index=0|pts=1105920|pts_time=0.078367|dts=1105920|dts_time=0.078367|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=1714|flags=K
packet|codec_type=audio|stream_index=0|pts=1474560|pts_time=0.104490|dts=1474560|dts_time=0.104490|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=2132|flags=K
packet|codec_type=audio|stream_index=0|pts=1843200|pts_time=0.130612|dts=1843200|dts_time=0.130612|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=2550|flags=K
packet|codec_type=audio|stream_index=0|pts=2211840|pts_time=0.156735|dts=2211840|dts_time=0.156735|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=2968|flags=K
packet|codec_type=audio|stream_index=0|pts=2580480|pts_time=0.182857|dts=2580480|dts_time=0.182857|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=3386|flags=K
packet|codec_type=audio|stream_index=0|pts=2949120|pts_time=0.208980|dts=2949120|dts_time=0.208980|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=3804|flags=K
packet|codec_type=audio|stream_index=0|pts=3317760|pts_time=0.235102|dts=3317760|dts_time=0.235102|duration=368640|duration_time=0.026122|convergence_duration=N/A|convergence_duration_time=N/A|size=418|pos=4222|flags=K

...

Looks like you read the file twice and you're getting the same number of data. It works the same for me, only the data I read (when converted to float/double) are not the same....
Did you try to compile my sample? It's very straightforward. It reads the file once, rewinds and reads again. It then compares data. We read the same number of samples but not the same values...

Jean


comment:11 in reply to: ↑ 10 ; follow-up: Changed 4 years ago by saste

Replying to jpo38:

Thank you for investigating this problem.

When I read my mp3 files, AVFormatContext::start_time is 0, so I suspect my file starts with timestamp 0, right?

I replaced
av_seek_frame( container, audio_stream, 0, AVSEEK_FLAG_FRAME );
by
avformat_seek_file( container, audio_stream, INT64_MIN, container->start_time, INT64_MAX, 0 )

No change. I can still see the bug.

I ran ffprobe IN.mp3 -read_intervals "%2,0%2" -of compact -show_packets

Here is the output (truncated):

[...]

Looks like you read the file twice and you're getting the same number of data. It works the same for me, only the data I read (when converted to float/double) are not the same....
Did you try to compile my sample? It's very straightforward. It reads the file once, rewinds and reads again. It then compares data. We read the same number of samples but not the same values...

It doesn't explain the issue, but you could try to flush the data in the decoder before re-using it in the second loop, since it is the only object which is reused between the two sessions. Also, with which formats does it happen, how much is the data different, how many samples differ?

comment:12 in reply to: ↑ 11 Changed 4 years ago by saste

  • Analyzed by developer set
  • Reproduced by developer set
  • Resolution set to invalid
  • Status changed from new to closed

Replying to saste:

Replying to jpo38:

[...]

Looks like you read the file twice and you're getting the same number of data. It works the same for me, only the data I read (when converted to float/double) are not the same....
Did you try to compile my sample? It's very straightforward. It reads the file once, rewinds and reads again. It then compares data. We read the same number of samples but not the same values...

It doesn't explain the issue, but you could try to flush the data in the decoder before re-using it in the second loop, since it is the only object which is reused between the two sessions. Also, with which formats does it happen, how much is the data different, how many samples differ?

And since I was curious I did a few experiments, and noticed that there is a very small imperceptible difference, so the two buffers are different but will sound the same to human ears.

This is apparently how the MP3 decoder (and many other lossy decoders) work, they keep an internal state which affects decoding, when you seek back and decode again the decoder has a different state, which causes it to produce a slightly different output, although no human listener should be able to notice the difference.

In conclusion, it's not a bug in libavcodec but rather the way the MP3 decoder works. Feel free to reopen the ticket if you don't agree.

comment:13 Changed 4 years ago by jpo38

Hello,

Thanks you for the time you spent investigating the problem.

I'm not an expert in audio codec, so I cannot "disagree". I understand your point and it probably makes sense...

In my application, I need to read the file twice and check I got the same data. So, to workaround this problem, I'll have to close and reopen the file to make the decoder start from the same state.

Thanks again

Jean

Note: See TracTickets for help on using tickets.