Opened 2 years ago

Closed 16 months ago

#8762 closed defect (fixed)

dv remux loses sync

Reported by: dave rice
Owned by:
Priority: normal
Component: avformat
Version: git-master
Keywords: dvvideo
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

When reading a concealed dv frame, ffmpeg outputs one frame of video but no audio, and thus the output loses sync.

How to reproduce:

Do a stream copy of the video and audio from dv to mkv:

ffmpeg -y -i 1670520000_12.dv -map 0 -c copy 1670520000_12.mkv 
ffmpeg version git-2020-06-22-44ce333 Copyright (c) 2000-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD-44ce333_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 55.100 / 56. 55.100
  libavcodec     58. 93.100 / 58. 93.100
  libavformat    58. 47.100 / 58. 47.100
  libavdevice    58. 11.100 / 58. 11.100
  libavfilter     7. 86.100 /  7. 86.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
[dv @ 0x7faef3812400] Estimating duration from bitrate, this may be inaccurate
Input #0, dv, from '1670520000_12.dv':
  Metadata:
    timecode        : 00:07:44;16
  Duration: 00:00:00.40, start: 0.000000, bitrate: 28771 kb/s
    Stream #0:0: Video: dvvideo, yuv411p, 720x480 [SAR 8:9 DAR 4:3], 25000 kb/s, 29.97 fps, 29.97 tbr, 29.97 tbn, 29.97 tbc
    Stream #0:1: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
    Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
Output #0, matroska, to '1670520000_12.mkv':
  Metadata:
    timecode        : 00:07:44;16
    encoder         : Lavf58.47.100
    Stream #0:0: Video: dvvideo (dvsd / 0x64737664), yuv411p, 720x480 [SAR 8:9 DAR 4:3], q=2-31, 25000 kb/s, 29.97 fps, 29.97 tbr, 1k tbn, 29.97 tbc
    Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 32000 Hz, stereo, s16, 1024 kb/s
    Stream #0:2: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 32000 Hz, stereo, s16, 1024 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #0:2 -> #0:2 (copy)
Press [q] to stop, [?] for help
frame=   12 fps=0.0 q=-1.0 Lsize=    1433kB time=00:00:00.36 bitrate=31892.7kbits/s speed= 152x    
video:1406kB audio:25kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.098183%

The resulting file has 0.4 seconds of video and 0.1 seconds of audio.

% ffmpeg -i 1670520000_12.mkv 
ffmpeg version git-2020-06-22-44ce333 Copyright (c) 2000-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD-44ce333_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 55.100 / 56. 55.100
  libavcodec     58. 93.100 / 58. 93.100
  libavformat    58. 47.100 / 58. 47.100
  libavdevice    58. 11.100 / 58. 11.100
  libavfilter     7. 86.100 /  7. 86.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Guessed Channel Layout for Input Stream #0.1 : stereo
Guessed Channel Layout for Input Stream #0.2 : stereo
Input #0, matroska,webm, from '1670520000_12.mkv':
  Metadata:
    TIMECODE        : 00:07:44;16
    ENCODER         : Lavf58.47.100
  Duration: 00:00:00.40, start: 0.000000, bitrate: 29341 kb/s
    Stream #0:0: Video: dvvideo (dvsd / 0x64737664), yuv411p, 720x480 [SAR 8:9 DAR 4:3], 29.97 fps, 29.97 tbr, 1k tbn, 29.97 tbc (default)
    Metadata:
      DURATION        : 00:00:00.400000000
    Stream #0:1: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s (default)
    Metadata:
      DURATION        : 00:00:00.100000000
    Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
    Metadata:
      DURATION        : 00:00:00.100000000
At least one output file must be specified

In the example file, frames 1 and 12 (the first and last) are typical frames as read from a DV tape; frames 2-11, however, are concealed. In this case, the playback deck copies all of the video DIF blocks of the last valid video frame (#1) and sets the STA value (the first 4 bits of the 4th byte of each video DIF block) to 0xa to mark the video as concealed. The audio, however, is not concealed accordingly: frames 2-11 have no audio source pack or audio metadata at all, and the audio DIF block payloads are set to the error code for DV audio (which plays as silence).
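
The STA test described above can be sketched in a few lines of C. This is only an illustration, not FFmpeg code: the function names are hypothetical, and the sketch assumes that a DIF block is 80 bytes long and that the section type sits in the top 3 bits of the first ID byte, with the value 4 meaning video. The STA position (high nibble of the 4th byte) and the value 0xa are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIF_BLOCK_SIZE 80   /* assumed size of one DIF block in bytes */
#define SCT_VIDEO      4    /* assumed section type for video (top 3 bits of byte 0) */
#define STA_CONCEALED  0xa  /* STA value named in this report */

/* Hypothetical helper: return 1 if the block is a video DIF block
 * whose STA (high nibble of the 4th byte) marks it as concealed. */
int dif_block_is_concealed_video(const uint8_t *block)
{
    assert(block != NULL);
    if ((block[0] >> 5) != SCT_VIDEO)
        return 0;
    return (block[3] >> 4) == STA_CONCEALED;
}

/* Hypothetical helper: count concealed video DIF blocks in one frame buffer.
 * Trailing bytes that do not fill a whole block are ignored. */
size_t count_concealed_video_blocks(const uint8_t *frame, size_t frame_size)
{
    size_t n = 0;
    assert(frame != NULL);
    for (size_t off = 0; off + DIF_BLOCK_SIZE <= frame_size; off += DIF_BLOCK_SIZE)
        n += (size_t)dif_block_is_concealed_video(frame + off);
    return n;
}
```

For the sample file, a frame for which every video DIF block is counted as concealed would correspond to one of frames 2-11.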

When ffmpeg decodes this DV stream, it presents audio and video for the first frame, video only for the middle frames, and audio and video again for the last frames, so the resulting output file is out of sync. This arrangement can be seen with ffprobe's -show_frames option:

ffprobe 1670520000_12.dv -show_frames -of csv 
ffprobe version git-2020-06-22-44ce333 Copyright (c) 2007-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/HEAD-44ce333_1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 55.100 / 56. 55.100
  libavcodec     58. 93.100 / 58. 93.100
  libavformat    58. 47.100 / 58. 47.100
  libavdevice    58. 11.100 / 58. 11.100
  libavfilter     7. 86.100 /  7. 86.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
[dv @ 0x7fc0ce813c00] Estimating duration from bitrate, this may be inaccurate
Input #0, dv, from '1670520000_12.dv':
  Metadata:
    timecode        : 00:07:44;16
  Duration: 00:00:00.40, start: 0.000000, bitrate: 28771 kb/s
    Stream #0:0: Video: dvvideo, yuv411p, 720x480 [SAR 8:9 DAR 4:3], 25000 kb/s, 29.97 fps, 29.97 tbr, 29.97 tbn, 29.97 tbc
    Stream #0:1: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
    Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
frame,video,0,1,0,0.000000,0,0.000000,0,0.000000,1,0.033367,0,120000,720,480,yuv411p,8:9,I,0,0,1,0,0,unknown,unknown,unknown,unknown,topleft
frame,audio,1,1,0,0.000000,0,0.000000,0,0.000000,1001,0.033367,0,4272,s16,1068,2,stereo
frame,audio,2,1,0,0.000000,0,0.000000,0,0.000000,1001,0.033367,0,4272,s16,1068,2,stereo
frame,video,0,1,1,0.033367,1,0.033367,1,0.033367,1,0.033367,120000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,2,0.066733,2,0.066733,2,0.066733,1,0.033367,240000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,3,0.100100,3,0.100100,3,0.100100,1,0.033367,360000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,4,0.133467,4,0.133467,4,0.133467,1,0.033367,480000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,5,0.166833,5,0.166833,5,0.166833,1,0.033367,600000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,6,0.200200,6,0.200200,6,0.200200,1,0.033367,720000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,7,0.233567,7,0.233567,7,0.233567,1,0.033367,840000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,8,0.266933,8,0.266933,8,0.266933,1,0.033367,960000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,9,0.300300,9,0.300300,9,0.300300,1,0.033367,1080000,120000,720,480,yuv411p,8:9,I,0,0,0,0,0,unknown,unknown,unknown,unknown,topleft
frame,video,0,1,10,0.333667,10,0.333667,10,0.333667,1,0.033367,1200000,120000,720,480,yuv411p,8:9,I,0,0,1,0,0,unknown,unknown,unknown,unknown,topleft
frame,audio,1,1,1001,0.033367,1001,0.033367,1001,0.033367,1000,0.033333,1200000,4268,s16,1067,2,stereo
frame,audio,2,1,1001,0.033367,1001,0.033367,1001,0.033367,1000,0.033333,1200000,4268,s16,1067,2,stereo
frame,video,0,1,11,0.367033,11,0.367033,11,0.367033,1,0.033367,1320000,120000,720,480,yuv411p,8:9,I,0,0,1,0,0,unknown,unknown,unknown,unknown,topleft
frame,audio,1,1,2001,0.066700,2001,0.066700,2001,0.066700,1001,0.033367,1320000,4272,s16,1068,2,stereo
frame,audio,2,1,2001,0.066700,2001,0.066700,2001,0.066700,1001,0.033367,1320000,4272,s16,1068,2,stereo
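
The frame list above accounts for the 0.4 s / 0.1 s mismatch in the remuxed file. As a rough check, the following sketch adds up the durations. The function names are hypothetical, the video frame duration is assumed to be 1001/30000 s (29.97 fps), and the sample counts are the per-frame values shown for one audio stream in the CSV output.

```c
#include <assert.h>
#include <stddef.h>

/* Duration in seconds of n video frames, assuming 1001/30000 s per frame. */
double video_duration_sec(size_t n_frames)
{
    return (double)n_frames * 1001.0 / 30000.0;
}

/* Duration in seconds of audio frames, given each frame's sample count
 * and the sample rate in Hz. */
double audio_duration_sec(const int *nb_samples, size_t n, int sample_rate)
{
    long total = 0;
    assert(sample_rate > 0);
    for (size_t i = 0; i < n; i++)
        total += nb_samples[i];
    return (double)total / sample_rate;
}
```

With 12 video frames, `video_duration_sec(12)` is about 0.4004 s. With the three audio frames per stream (1068, 1067 and 1068 samples at 32000 Hz), `audio_duration_sec` gives about 0.1001 s, which matches the stream durations reported for the mkv file.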

This scenario happens often when DV tape players read damaged tape, and it causes the output to be out of sync. I'm uncertain what the best solution is. Are there options to force all of the frames to be decoded in a particular way, so that the error codes in the audio DIF blocks of the middle frames would serve as silent audio?

Attachments (1)

1670520000_12.dv (1.4 MB ) - added by dave rice 2 years ago.


Change History (8)

by dave rice, 2 years ago

Attachment: 1670520000_12.dv added

comment:1 by Carl Eugen Hoyos, 2 years ago

Version: unspecified → git-master

Does the following help?

diff --git a/libavcodec/dv.h b/libavcodec/dv.h
index 0205d72347..fd509e9377 100644
--- a/libavcodec/dv.h
+++ b/libavcodec/dv.h
@@ -36,6 +36,7 @@
 typedef struct DVwork_chunk {
     uint16_t buf_offset;
     uint16_t mb_coordinates[5];
+    int got_frame;
 } DVwork_chunk;
 
 typedef struct DVVideoContext {
diff --git a/libavcodec/dvdec.c b/libavcodec/dvdec.c
index c526091eb4..3fc738c702 100644
--- a/libavcodec/dvdec.c
+++ b/libavcodec/dvdec.c
@@ -339,6 +339,7 @@ static int dv_decode_video_segment(AVCodecContext *avctx, void *arg)
     av_assert1((((int) mb_bit_buffer) & 7) == 0);
     av_assert1((((int) vs_bit_buffer) & 7) == 0);
 
+    work_chunk->got_frame = 1;
 retry:
 
     memset(sblock, 0, 5 * DV_MAX_BPM * sizeof(*sblock));
@@ -356,6 +357,10 @@ retry:
                 vs_bit_buffer_damaged = 1;
             if (!mb_index) {
                 sta = buf_ptr[3] >> 4;
+                if (sta == 0xa) {
+                    work_chunk->got_frame = 0;
+                    return -1;
+                }
             } else if (sta != (buf_ptr[3] >> 4))
                 vs_bit_buffer_damaged = 1;
         }
@@ -613,7 +618,7 @@ static int dvvideo_decode_frame(AVCodecContext *avctx, void *data,
     emms_c();
 
     /* return image */
-    *got_frame = 1;
+    *got_frame = s->work_chunks->got_frame;
 
     return s->sys->frame_size;
 }

comment:2 by Carl Eugen Hoyos, 2 years ago

Apparently not a duplicate of ticket #2340...

comment:3 by dave rice, 2 years ago

Hi @cehoyos, I've applied that patch to git-master; however, the audio output of the decoder is the same.

comment:4 by Carl Eugen Hoyos, 2 years ago

I was under the impression that you think too many video frames are output, did I misunderstand?

The video frames all have timestamps, so even if the broken frames are not output by the decoder (this may or may not be more correct), there will always be a disruption which I believe is unavoidable.

Could it be that you did not explain your (actual) issue and instead tried to analyze it (which you shouldn't do: if you could analyze it, you would not have to post here), and that this ticket is a duplicate of ticket #4674 (or closely related)?

comment:5 by Elon Musk, 2 years ago

This is not about dropping frames to keep A/V sync but about concealing, rather than dropping, audio frames that have issues.

comment:6 by dave rice, 2 years ago

Yes, I'm wondering if a bitstream filter would make sense, as in setting the stream to match the characteristics found in a valid frame. If I copy the AAUX audio source pack from the first frame into the subsequent frames, then the output of the audio data is correct.
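
The pack copy described above could look roughly like the following. This is a sketch under several assumptions, not an existing FFmpeg bitstream filter: the function name is hypothetical, DIF blocks are assumed to be 80 bytes with section type 3 marking audio, the AAUX pack is assumed to be the 5 bytes following the 3-byte block ID, and 0x50 is assumed to be the pack ID of the audio source pack. Both frames are assumed to have the same size and block layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIF_BLOCK_SIZE    80    /* assumed size of one DIF block in bytes */
#define SCT_AUDIO         3     /* assumed section type for audio (top 3 bits of byte 0) */
#define AAUX_PACK_OFFSET  3     /* assumed: pack follows the 3-byte block ID */
#define AAUX_PACK_SIZE    5     /* assumed: 1 ID byte plus 4 payload bytes */
#define PACK_AUDIO_SOURCE 0x50  /* assumed pack ID of the AAUX audio source pack */

/* Hypothetical helper: for every audio DIF block where the reference frame
 * carries an audio source pack, copy that pack into the block at the same
 * position in dst. Returns the number of packs copied. */
size_t copy_audio_source_packs(const uint8_t *ref, uint8_t *dst, size_t frame_size)
{
    size_t copied = 0;
    assert(ref != NULL && dst != NULL);
    for (size_t off = 0; off + DIF_BLOCK_SIZE <= frame_size; off += DIF_BLOCK_SIZE) {
        const uint8_t *r = ref + off;
        uint8_t *d = dst + off;
        /* Only touch positions that are audio blocks in both frames. */
        if ((r[0] >> 5) != SCT_AUDIO || (d[0] >> 5) != SCT_AUDIO)
            continue;
        /* Skip reference blocks that hold some other AAUX pack. */
        if (r[AAUX_PACK_OFFSET] != PACK_AUDIO_SOURCE)
            continue;
        memcpy(d + AAUX_PACK_OFFSET, r + AAUX_PACK_OFFSET, AAUX_PACK_SIZE);
        copied++;
    }
    return copied;
}
```

Here `ref` would be the last valid frame (frame 1 in the sample) and `dst` each concealed frame in turn. Whether other AAUX packs would also need to be restored is not covered by this sketch.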

As an alternative to a bitstream filter, is there an option to re-apply the characteristics of the first frame to all subsequent frames rather than re-probing those characteristics for every frame?

comment:7 by Carl Eugen Hoyos, 16 months ago

Component: avcodec → avformat
Resolution: fixed
Status: new → closed