#9804 closed defect (worksforme)

MP4 captions break when extracted to scc

Reported by: Zach
Owned by:
Priority: normal
Component: undetermined
Version: git-master
Keywords: cc
Cc: Zach
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by Zach)

Summary of the bug:

I have MP4 files coming in from outside content providers that contain user space 608 and 708 captions. When I try to extract an SCC file, some of the caption data is lost, resulting in misspellings in the extracted captions file when it is opened in Adobe Premiere 2022. SRT extraction and transcoding to an MPEG-TS transport stream or MXF file also break the captions in different ways.

How to reproduce:

% ffmpeg -f lavfi -i "movie=\'D:\\IIW242.mp4\'"[out+subcc] -map 0:1 -c:s copy "D:\IIW242.scc"
ffmpeg version 2022-06-06-git-73302aa193-full_build-www.gyan.dev

Link to sample file:
https://drive.google.com/file/d/1zdX-Vw3iU37xWR2SPRwvonMhoT1gPEUz/

Attachments (4)

IIW242.scc (268.5 KB) - added by Zach 23 months ago.
    scc extraction from longer program
iiw242.html (8.5 KB) - added by Zach 23 months ago.
    source file mediainfo export
IIW242pp.scc (82.2 KB) - added by Zach 23 months ago.
    SCC Converted from MCC via Premiere Pro
IIW242mcc.zip (577.0 KB) - added by Zach 23 months ago.
    MCC File extracted with ccextract (drastic tv)


Change History (10)

by Zach, 23 months ago

Attachment: IIW242.scc added

scc extraction from longer program

by Zach, 23 months ago

Attachment: iiw242.html added

source file mediainfo export

comment:1 by Zach, 23 months ago

Description: modified

comment:2 by Balling, 23 months ago

What misspellings? The only thing I can see is that the SRT file produced from the SCC is slightly different from the SRT file produced directly, but only in timestamps:

00:00:00,567 --> 00:00:06,006

vs

00:00:00,561 --> 00:00:06,000

or
00:00:27,427 --> 00:00:29,763

vs
00:00:27,396 --> 00:00:29,759
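
To put numbers on the timestamp differences quoted above, here is a minimal Python sketch that converts two SRT timing lines to milliseconds and subtracts them. The function names `srt_cue_to_ms` and `cue_offset_ms` are made up for illustration and are not part of ffmpeg or any library.

```python
import re

# Matches one SRT timestamp such as 00:00:27,427
_TS = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def srt_cue_to_ms(line):
    """Return (start_ms, end_ms) for an SRT timing line."""
    stamps = _TS.findall(line)
    if len(stamps) != 2:
        raise ValueError(f"not an SRT timing line: {line!r}")
    return tuple(
        ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)
        for h, m, s, ms in stamps
    )

def cue_offset_ms(line_a, line_b):
    """Return (start difference, end difference) in ms, a minus b."""
    a = srt_cue_to_ms(line_a)
    b = srt_cue_to_ms(line_b)
    return (a[0] - b[0], a[1] - b[1])

print(cue_offset_ms("00:00:00,567 --> 00:00:06,006",
                    "00:00:00,561 --> 00:00:06,000"))
print(cue_offset_ms("00:00:27,427 --> 00:00:29,763",
                    "00:00:27,396 --> 00:00:29,759"))
```

For the two pairs quoted in this comment, the start and end offsets come out as 6 ms and 6 ms for the first cue, and 31 ms and 4 ms for the second.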

Last edited 23 months ago by Balling

by Zach, 23 months ago

Attachment: IIW242pp.scc added

SCC Converted from MCC via Premiere Pro

by Zach, 23 months ago

Attachment: IIW242mcc.zip added

MCC File extracted with ccextract (drastic tv)

comment:3 by Zach, 23 months ago

The spelling errors are in the decoded captions. Ezekiel becomes Ezekl, etc. That is the easiest example to see in the first 45 seconds of the program. I uploaded some other files to help explain this that are from the entire program. The proper output should be very similar to the scc file I obtained by extracting with ccextract from Drastic TV and converted to SCC via Premiere Pro.
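
One way to check whether a word such as "Ezekiel" survives in an SCC file, independently of any editing application, is to decode the text bytes directly. The sketch below is a rough Python illustration, not ffmpeg's decoder. It assumes the usual SCC layout (a timecode followed by four-digit hex words), strips the parity bit from each byte, and skips control-code pairs. The function name `scc_line_text` is hypothetical, the sample line is constructed by hand rather than taken from the attached files, and the decoder treats every character as ASCII, so the few CEA-608 code points that differ from ASCII (accented letters, for example) will come out wrong.

```python
def scc_line_text(line):
    """Roughly decode the printable characters on one SCC data line."""
    parts = line.split()
    # Skip blank lines and the file header, which has no timecode.
    if len(parts) < 2 or ":" not in parts[0]:
        return ""
    chars = []
    for word in parts[1:]:            # parts[0] is the timecode
        value = int(word, 16)
        hi = (value >> 8) & 0x7F      # drop the parity bit
        lo = value & 0x7F
        if 0x10 <= hi <= 0x1F:
            continue                  # control-code pair, no text
        for byte in (hi, lo):
            if byte >= 0x20:          # ignore null padding
                chars.append(chr(byte))
    return "".join(chars)

# Hand-built sample line (assumption, not from IIW242.scc).
sample = "00:00:01:00\t9420 9420 457a e56b e9e5 ec80"
print(scc_line_text(sample))
```

Running the same function over the lines of two SCC files and comparing the results would show whether characters are actually missing from the data or only from a particular application's rendering of it.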

comment:4 by Balling, 23 months ago

> Ezekiel becomes Ezekl, etc.

No, it does not. Ezekiel is correctly preserved.

> I obtained by extracting with ccextract from Drastic TV and converted to SCC via Premiere Pro.

Well, it has some problems, like

getting what you deserve."</font><font face="Monospace">{\an7}Well, yes,</font>

is on one line, even though that looks wrong. There are also some \h problems, but that is a known bug in ffmpeg.

Oh, and I also see that in "this is It is written" the "this is" part is not preserved.

Last edited 23 months ago by Balling

comment:5 by Marton Balint, 22 months ago

Priority: important → normal

Not a regression or a crash, so does not qualify as important.

comment:6 by Carl Eugen Hoyos, 21 months ago

Keywords: cc added
Resolution: worksforme
Status: new → closed

There are definitely no captions missing.
