Opened 15 hours ago

Last modified 62 minutes ago

#11462 new enhancement

Cannot embed .scc file into .mp4 using -c:s copy

Reported by: Zach
Owned by:
Priority: important
Component: undetermined
Version: git-master
Keywords:
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
How to reproduce:

ffmpeg.exe -i video.mp4 -i CaptionMaker.scc -map 0:v -map 0:a -map 1 -c:v copy -c:a copy -c:s copy output.mp4

ffmpeg version 2025-02-06-git-6da82b4485-full_build-www.gyan.dev Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 14.2.0 (Rev1, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-lcms2 --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-libdvdnav --enable-libdvdread --enable-sdl2 --enable-libaribb24 --enable-libaribcaption --enable-libdav1d --enable-libdavs2 --enable-libopenjpeg --enable-libquirc --enable-libuavs3d --enable-libxevd --enable-libzvbi --enable-libqrencode --enable-librav1e --enable-libsvtav1 --enable-libvvenc --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxeve --enable-libxvid --enable-libaom --enable-libjxl --enable-libvpx --enable-mediafoundation --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-liblensfun --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-dxva2 --enable-d3d11va --enable-d3d12va --enable-ffnvcodec --enable-libvpl --enable-nvdec --enable-nvenc --enable-vaapi --enable-libshaderc --enable-vulkan --enable-libplacebo --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libcodec2 --enable-libilbc --enable-libgsm --enable-liblc3 --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      59. 56.100 / 59. 56.100
  libavcodec     61. 32.101 / 61. 32.101
  libavformat    61.  9.106 / 61.  9.106
  libavdevice    61.  4.100 / 61.  4.100
  libavfilter    10.  9.100 / 10.  9.100
  libswscale      8. 13.100 /  8. 13.100
  libswresample   5.  4.100 /  5.  4.100
  libpostproc    58.  4.100 / 58.  4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from video.mp4:
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41
    creation_time   : 2025-02-09T19:21:47.000000Z
  Duration: 00:28:31.71, start: 0.000000, bitrate: 18300 kb/s
  Stream #0:0[0x1](eng): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 17979 kb/s, 29.97 fps, 29.97 tbr, 30k tbn (default)
    Metadata:
      creation_time   : 2025-02-09T19:21:47.000000Z
      handler_name    : ?Mainconcept Video Media Handler
      vendor_id       : [0][0][0][0]
      encoder         : AVC Coding
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 317 kb/s (default)
    Metadata:
      creation_time   : 2025-02-09T19:21:47.000000Z
      handler_name    : #Mainconcept MP4 Sound Media Handler
      vendor_id       : [0][0][0][0]
Input #1, scc, from CaptionMaker.scc:
  Duration: N/A, bitrate: N/A
  Stream #1:0: Subtitle: eia_608 (cc_dec)
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #0:1 -> #0:1 (copy)
  Stream #1:0 -> #0:2 (copy)
[mp4 @ 000001998ad1f040] Could not find tag for codec eia_608 in stream #2, codec not currently supported in container
[out#0/mp4 @ 000001998a703380] Could not write header (incorrect codec parameters ?): Invalid argument
Conversion failed!

Using the flag -c:s copy fails, while -c:s mov_text creates a file but converts the stream to a subtitle track instead of captions.

Please add support for the eia_608 (CEA-608) codec to allow embedding of captions into mp4 files.

Change History (26)

comment:1 by Marth64, 15 hours ago

It is not possible in this way.
Closed Caption bytes are embedded into the video stream's SEI bytes or user data, so a stream copy will not achieve a presentable result (also I do not believe MP4 can house a discrete EIA608 stream, not that it would be useful). Likely this will have to be a BSF filter.

in reply to:  1 comment:2 by Zach, 14 hours ago

Replying to Marth64:

It is not possible in this way.
Closed Caption bytes are embedded into the video stream's SEI bytes or user data, so a stream copy will not achieve a presentable result (also I do not believe MP4 can house a discrete EIA608 stream, not that it would be useful). Likely this will have to be a BSF filter.

I don't know what exact mechanism is required, but I do know it is possible to embed 608 captions into the user data of an mp4 file. I have provided links to such files with issue #11461 that can be referenced. I used the term copy because I want to preserve and copy the 608 data. I would expect such a bitstream filter to insert the captions codes into their respective frames user data according to timecode and present the new video stream to next part of the code.

Last edited 14 hours ago by Zach (previous) (diff)

comment:3 by softworkz, 14 hours ago

A bitstream filter doesn't help as it has one input and one output, but this requires a filter with two inputs and one output.
On the ML, I have already described how it can work, but this ticket is invalid, not only because it's not possible but also because you cannot "embed" CEA-608/708 in an mp4 container - it needs to get "embedded" into the video stream instead, as opposed to most other subtitle formats.

mov_text, for example, gets muxed into the mp4 container. CEA-608/708 is a different story.

Last edited 14 hours ago by softworkz (previous) (diff)

comment:4 by Marth64, 14 hours ago

A filter would require encoding. The user is trying to stream copy the video.

in reply to:  4 comment:5 by Zach, 13 hours ago

Replying to Marth64:

A filter would require encoding. The user is trying to stream copy the video.

Good point, we want to insert the caption data commands into the user data fields of the compressed stream. In a sense this would require a muxer that takes the elementary stream and either inserts caption data into the user data fields or replaces user data padding (whichever method is required).

.scc files already contain the list of data chunks for each frame of the video, they will just need to be wrapped in the appropriate user data packet and inserted into the video elementary stream user data fields. Some HD elementary streams may need an additional wrapper for 608 to sit inside of a dummy 708 packet.

I think it should be safe to assume that only one captions input for every video input would be required. Manipulating individual captions tracks and the contents of the captions data packet should be done by the editor and included in the sidecar file intended for embedding.
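For illustration, the per-frame user data packet described above roughly corresponds to the ATSC A/53 "GA94" cc_data() payload that carries 608 byte pairs in H.264 SEI or MPEG-2 user data. A minimal sketch of building one such payload (illustrative only, not FFmpeg code; field layout per A/53 Part 4):

```python
# Sketch: wrap CEA-608 byte pairs in an ATSC A/53 cc_data() user data
# payload, the structure that would need to go into each frame's
# SEI/user data. Illustrative helper, not part of any real tool.

def a53_cc_payload(cc_pairs):
    """Build an A/53 cc_data() payload from a list of (cc1, cc2) 608 pairs."""
    out = bytearray()
    out += b"GA94"                        # ATSC user_identifier
    out.append(0x03)                      # user_data_type_code: cc_data
    out.append(0x40 | (len(cc_pairs) & 0x1F))  # process_cc_data_flag | cc_count
    out.append(0xFF)                      # em_data (reserved)
    for cc1, cc2 in cc_pairs:
        # marker bits (0xF8) | cc_valid (0x04) | cc_type 0 (NTSC field 1)
        out.append(0xFC)
        out.append(cc1)
        out.append(cc2)
    out.append(0xFF)                      # trailing marker_bits
    return bytes(out)

payload = a53_cc_payload([(0x94, 0x2C)])  # "erase displayed memory" pair
```

Each video frame would carry one such payload; a bitstream filter or muxer would still have to splice it into the codec-specific SEI or user-data syntax.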

Last edited 13 hours ago by Zach (previous) (diff)

comment:6 by softworkz, 13 hours ago

The user is trying to stream copy the video.

The caption data needs to get into the video stream. This means that the video stream gets altered, so it cannot be a stream copy.

in reply to:  6 comment:7 by Marth64, 13 hours ago

Replying to softworkz:

The user is trying to stream copy the video.

The caption data needs to get into the video stream. This means that the video stream gets altered, so it cannot be a stream copy.

Yes, I realize this. But the SEI or user data can be conceptually altered with a BSF without needing to encode the video. Either BSF facilities can get extended to support the complexity needed for this (if they have not already), or a more primitive method is to make a BSF which embeds the data and has an option to read an SCC given a file path.

comment:8 by softworkz, 13 hours ago

or a more primitive method is to make a BSF which embeds the data and has an option to read an SCC given a file path.

Yup, that would work. But you would need a different bitstream filter for each video codec.

Though, this would address this very specific case only and won't be useful for other cases.
And it would require an SCC file that matches the rate of the video to work properly. That's why the benefit would be pretty isolated.

comment:9 by softworkz, 13 hours ago

Can't libcaption do that anyway?

https://github.com/szatmary/libcaption

comment:10 by Zach, 13 hours ago

This whole thing is working on the assumption that the captions have the same frame rate as the video. The use case for this is edit suites like Premiere Pro that support exporting mp4 with sidecar .scc files but not embedded in the user data of the video stream.

By the way, I believe the frame rate dependency of caption data is solely related to the timecode that specifies which frame to embed the data in. All that would be required to handle separate frame rates would be a calculation that translates and aligns the two timecodes without discarding data. 608 caption data simply acts on the frame it is embedded in, not at a separate specific time.
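The translation described above could look like the following sketch (hypothetical helper, non-drop-frame timecodes assumed):

```python
# Hypothetical sketch: map an SCC "HH:MM:SS:FF" timecode authored at one
# (non-drop) frame rate onto the nearest frame index at the video's rate.

def tc_to_frames(tc, fps):
    """Convert a non-drop timecode string to an absolute frame count."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def translate_frame(tc, caption_fps, video_fps):
    """Find the video frame that best matches a caption timecode."""
    seconds = tc_to_frames(tc, caption_fps) / caption_fps
    return round(seconds * video_fps)

print(translate_frame("00:00:02:04", 30, 25))  # caption frame 64 -> video frame 53
```

Drop-frame (29.97) timecodes would need the usual drop-frame correction on top of this.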

Last edited 13 hours ago by Zach (previous) (diff)

in reply to:  8 comment:11 by softworkz, 13 hours ago

Replying to softworkz:

or a more primitive method is to make a BSF which embeds the data and has an option to read an SCC given a file path.

Yup, that would work. But you would need a different bitstream filter for each video codec.

For codecs, you need MPEG-2 video, H.264 and HEVC. For the latter two, the bitstream also differs depending on the container, whether it's mp4 or mpegts (Annex B), so there are already 5 different cases to handle when implementing a bitstream filter for this.

in reply to:  10 ; comment:12 by softworkz, 13 hours ago

Replying to Zach:

This whole thing is working on the assumption that the captions have the same frame rate as the video. The use case for this is edit suites like Premiere Pro that support exporting mp4 with sidecar .scc files but not embedded in the user data of the video stream.

Yes, the use case is totally valid - no doubt. It's also not that it's not possible - the big problem is that this task doesn't really fit well into ffmpeg's architecture.
Have you tried libcaption? Maybe a standalone tool for this specific task is a better (and much cheaper) option...

in reply to:  10 ; comment:13 by softworkz, 13 hours ago

By the way, I believe the frame rate dependency of caption data is solely related to the timecode that specifies which frame to embed the data in. All that would be required to handle separate frame rates would be a calculation that translates and aligns the two timecodes without discarding data. 608 caption data simply acts on the frame it is embedded in, not at a separate specific time.

The spec mandates a continuous (DTVCC) stream of data. It must not be on/off (sometimes there, sometimes not).

in reply to:  13 comment:14 by Zach, 13 hours ago

Replying to softworkz:

By the way, I believe the frame rate dependency of caption data is solely related to the timecode that specifies which frame to embed the data in. All that would be required to handle separate frame rates would be a calculation that translates and aligns the two timecodes without discarding data. 608 caption data simply acts on the frame it is embedded in, not at a separate specific time.

The spec mandates a continuous (DTVCC) stream of data. It must not be on/off (sometimes there, sometimes not).

That is what padding is for in the stream. If you look at the .scc files I provided from Premiere Pro or CaptionMaker in a text editor, you will notice that not every frame timecode is listed in the file. It is obviously necessary to insert the padding code 8080 into any frame slots that are not present in the source. On the other hand, if the source has anything other than 8080 on a particular frame, it needs to be passed on the next frame, and the next instance of 8080 would get skipped.

comment:15 by softworkz, 13 hours ago

That is what padding is for in the stream.

And when there's no gap? How do you know at which points you can add padding data?

in reply to:  15 comment:16 by Zach, 12 hours ago

Replying to softworkz:

That is what padding is for in the stream.

And when there's no gap? How do you know at which points you can add padding data?

With an .scc file as an input you would see something like this if padding can be added.

00:00:02:04	942c

00:00:02:07	942f

is equivalent to the following in an .scc file.

00:00:02:04	942c 8080 8080 942f

Including too many frame timecodes and not organizing according to start and end codes causes compatibility issues with other software. My understanding is that each frame gets one of those 4-digit hex codes. The first code goes on the frame that matches the timecode for a line, and each subsequent code goes on the following frame. When the particular caption message is complete, another line is started with the next timecode at which caption commands are wanted. If there is a frame gap between the last command of the previous line and the next listed timecode, the captions embedder is responsible for repeating 8080 until that timecode.

This understanding was derived from the documents listed under #11461.
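The padding rule described above can be sketched as follows (assumed helper, 30 fps non-drop timecodes; not a real tool):

```python
# Sketch of the 8080-padding rule: expand sparse SCC lines into one
# 16-bit word per frame, inserting "8080" for frames with no authored data.

def expand_scc(lines, fps=30):
    """lines: [(timecode, [hex words])]; returns a dense per-frame list."""
    def frame_of(tc):
        h, m, s, f = (int(x) for x in tc.split(":"))
        return ((h * 60 + m) * 60 + s) * fps + f

    frames = {}
    for tc, words in lines:
        start = frame_of(tc)
        for i, w in enumerate(words):
            frames[start + i] = w       # one word per consecutive frame
    end = max(frames) + 1
    return [frames.get(i, "8080") for i in range(end)]

dense = expand_scc([("00:00:02:04", ["942c"]), ("00:00:02:07", ["942f"])])
# dense[-4:] == ["942c", "8080", "8080", "942f"]
```

This reproduces the equivalence in the example above: the two sparse lines expand to 942c 8080 8080 942f starting at frame 64.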

in reply to:  12 comment:17 by Zach, 12 hours ago

Replying to softworkz:

Yes, the use case is totally valid - no doubt. It's also not that it's not possible - the big problem is that this task doesn't really fit well into ffmpeg's architecture.
Have you tried libcaption? Maybe a standalone tool for this specific task is a better (and much cheaper) option...

I am OK with considering another separate command-line tool for this, but libcaption, as it currently appears, is more of a library than a command-line utility, since its releases are source code only.

Would it make more sense to have the captions merged in a filter in the uncompressed domain? This risks losing visual quality for compressed files, but it also reduces development complexity and removes the requirement for the files to share aligned timecodes and frame rates.

Many playout solutions can directly handle the sidecar files that would be the source for the solution mentioned above. I may end up needing the functions to merge a caption stream and video stream in the uncompressed domain anyway for my playout implementation, which is based on ffmpeg.

Last edited 12 hours ago by Zach (previous) (diff)

comment:18 by galad, 10 hours ago

By the way, Apple has used discrete 608/708 tracks in MOV and MP4 for almost 20 years.

in reply to:  18 comment:19 by Zach, 10 hours ago

Replying to galad:

By the way, Apple has used discrete 608/708 tracks in MOV and MP4 for almost 20 years.

Do you have any sample files for those? Are you sure those are not user data tracks built in to the video stream?

in reply to:  18 comment:20 by softworkz, 9 hours ago

Replying to galad:

By the way, Apple has used discrete 608/708 tracks in MOV and MP4 for almost 20 years.

This doesn't help with his task, which is content creation for public TV broadcast.

comment:21 by galad, 9 hours ago

Pretty sure, I wrote a muxer for those type of tracks years ago. Here's a sample: https://subler.org/downloads/608.mp4

in reply to:  21 comment:22 by Zach, 9 hours ago

Replying to galad:

Pretty sure, I wrote a muxer for those type of tracks years ago. Here's a sample: https://subler.org/downloads/608.mp4

That is interesting as it is a different method than any of my content providers are currently providing. Could you provide a sample with a video track also? It is hard to verify the muxing method for sure with only captions present.

comment:23 by Devin Heitmueller, 3 hours ago

Ok, so a few comments.

The MP4 format can certainly do a dedicated 608 track (i.e. it doesn't have to be embedded in SEI). In fact, the MP4 muxer (movenc.c) supports this, although looking at the code I wouldn't say it's very robust. Also, I've implemented correct output of 608 with decklink when the file source is MP4 and it has a c608 track.

So in terms of the various standards, it's definitely possible to embed a 608 stream in an MP4 without it being in the H.264 SEI.

See the following link from Apple on the cdat atom:

https://developer.apple.com/documentation/quicktime-file-format/closed_captioning_sample_data

The cc_fifo mechanism I wrote for ffmpeg was never intended to insert arbitrary 608 padding when it is not present in the source stream. It just ensures the 608 tuples are embedded with the proper rate control, assuming the padding is already present in the source. That said, it wouldn't be hard to have the cc_fifo insert padding.

My suggestion would be we should add a BSF which takes in the 608 stream from the SCC, reformats the 608 tuples to have the proper rate, and then feeds into the MP4 mux. We already have the vf_ccrepack filter which operates on AVFrames, but we would need to create a BSF equivalent which works on C608 AVPackets.

comment:24 by Devin Heitmueller, 3 hours ago

I took a look at the actual command line provided by the OP, as well as the code, and it does work as expected when the MP4 is created in MOV mode.

To create the file in MOV mode, specify the filename as ".mov" rather than ".mp4". For example:

./ffmpeg -i ../../whatever.ts -i ../../AIM-2301_premiereExport.scc -map 0:v -map 0:a -map 1 -c:v copy -c:a copy -c:s copy output.mov

Once you do that, it creates the output file with a caption track, and the result is playable in Quicktime:

Example output:

Stream #0:7[0x8]: Subtitle: eia_608 (c608 / 0x38303663), 0 kb/s (default)
  Metadata:
    handler_name    : ClosedCaptionHandler

in reply to:  23 ; comment:25 by softworkz, 73 minutes ago

Replying to Devin Heitmueller:

Ok, so a few comments.

The MP4 format can certainly do a dedicated 608 track (i.e. it doesn't have to be embedded in SEI).

Sure, but that doesn't help Zach for broadcasting, where the CCs need to be in the video stream.

in reply to:  25 comment:26 by Devin Heitmueller, 62 minutes ago

Replying to softworkz:

Sure, but that doesn't help Zach for broadcasting, where the CCs need to be in the video stream.

We don't have definitive requirements about how the file should be formatted. Some playout systems are fine with the captions being in a separate stream (in particular if the MP4 will go through a playout system that outputs as SDI). Hence hearing "for broadcast" doesn't really mean much, since you would never broadcast an MP4 container without some form of processing.

That said, if Zach really does need it in the SEI stream, then agreed we would probably have to make some code changes. The subcc lavfi feature lets you go from SEI to an AVPacket stream, but we don't really have a solution today to go in the other direction.

And yes, the "one in, one out" limitation of the BSF framework really does make this harder, since you can't write a BSF to mix data from two separate streams.

Also, I'm assuming that he wants to do this entirely in the compressed domain without re-encoding the video. I would probably suggest two different approaches depending on whether it needs to be done entirely in the compressed domain versus if we just have to be able to get the 608 stream recombined with the video AVFrames prior to encoding.

Note: See TracTickets for help on using tickets.