Opened 8 years ago

Last modified 3 years ago

#5851 open defect

Option to remove tags from Closed Captions

Reported by: edumj
Owned by:
Priority: minor
Component: avcodec
Version: git-master
Keywords: cc
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

I can extract Closed Captions from this NTSC DVD sample Starship_Troopers.vob with this:

"ffmpeg" -f lavfi -i "movie=Starship_Troopers.vob[out0+subcc]" -map s "output_map-s.srt"

output:

ffmpeg version N-81452-g01aee81 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 5.4.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-libebur128 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
  libavutil      55. 29.100 / 55. 29.100
  libavcodec     57. 54.100 / 57. 54.100
  libavformat    57. 48.100 / 57. 48.100
  libavdevice    57.  0.102 / 57.  0.102
  libavfilter     6. 54.100 /  6. 54.100
  libswscale      4.  1.100 /  4.  1.100
  libswresample   2.  1.100 /  2.  1.100
  libpostproc    54.  0.100 / 54.  0.100
Input #0, lavfi, from 'movie=Starship_Troopers.vob[out0+subcc]':
  Duration: N/A, start: 1986.626100, bitrate: N/A
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 720x480 [SAR 1:1 DAR 3:2], 59.94 tbr, 90k tbn, 90k tbc
    Stream #0:1: Subtitle: eia_608
[srt @ 0612b2c0] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[null @ 0608cfa0] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Output #0, srt, to 'output_map-s.srt':
  Metadata:
    encoder         : Lavf57.48.100
    Stream #0:0: Subtitle: subrip (srt)
    Metadata:
      encoder         : Lavc57.54.100 srt
Output #1, null, to 'nul':
  Metadata:
    encoder         : Lavf57.48.100
    Stream #1:0: Video: wrapped_avframe, yuv420p, 720x480 [SAR 1:1 DAR 3:2], q=2-31, 200 kb/s, 59.94 fps, 59.94 tbn, 59.94 tbc
    Metadata:
      encoder         : Lavc57.54.100 wrapped_avframe
Stream mapping:
  Stream #0:1 -> #0:0 (eia_608 (cc_dec) -> subrip (srt))
  Stream #0:0 -> #1:0 (rawvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
frame=  467 fps=0.0 q=-0.0 size=       0kB time=00:00:19.43 bitrate=   0.1kbits/s speed=38.9x    
frame=  973 fps=973 q=-0.0 size=       1kB time=00:00:40.54 bitrate=   0.1kbits/s speed=40.5x    
[mpeg2video @ 060527a0] ac-tex damaged at 3 27
[mpeg2video @ 060527a0] Warning MVs not available
[mpeg2video @ 060527a0] concealing 135 DC, 135 AC, 135 MV errors in I frame
frame= 1229 fps=980 q=-0.0 Lsize=       1kB time=00:00:51.30 bitrate=   0.2kbits/s speed=40.9x    
video:461kB audio:0kB subtitle:1kB other streams:0kB global headers:0kB muxing overhead: unknown

But the resulting SRT has font tags and some strange position tags:

1
00:00:11,745 --> 00:00:15,249
<font face="Monospace">{\an7}PILOT TRAINEE IBANEZ
REPORTING FOR DUTY, MA’AM.</font>

2
00:00:15,249 --> 00:00:18,252
<font face="Monospace">{\an7}- TAKE THE NUMBER TWO CHAIR,
\h\hIBANEZ.
- YES, MA’AM.</font>

3
00:00:22,756 --> 00:00:27,761
<font face="Monospace">{\an7}\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\hIDENTIFY.
IBANEZ, "T"-THREE-TWO-FIVE-"A,"
CLEAR.</font>

4
00:00:30,764 --> 00:00:34,768
<font face="Monospace">{\an7}[ Laughs ]
WHAT ARE YOU DOING HERE ?</font>

5
00:00:36,270 --> 00:00:39,273
<font face="Monospace">{\an7}I’M THE GUY WHO’S GONNA
TEACH YOU TO FLY THIS CRATE.</font>

6
00:00:39,273 --> 00:00:41,776
<font face="Monospace">{\an7}<i>AH.
ASSISTANT INSTRUCTOR.</i></font>

7
00:00:41,775 --> 00:00:44,778
<font face="Monospace">{\an7}SHOULD I CALL YOU
"SIR" ?</font>

8
00:00:44,778 --> 00:00:47,281
<font face="Monospace">{\an7}ONLY WHEN I GIVE YOU
AN ORDER.</font>

9
00:00:47,281 --> 00:00:49,283
<font face="Monospace">{\an7}PREPARE FOR DEPARTURE.</font>


TXT2VobSub rejects these subtitles because the tagged lines are too long, and if I hardsub them with this:

"ffmpeg" -i "Starship_Troopers.vob" -vf "subtitles=output_map-s.srt:force_style='FontName=Microsoft Sans Serif,Fontsize=18,Outline=1,PrimaryColour=&HFFFFFF'" -f avi -c:v libxvid -b:v 1500k -vtag XVID -c:a libmp3lame -b:a 128k "Starship_Troopers-ffmpeg.avi"

output:

ffmpeg version N-81452-g01aee81 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 5.4.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-libebur128 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
  libavutil      55. 29.100 / 55. 29.100
  libavcodec     57. 54.100 / 57. 54.100
  libavformat    57. 48.100 / 57. 48.100
  libavdevice    57.  0.102 / 57.  0.102
  libavfilter     6. 54.100 /  6. 54.100
  libswscale      4.  1.100 /  4.  1.100
  libswresample   2.  1.100 /  2.  1.100
  libpostproc    54.  0.100 / 54.  0.100
Input #0, mpeg, from 'Starship_Troopers.vob':
  Duration: 00:00:51.30, start: 1986.626100, bitrate: 4618 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), yuv420p(tv), 720x480 [SAR 32:27 DAR 16:9], Closed Captions, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
    Stream #0:1[0x83]: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s
    Stream #0:2[0x82]: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s
    Stream #0:3[0x80]: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s
    Stream #0:4[0x81]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s
    Stream #0:5[0x20]: Subtitle: dvd_subtitle
    Stream #0:6[0x22]: Subtitle: dvd_subtitle
[Parsed_subtitles_0 @ 049ef6e0] Shaper: FriBidi 0.19.6 (SIMPLE)
[Parsed_subtitles_0 @ 049ef6e0] Using font provider directwrite
[avi @ 04942f60] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
    Last message repeated 1 times
[null @ 04942120] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
    Last message repeated 1 times
Output #0, avi, to 'Starship_Troopers-ffmpeg.avi':
  Metadata:
    ISFT            : Lavf57.48.100
    Stream #0:0: Video: mpeg4 (libxvid) (XVID / 0x44495658), yuv420p, 720x480 [SAR 32:27 DAR 16:9], q=2-31, 1500 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc
    Metadata:
      encoder         : Lavc57.54.100 libxvid
    Stream #0:1: Audio: mp3 (libmp3lame) (U[0][0][0] / 0x0055), 48000 Hz, stereo, fltp, delay 1105, padding 0, 128 kb/s
    Metadata:
      encoder         : Lavc57.54.100 libmp3lame
Output #1, null, to 'nul':
  Metadata:
    encoder         : Lavf57.48.100
    Stream #1:0: Video: wrapped_avframe, yuv420p, 720x480 [SAR 32:27 DAR 16:9], q=2-31, 200 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc
    Metadata:
      encoder         : Lavc57.54.100 wrapped_avframe
    Stream #1:1: Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s
    Metadata:
      encoder         : Lavc57.54.100 pcm_s16le
Stream mapping:
  Stream #0:0 -> #0:0 (mpeg2video (native) -> mpeg4 (libxvid))
  Stream #0:4 -> #0:1 (ac3 (native) -> mp3 (libmp3lame))
  Stream #0:0 -> #1:0 (mpeg2video (native) -> wrapped_avframe (native))
  Stream #0:4 -> #1:1 (ac3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[ac3 @ 04de9c80] frame sync error
Error while decoding stream #0:4: Invalid data found when processing input
[null @ 04942120] Application provided invalid, non monotonically increasing dts to muxer in stream 1: 1891 >= 1891
[libmp3lame @ 04debec0] Queue input is backward in time
frame=   95 fps=0.0 q=6.0 q=-0.0 size=     671kB time=00:00:03.94 bitrate=1394.7kbits/s speed=7.83x    
frame=  185 fps=184 q=6.0 q=-0.0 size=    1326kB time=00:00:07.71 bitrate=1407.7kbits/s speed=7.67x    
frame=  276 fps=183 q=9.0 q=-0.0 size=    2029kB time=00:00:11.49 bitrate=1446.2kbits/s speed=7.62x    
[Parsed_subtitles_0 @ 049ef6e0] fontselect: (Microsoft Sans Serif, 400, 0) -> MicrosoftSansSerif, 0, MicrosoftSansSerif
[Parsed_subtitles_0 @ 049ef6e0] fontselect: (Monospace, 400, 0) -> CourierNewPSMT, 0, CourierNewPSMT
[mpeg @ 002eb780] New subtitle stream 0:7 at pos:8497166 and DTS:1999.51s
frame=  372 fps=185 q=5.0 q=-0.0 size=    2752kB time=00:00:15.52 bitrate=1451.8kbits/s speed=7.73x    
frame=  459 fps=183 q=9.0 q=-0.0 size=    3439kB time=00:00:19.14 bitrate=1471.6kbits/s speed=7.63x    
frame=  557 fps=185 q=7.0 q=-0.0 size=    4135kB time=00:00:23.18 bitrate=1460.6kbits/s speed= 7.7x    
frame=  645 fps=184 q=9.0 q=-0.0 size=    4824kB time=00:00:26.88 bitrate=1469.7kbits/s speed=7.65x    
frame=  733 fps=181 q=6.0 q=-0.0 size=    5313kB time=00:00:30.53 bitrate=1425.2kbits/s speed=7.53x    
frame=  837 fps=184 q=4.0 q=-0.0 size=    5933kB time=00:00:34.88 bitrate=1393.0kbits/s speed=7.66x    
frame=  935 fps=185 q=5.0 q=-0.0 size=    6631kB time=00:00:38.98 bitrate=1393.4kbits/s speed=7.71x    
[Parsed_subtitles_0 @ 049ef6e0] fontselect: (Monospace, 400, 100) -> CourierNewPS-ItalicMT, 0, CourierNewPS-ItalicMT
frame= 1035 fps=186 q=5.0 q=-0.0 size=    7311kB time=00:00:43.17 bitrate=1387.1kbits/s speed=7.77x    
frame= 1139 fps=188 q=6.0 q=-0.0 size=    8053kB time=00:00:47.48 bitrate=1389.5kbits/s speed=7.84x    
[mpeg2video @ 049477c0] ac-tex damaged at 3 27
[mpeg2video @ 049477c0] Warning MVs not available
[mpeg2video @ 049477c0] concealing 135 DC, 135 AC, 135 MV errors in I frame
[ac3 @ 04de9c80] incomplete frame
frame= 1229 fps=189 q=6.0 Lq=-0.0 size=    8736kB time=00:00:51.31 bitrate=1394.6kbits/s speed= 7.9x    
video:8300kB audio:29601kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

Those font tags override the FontName set in the subtitles filter, and the position tags put the subs at the top of the frame, aligned like this:

http://i47.photobucket.com/albums/f169/edumj/Starship_Troopers-ffmpeg.png

CCExtractor removes those tags, and the result looks like this:

http://i47.photobucket.com/albums/f169/edumj/Starship_Troopers-ccextractor.png

Is there an option to remove those tags, like "-txt_format text" does with other embedded text subs? That way, we could also make soft subs (XSUBs) from CC, and not only hard subs.
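Until such an option exists, the tags can be stripped from the extracted SRT in a post-processing step. The following is only a minimal sketch of that workaround, not an FFmpeg feature: the function name strip_cc_tags and its keep_indent parameter are hypothetical, and it assumes the only markup to remove is what appears in the sample above (font tags, ASS override blocks such as {\an7}, and \h padding). Italic tags like <i> are left in place.

```python
import re

# Opening and closing <font ...> tags only; <i>, <b>, <u> are left alone.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
# ASS override blocks such as {\an7}.
ASS_OVERRIDE = re.compile(r"\{\\[^}]*\}")


def strip_cc_tags(srt_text: str, keep_indent: bool = False) -> str:
    """Remove font tags, ASS override blocks and \\h padding from SRT text.

    Hypothetical helper for illustration; not part of FFmpeg.
    """
    text = FONT_TAG.sub("", srt_text)
    text = ASS_OVERRIDE.sub("", text)
    # \h is a hard space: turn it into a plain space ...
    text = text.replace("\\h", " ")
    if not keep_indent:
        # ... and drop the resulting leading indent on every line.
        text = "\n".join(line.lstrip(" ") for line in text.split("\n"))
    return text


if __name__ == "__main__":
    cue = (
        "2\n"
        "00:00:15,249 --> 00:00:18,252\n"
        '<font face="Monospace">{\\an7}- TAKE THE NUMBER TWO CHAIR,\n'
        "\\h\\hIBANEZ.\n"
        "- YES, MA’AM.</font>\n"
    )
    print(strip_cc_tags(cue))
```

Running the cleaned text through the subtitles filter should then let force_style decide the font and alignment, since nothing in the cue overrides it any more.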

Change History (7)

comment:1 by Carl Eugen Hoyos, 8 years ago

Component: undetermined → avcodec
Keywords: cc added
Priority: normal → minor
Reproduced by developer: set
Status: new → open
Type: enhancement → defect
Version: unspecified → git-master

I suspect there is a bug but I am unable to analyze it further: What is \h and why is it put there by capture_screen()?

comment:2 by Carl Eugen Hoyos, 8 years ago

Reproduced by developer: unset

Otoh, the output looks very similar to what vlc produces, so the tabs may simply be correct.

comment:3 by edumj, 8 years ago

Sorry, what is "Otoh"?

Well, VLC shows original CC like this:

http://i47.photobucket.com/albums/f169/edumj/Starship_Troopers.vob.png

So, it also ignores those tags.

It would be useful to be able to ignore them if you don't like the default style, or if it's problematic because of long lines (Txt2Vobsub rejects them, and SubtitleWorkshop freezes!).

comment:4 by Balling, 3 years ago

\h is a non-breaking space. See: https://github.com/libass/libass/issues/2

And {\an7} means top-left alignment; it is an ASS tag, not an SSA one, which means it is a fairly modern thing.
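To make the two tags concrete, here is a small sketch that decodes them from one line of the SRT. The \an numbering follows the numeric keypad layout used by ASS (7 is top-left, 2 is bottom-center). The function names are hypothetical, and treating each leading \h as one character cell of indent is an assumption, not something confirmed in this ticket.

```python
import re

# ASS \an<N> alignment follows the numeric keypad layout.
AN_ALIGNMENT = {
    1: "bottom-left", 2: "bottom-center", 3: "bottom-right",
    4: "middle-left", 5: "middle-center", 6: "middle-right",
    7: "top-left", 8: "top-center", 9: "top-right",
}

AN_TAG = re.compile(r"\{\\an([1-9])\}")
LEADING_HARD_SPACES = re.compile(r"^(?:\\h)+")


def describe_alignment(line):
    """Return the alignment name for the first {\\anN} tag, or None."""
    match = AN_TAG.search(line)
    return AN_ALIGNMENT[int(match.group(1))] if match else None


def leading_columns(line):
    """Count the \\h hard spaces at the start of the visible text.

    Assumption: one \\h stands for one character cell of indent.
    """
    # Drop <...> tags and {...} override blocks before looking at the text.
    bare = re.sub(r"<[^>]*>|\{[^}]*\}", "", line)
    match = LEADING_HARD_SPACES.match(bare)
    return len(match.group(0)) // 2 if match else 0


if __name__ == "__main__":
    first = '<font face="Monospace">{\\an7}- TAKE THE NUMBER TWO CHAIR,'
    second = "\\h\\hIBANEZ."
    print(describe_alignment(first), leading_columns(second))
```

For the second cue in the description, this reports top-left alignment for the first line and an indent of two cells for the "IBANEZ." line.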


comment:5 by Balling, 3 years ago

What IS VERY FUNNY is that apparently

[Parsed_movie_0 @ 00000182e3099400] EOF timestamp not reliable

is true, since now we have this too:

10
00:00:49,283 --> 00:00:51,963
<font face="Monospace">{\an7}IT’S AMAZING, US RUNNING
INTO EACH OTHER LIKE THIS.
MAYBE IT’S FATE.</font>

comment:6 by Balling, 3 years ago

Okay, so that is what it is. Positioning in 608 is of course different from the positioning used in ASS, but it is supported when converting (VLC does not support it, BTW, what a joke, but \h is supported there). You can certainly use the raw binary CC format instead, see #4767.

Understood it from here: https://github.com/CCExtractor/ccextractor/issues/1108

comment:7 by Balling, 3 years ago

I found the opposite example where some tags are not inserted! Ha. See

https://github.com/CCExtractor/ccextractor/issues/1277
