Opened 7 years ago
Last modified 20 months ago
#5851 open defect
Option to remove tags from Closed Captions
Reported by: | edumj | Owned by: | |
---|---|---|---|
Priority: | minor | Component: | avcodec |
Version: | git-master | Keywords: | cc |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
I can extract Closed Captions from this NTSC DVD sample Starship_Troopers.vob with this:
"ffmpeg" -f lavfi -i "movie=Starship_Troopers.vob[out0+subcc]" -map s "output_map-s.srt"
output:
ffmpeg version N-81452-g01aee81 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 5.4.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-libebur128 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
  libavutil 55. 29.100 / 55. 29.100
  libavcodec 57. 54.100 / 57. 54.100
  libavformat 57. 48.100 / 57. 48.100
  libavdevice 57. 0.102 / 57. 0.102
  libavfilter 6. 54.100 / 6. 54.100
  libswscale 4. 1.100 / 4. 1.100
  libswresample 2. 1.100 / 2. 1.100
  libpostproc 54. 0.100 / 54. 0.100
Input #0, lavfi, from 'movie=Starship_Troopers.vob[out0+subcc]':
  Duration: N/A, start: 1986.626100, bitrate: N/A
    Stream #0:0: Video: rawvideo (I420 / 0x30323449), yuv420p, 720x480 [SAR 1:1 DAR 3:2], 59.94 tbr, 90k tbn, 90k tbc
    Stream #0:1: Subtitle: eia_608
[srt @ 0612b2c0] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
[null @ 0608cfa0] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Output #0, srt, to 'output_map-s.srt':
  Metadata:
    encoder : Lavf57.48.100
    Stream #0:0: Subtitle: subrip (srt)
    Metadata:
      encoder : Lavc57.54.100 srt
Output #1, null, to 'nul':
  Metadata:
    encoder : Lavf57.48.100
    Stream #1:0: Video: wrapped_avframe, yuv420p, 720x480 [SAR 1:1 DAR 3:2], q=2-31, 200 kb/s, 59.94 fps, 59.94 tbn, 59.94 tbc
    Metadata:
      encoder : Lavc57.54.100 wrapped_avframe
Stream mapping:
  Stream #0:1 -> #0:0 (eia_608 (cc_dec) -> subrip (srt))
  Stream #0:0 -> #1:0 (rawvideo (native) -> wrapped_avframe (native))
Press [q] to stop, [?] for help
frame= 467 fps=0.0 q=-0.0 size= 0kB time=00:00:19.43 bitrate= 0.1kbits/s speed=38.9x
frame= 973 fps=973 q=-0.0 size= 1kB time=00:00:40.54 bitrate= 0.1kbits/s speed=40.5x
[mpeg2video @ 060527a0] ac-tex damaged at 3 27
[mpeg2video @ 060527a0] Warning MVs not available
[mpeg2video @ 060527a0] concealing 135 DC, 135 AC, 135 MV errors in I frame
frame= 1229 fps=980 q=-0.0 Lsize= 1kB time=00:00:51.30 bitrate= 0.2kbits/s speed=40.9x
video:461kB audio:0kB subtitle:1kB other streams:0kB global headers:0kB muxing overhead: unknown
However, the resulting SRT contains font tags and some strange position tags:
1
00:00:11,745 --> 00:00:15,249
<font face="Monospace">{\an7}PILOT TRAINEE IBANEZ REPORTING FOR DUTY, MA’AM.</font>

2
00:00:15,249 --> 00:00:18,252
<font face="Monospace">{\an7}- TAKE THE NUMBER TWO CHAIR, \h\hIBANEZ. - YES, MA’AM.</font>

3
00:00:22,756 --> 00:00:27,761
<font face="Monospace">{\an7}\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\h\hIDENTIFY. IBANEZ, "T"-THREE-TWO-FIVE-"A," CLEAR.</font>

4
00:00:30,764 --> 00:00:34,768
<font face="Monospace">{\an7}[ Laughs ] WHAT ARE YOU DOING HERE ?</font>

5
00:00:36,270 --> 00:00:39,273
<font face="Monospace">{\an7}I’M THE GUY WHO’S GONNA TEACH YOU TO FLY THIS CRATE.</font>

6
00:00:39,273 --> 00:00:41,776
<font face="Monospace">{\an7}<i>AH. ASSISTANT INSTRUCTOR.</i></font>

7
00:00:41,775 --> 00:00:44,778
<font face="Monospace">{\an7}SHOULD I CALL YOU "SIR" ?</font>

8
00:00:44,778 --> 00:00:47,281
<font face="Monospace">{\an7}ONLY WHEN I GIVE YOU AN ORDER.</font>

9
00:00:47,281 --> 00:00:49,283
<font face="Monospace">{\an7}PREPARE FOR DEPARTURE.</font>
TXT2VobSub does not accept these tags because they make the subtitle lines too long, and if I hardsub them with this:
"ffmpeg" -i "Starship_Troopers.vob" -vf "subtitles=output_map-s.srt:force_style='FontName=Microsoft Sans Serif,Fontsize=18,Outline=1,PrimaryColour=&HFFFFFF'" -f avi -c:v libxvid -b:v 1500k -vtag XVID -c:a libmp3lame -b:a 128k "Starship_Troopers-ffmpeg.avi"
output:
ffmpeg version N-81452-g01aee81 Copyright (c) 2000-2016 the FFmpeg developers
  built with gcc 5.4.0 (GCC)
  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-libebur128 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
  libavutil 55. 29.100 / 55. 29.100
  libavcodec 57. 54.100 / 57. 54.100
  libavformat 57. 48.100 / 57. 48.100
  libavdevice 57. 0.102 / 57. 0.102
  libavfilter 6. 54.100 / 6. 54.100
  libswscale 4. 1.100 / 4. 1.100
  libswresample 2. 1.100 / 2. 1.100
  libpostproc 54. 0.100 / 54. 0.100
Input #0, mpeg, from 'Starship_Troopers.vob':
  Duration: 00:00:51.30, start: 1986.626100, bitrate: 4618 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), yuv420p(tv), 720x480 [SAR 32:27 DAR 16:9], Closed Captions, 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
    Stream #0:1[0x83]: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s
    Stream #0:2[0x82]: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s
    Stream #0:3[0x80]: Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s
    Stream #0:4[0x81]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s
    Stream #0:5[0x20]: Subtitle: dvd_subtitle
    Stream #0:6[0x22]: Subtitle: dvd_subtitle
[Parsed_subtitles_0 @ 049ef6e0] Shaper: FriBidi 0.19.6 (SIMPLE)
[Parsed_subtitles_0 @ 049ef6e0] Using font provider directwrite
[avi @ 04942f60] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
    Last message repeated 1 times
[null @ 04942120] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
    Last message repeated 1 times
Output #0, avi, to 'Starship_Troopers-ffmpeg.avi':
  Metadata:
    ISFT : Lavf57.48.100
    Stream #0:0: Video: mpeg4 (libxvid) (XVID / 0x44495658), yuv420p, 720x480 [SAR 32:27 DAR 16:9], q=2-31, 1500 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc
    Metadata:
      encoder : Lavc57.54.100 libxvid
    Stream #0:1: Audio: mp3 (libmp3lame) (U[0][0][0] / 0x0055), 48000 Hz, stereo, fltp, delay 1105, padding 0, 128 kb/s
    Metadata:
      encoder : Lavc57.54.100 libmp3lame
Output #1, null, to 'nul':
  Metadata:
    encoder : Lavf57.48.100
    Stream #1:0: Video: wrapped_avframe, yuv420p, 720x480 [SAR 32:27 DAR 16:9], q=2-31, 200 kb/s, 29.97 fps, 29.97 tbn, 29.97 tbc
    Metadata:
      encoder : Lavc57.54.100 wrapped_avframe
    Stream #1:1: Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s
    Metadata:
      encoder : Lavc57.54.100 pcm_s16le
Stream mapping:
  Stream #0:0 -> #0:0 (mpeg2video (native) -> mpeg4 (libxvid))
  Stream #0:4 -> #0:1 (ac3 (native) -> mp3 (libmp3lame))
  Stream #0:0 -> #1:0 (mpeg2video (native) -> wrapped_avframe (native))
  Stream #0:4 -> #1:1 (ac3 (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[ac3 @ 04de9c80] frame sync error
Error while decoding stream #0:4: Invalid data found when processing input
[null @ 04942120] Application provided invalid, non monotonically increasing dts to muxer in stream 1: 1891 >= 1891
[libmp3lame @ 04debec0] Queue input is backward in time
frame= 95 fps=0.0 q=6.0 q=-0.0 size= 671kB time=00:00:03.94 bitrate=1394.7kbits/s speed=7.83x
frame= 185 fps=184 q=6.0 q=-0.0 size= 1326kB time=00:00:07.71 bitrate=1407.7kbits/s speed=7.67x
frame= 276 fps=183 q=9.0 q=-0.0 size= 2029kB time=00:00:11.49 bitrate=1446.2kbits/s speed=7.62x
[Parsed_subtitles_0 @ 049ef6e0] fontselect: (Microsoft Sans Serif, 400, 0) -> MicrosoftSansSerif, 0, MicrosoftSansSerif
[Parsed_subtitles_0 @ 049ef6e0] fontselect: (Monospace, 400, 0) -> CourierNewPSMT, 0, CourierNewPSMT
[mpeg @ 002eb780] New subtitle stream 0:7 at pos:8497166 and DTS:1999.51s
frame= 372 fps=185 q=5.0 q=-0.0 size= 2752kB time=00:00:15.52 bitrate=1451.8kbits/s speed=7.73x
frame= 459 fps=183 q=9.0 q=-0.0 size= 3439kB time=00:00:19.14 bitrate=1471.6kbits/s speed=7.63x
frame= 557 fps=185 q=7.0 q=-0.0 size= 4135kB time=00:00:23.18 bitrate=1460.6kbits/s speed= 7.7x
frame= 645 fps=184 q=9.0 q=-0.0 size= 4824kB time=00:00:26.88 bitrate=1469.7kbits/s speed=7.65x
frame= 733 fps=181 q=6.0 q=-0.0 size= 5313kB time=00:00:30.53 bitrate=1425.2kbits/s speed=7.53x
frame= 837 fps=184 q=4.0 q=-0.0 size= 5933kB time=00:00:34.88 bitrate=1393.0kbits/s speed=7.66x
frame= 935 fps=185 q=5.0 q=-0.0 size= 6631kB time=00:00:38.98 bitrate=1393.4kbits/s speed=7.71x
[Parsed_subtitles_0 @ 049ef6e0] fontselect: (Monospace, 400, 100) -> CourierNewPS-ItalicMT, 0, CourierNewPS-ItalicMT
frame= 1035 fps=186 q=5.0 q=-0.0 size= 7311kB time=00:00:43.17 bitrate=1387.1kbits/s speed=7.77x
frame= 1139 fps=188 q=6.0 q=-0.0 size= 8053kB time=00:00:47.48 bitrate=1389.5kbits/s speed=7.84x
[mpeg2video @ 049477c0] ac-tex damaged at 3 27
[mpeg2video @ 049477c0] Warning MVs not available
[mpeg2video @ 049477c0] concealing 135 DC, 135 AC, 135 MV errors in I frame
[ac3 @ 04de9c80] incomplete frame
frame= 1229 fps=189 q=6.0 Lq=-0.0 size= 8736kB time=00:00:51.31 bitrate=1394.6kbits/s speed= 7.9x
video:8300kB audio:29601kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Those font tags override the FontName set in the subtitles filter, and the position tags put the subs at the top of the frame, aligned like this:
CCExtractor removes those tags, and its output looks like this:
Is there an option to remove those tags, like "-txt_format text" does with other embedded text subs? That way, we could also produce soft subs (XSUBs) from CC, and not only hard subs.
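In the meantime, the tags can be stripped from the generated SRT after the fact. The following is only a minimal sketch of that idea, not an FFmpeg feature: `strip_cc_tags` is a hypothetical helper name, and it assumes the only markup to drop is the `<font>` wrapper, ASS override blocks such as `{\an7}`, and the `\h` padding seen in the sample above. Italic tags (`<i>`) are left in place.

```python
import re

# Hypothetical post-processing helper (not part of FFmpeg). It assumes the
# SRT was produced by the cc_dec -> srt conversion shown in this ticket.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)  # <font ...> and </font>
ASS_OVERRIDE = re.compile(r"\{\\[^}]*\}")                 # e.g. {\an7}
HARD_SPACE = re.compile(r"\\h")                           # literal backslash + h


def strip_cc_tags(srt_text: str) -> str:
    """Return srt_text without font tags, ASS override blocks and \\h padding."""
    cleaned_lines = []
    for line in srt_text.splitlines():
        line = FONT_TAG.sub("", line)
        line = ASS_OVERRIDE.sub("", line)
        line = HARD_SPACE.sub(" ", line)
        # The \h runs were used as positioning padding, so collapse the
        # resulting runs of spaces and trim the ends of the line.
        line = re.sub(r" {2,}", " ", line).strip()
        cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    sample = (
        "2\n"
        "00:00:15,249 --> 00:00:18,252\n"
        '<font face="Monospace">{\\an7}- TAKE THE NUMBER TWO CHAIR, '
        "\\h\\hIBANEZ. - YES, MA'AM.</font>\n"
    )
    print(strip_cc_tags(sample))
```

Cue numbers and timestamp lines pass through unchanged because they contain none of the patterns. Note that collapsing the padding discards the original caption positioning, which is the intent here.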
Change History (7)
comment:1 by , 7 years ago
Component: | undetermined → avcodec |
---|---|
Keywords: | cc added |
Priority: | normal → minor |
Reproduced by developer: | set |
Status: | new → open |
Type: | enhancement → defect |
Version: | unspecified → git-master |
comment:2 by , 7 years ago
Reproduced by developer: | unset |
---|
Otoh, the output looks very similar to what VLC produces, so the tags may simply be correct.
comment:3 by , 7 years ago
Sorry, what is "Otoh"?
Well, VLC shows the original CC like this:
So, it also ignores those tags.
It would be useful to be able to ignore them if you don't like the default style, or if they are problematic because of long lines (Txt2Vobsub rejects them, and SubtitleWorkshop freezes!).
comment:4 by , 23 months ago
\h is a non-breaking space. See: https://github.com/libass/libass/issues/2
And {\an7} means top-left alignment. It is an ASS tag, not an SSA one, which means it is a fairly modern thing.
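For reference, the digit in an ASS `\an` tag follows the numeric keypad layout (1 is bottom-left, 5 is the middle, 9 is top-right). A small sketch of that mapping, where `describe_an_tag` is a hypothetical name used only for illustration:

```python
import re

# Rows and columns of the numeric keypad, from the bottom-left key upwards.
VERTICAL = ("bottom", "middle", "top")
HORIZONTAL = ("left", "center", "right")


def describe_an_tag(tag: str) -> str:
    """Translate an ASS alignment override such as {\\an7} into words."""
    match = re.fullmatch(r"\{\\an([1-9])\}", tag)
    if match is None:
        raise ValueError(f"not an \\an alignment tag: {tag!r}")
    index = int(match.group(1)) - 1  # 0..8, counted row by row from the bottom
    return f"{VERTICAL[index // 3]}-{HORIZONTAL[index % 3]}"


if __name__ == "__main__":
    print(describe_an_tag("{\\an7}"))  # the tag seen in the SRT output above
```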
comment:5 by , 23 months ago
What IS VERY FUNNY is that apparently
[Parsed_movie_0 @ 00000182e3099400] EOF timestamp not reliable
is true, since now we have this too:
10
00:00:49,283 --> 00:00:51,963
<font face="Monospace">{\an7}IT’S AMAZING, US RUNNING INTO EACH OTHER LIKE THIS. MAYBE IT’S FATE.</font>
comment:6 by , 23 months ago
Okay, so that is what it is. Positioning in 608 is of course different from what ASS uses, but it is supported when converting (VLC does not support it, BTW, what a joke, but \h is supported there). You can certainly use the raw binary CC format instead; see #4767.
I understood it from here: https://github.com/CCExtractor/ccextractor/issues/1108
I suspect there is a bug but I am unable to analyze it further: What is \h and why is it put there by capture_screen()?