Opened 10 years ago

Last modified 7 years ago

#3118 open enhancement

SAMI: multiple languages not detected

Reported by: eelco
Owned by:
Priority: normal
Component: avformat
Version: git-master
Keywords:
Cc: nfxjfg@googlemail.com
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

SAMI files can contain multiple languages, but ffmpeg handles such a file as a single stream, with no way to select only one language.

How to reproduce:

./ffmpeg -i multiple_languages.smi out.srt
ffmpeg version N-57932-g89a3be8 Copyright (c) 2000-2013 the FFmpeg developers
  built on Nov  5 2013 16:30:18 with Apple LLVM version 5.0 (clang-500.2.78) (based on LLVM 3.3svn)
  configuration: --prefix=/Users/eelco/Projects/Beamer/FFmpeg/build --disable-shared
  libavutil      52. 52.100 / 52. 52.100
  libavcodec     55. 41.100 / 55. 41.100
  libavformat    55. 21.100 / 55. 21.100
  libavdevice    55.  5.100 / 55.  5.100
  libavfilter     3. 90.102 /  3. 90.102
  libswscale      2.  5.101 /  2.  5.101
  libswresample   0. 17.104 /  0. 17.104
Input #0, sami, from 'multiple_languages.smi':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Subtitle: sami
Output #0, srt, to 'out.srt':
  Metadata:
    encoder         : Lavf55.21.100
    Stream #0:0: Subtitle: subrip
Stream mapping:
  Stream #0:0 -> #0:0 (sami -> subrip)
Press [q] to stop, [?] for help
size=      38kB time=00:11:43.56 bitrate=   0.4kbits/s    
video:0kB audio:0kB subtitle:23 global headers:0kB muxing overhead 63.508757%

The input file (multiple_languages.smi) defines the different languages in the ‘style sheet’:

...
            <STYLE TYPE="text/css">
            <!--
            P { margin-left:2pt; margin-right:2pt; margin-bottom:1pt;
                font-size:20pt; text-align:center; font-weight:bold;
                color:white; }
            .ENCC { Name:English; lang:en-US; SAMIType:CC; }
            .KRCC { Name:한국어; lang:ko-KR; SAMIType:CC; }
            -->
            </STYLE>
...

And uses the classes to mark the language:

...
<SYNC Start=10109><P Class=KRCC>
<br>사랑과 배신<br>탐욕과 살육의 이야기죠
<SYNC Start=13977><P Class=KRCC>&nbsp;
<SYNC Start=17667><P Class=KRCC>
<br>선악의 정의에 대해서<br>대립하는 가치관을 가진
...
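To make the mechanism concrete, here is a minimal Python sketch of how the language classes could be read out of the style sheet quoted above. It is only an illustration and not FFmpeg code: the function name `parse_language_classes` and the regular expression are assumptions made for this example, and the parsing is far looser than a real CSS parser.

```python
import re

# Style block copied from the excerpt of multiple_languages.smi above.
SAMPLE_STYLE = """
<STYLE TYPE="text/css">
<!--
P { margin-left:2pt; margin-right:2pt; margin-bottom:1pt;
    font-size:20pt; text-align:center; font-weight:bold;
    color:white; }
.ENCC { Name:English; lang:en-US; SAMIType:CC; }
.KRCC { Name:한국어; lang:ko-KR; SAMIType:CC; }
-->
</STYLE>
"""

# Matches ".CLASSNAME { declarations }"; element rules such as "P { ... }" are skipped.
CLASS_RULE = re.compile(r"\.(\w+)\s*\{([^}]*)\}")


def parse_language_classes(style_text):
    """Hypothetical helper: map class name -> properties for rules declaring a lang."""
    classes = {}
    for name, body in CLASS_RULE.findall(style_text):
        props = {}
        for decl in body.split(";"):
            if ":" in decl:
                key, value = decl.split(":", 1)
                props[key.strip().lower()] = value.strip()
        if "lang" in props:
            classes[name] = props
    return classes


languages = parse_language_classes(SAMPLE_STYLE)
for cls, props in languages.items():
    print(cls, props["lang"], props.get("name"))
```

The resulting mapping keeps the order in which the classes appear in the style sheet, so each `<P Class=...>` in the body can be looked up to find its language.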

The output, however, mixes both languages:

...
4
00:00:10,109 --> 00:00:13,979

There is love and betrayal,
greed and murder. 

5
00:00:17,667 --> 00:00:17,667

선악의 정의에 대해서
대립하는 가치관을 가진 

6
00:00:17,667 --> 00:00:21,717

It's set in this interesting
world of contrasting ideology, 
...

Attachments (3)

multiple_languages.smi (56.4 KB ) - added by eelco 10 years ago.
out.srt (37.7 KB ) - added by eelco 10 years ago.
0001-avformat-basic-language-support-in-SAMI-subtitles.patch (12.2 KB ) - added by Jehan 7 years ago.
Basic language support in SAMI files.


Change History (10)

by eelco, 10 years ago

Attachment: multiple_languages.smi added

by eelco, 10 years ago

Attachment: out.srt added

comment:1 by Cigaes, 10 years ago

Component: undetermined → avformat
Status: new → open
Type: defect → enhancement
Version: unspecified → git-master

As said above, the language is set by a CSS property that is applied through classes.

Currently, the SAMI demuxer just copies the CSS stylesheet into the extradata, and the decoder ignores it.

Handling styled text is still an open issue.

comment:2 by gjdfgh, 10 years ago

The different languages should really be handled as separate subtitle tracks.

This is also how they're meant to be used AFAIK.
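As a rough illustration of what "separate subtitle tracks" could mean, the following Python sketch groups SYNC cues by their P class. It is not how the FFmpeg demuxer works. The name `split_by_class`, the regular expression, and the sample input are assumptions: the Korean lines come from the ticket's excerpt, while the English SYNC line is reconstructed from out.srt and may not match the attached file exactly.

```python
import re
from collections import defaultdict

# Illustrative input modelled on the ticket's excerpts (English cue is a reconstruction).
SAMPLE_BODY = """
<SYNC Start=10109><P Class=ENCC>
<br>There is love and betrayal,<br>greed and murder.
<SYNC Start=10109><P Class=KRCC>
<br>사랑과 배신<br>탐욕과 살육의 이야기죠
<SYNC Start=13977><P Class=KRCC>&nbsp;
"""

# One cue: a SYNC tag, a P tag carrying the class, then text up to the next SYNC.
CUE = re.compile(
    r"<SYNC\s+Start=(\d+)>\s*<P\s+Class=(\w+)>(.*?)(?=<SYNC|\Z)",
    re.S | re.I,
)


def split_by_class(body):
    """Hypothetical helper: return {class_name: [(start_ms, text), ...]}."""
    tracks = defaultdict(list)
    for start, cls, text in CUE.findall(body):
        tracks[cls].append((int(start), text.strip()))
    return dict(tracks)


tracks = split_by_class(SAMPLE_BODY)
for cls, cues in tracks.items():
    print(cls, len(cues), "cue(s)")
```

Each key of the result would correspond to one subtitle stream, which a player could then expose for selection.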

Note that SAMI has other issues, and the ffmpeg SAMI decoder is apparently completely unusable for Korean users.

comment:3 by gjdfgh, 10 years ago

Cc: nfxjfg@googlemail.com added

comment:4 by Jehan, 7 years ago

I very often get SMI files with several languages in them, and it is very annoying because the only way right now to view them properly with ffmpeg-based media players is to edit the file and delete the unwanted language. It would be good to have multi-language support.

Here is a link to the spec, which explains how this works: https://msdn.microsoft.com/en-us/library/ms971327.aspx

in reply to:  2 comment:5 by Jehan, 7 years ago

Replying to gjdfgh:

Note that SAMI has other issues, and the ffmpeg SAMI decoder is apparently completely unusable for Korean users.

By the way, about this specific sentence: maybe it was true 3 years ago when it was written (though I am not sure about that either, because for a dozen years all my media players of choice have been based on ffmpeg, and I have read Korean subtitles for many years). But currently, apart from the fact that I have to edit out the English part of SAMI files before playing them, SAMI subtitles work OK. I play videos in mpv (hence with ffmpeg) with SAMI subtitles in Korean nearly every day (with a native Korean speaker reading said subtitles).

by Jehan, 7 years ago

Basic language support in SAMI files.

comment:6 by Jehan, 7 years ago

I have had a look at the code for SAMI support and propose this first basic version. It is not yet full support that extracts several subtitle tracks from a single SAMI file. Instead, it simply extracts only the subtitles of the default language (the first language in the list, cf. section "Class and Localization" in the spec).

Not perfect yet, but still better than the current situation.
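For readers who want to see the idea without opening the patch, here is a small Python sketch of a "default language only" filter. It is not the code from the attached patch, which is written in C for libavformat. It assumes that "the first language in the list" means the first class rule in the style sheet that declares a `lang` property, and the function names and the sample document are invented for this example.

```python
import re

# Invented sample document built from the excerpts quoted in the description.
SAMPLE_SAMI = """<SAMI><HEAD><STYLE TYPE="text/css">
<!--
.ENCC { Name:English; lang:en-US; SAMIType:CC; }
.KRCC { Name:한국어; lang:ko-KR; SAMIType:CC; }
-->
</STYLE></HEAD><BODY>
<SYNC Start=17667><P Class=KRCC>
<br>선악의 정의에 대해서<br>대립하는 가치관을 가진
<SYNC Start=17667><P Class=ENCC>
<br>It's set in this interesting<br>world of contrasting ideology,
</BODY></SAMI>
"""

CUE = re.compile(
    r"<SYNC\s+Start=(\d+)>\s*<P\s+Class=(\w+)>(.*?)(?=<SYNC|</BODY|\Z)",
    re.S | re.I,
)


def default_language_class(sami_text):
    """Hypothetical helper: first class rule that declares a lang, or None."""
    style = re.search(r"<STYLE[^>]*>(.*?)</STYLE>", sami_text, re.S | re.I)
    if not style:
        return None
    for name, body in re.findall(r"\.(\w+)\s*\{([^}]*)\}", style.group(1)):
        if re.search(r"\blang\s*:", body, re.I):
            return name
    return None


def keep_default_language(sami_text):
    """Return (start_ms, text) cues of the default class; all cues if none is declared."""
    cls = default_language_class(sami_text)
    return [
        (int(start), text.strip())
        for start, c, text in CUE.findall(sami_text)
        if cls is None or c == cls
    ]


cues = keep_default_language(SAMPLE_SAMI)
```

Falling back to every cue when no language class is declared keeps single-language files working unchanged, which seems a reasonable choice for a first version.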

comment:7 by Carl Eugen Hoyos, 7 years ago

Please send your patch to the development mailing list; it will be ignored here.
