Opened 11 months ago

Last modified 11 months ago

#10378 new enhancement

Feature Request: positional OCR of DVD and BD subs (in combination with Tesseract)

Reported by: techguru
Owned by:
Priority: important
Component: undetermined
Version: unspecified
Keywords: frame data ocr
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

If you combine ffmpeg and Tesseract (and maybe OpenCV), it should be possible to get positional data for the text in each frame.
This has been a badly needed feature for many years for those of us who OCR DVD and BD subs.

I have attached a very good example: sub files that make heavy use of this kind of positioning.

(And yes, I understand there's no positional data in the sub file. That's where ffmpeg in conjunction with Tesseract should come in: it should be able to find that data relative to the size of the frame.)
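To illustrate what "position relative to the size of the frame" could mean in practice, here is a minimal sketch. It assumes an OCR step has already produced a pixel bounding box (left, top, width, height) for a piece of text, and it maps the box center onto a 3x3 grid numbered like a numeric keypad (the same 1-9 layout that ASS uses for alignment). The function name and the example boxes are hypothetical and are not part of ffmpeg or Tesseract.

```python
def numpad_alignment(box, frame_w, frame_h):
    """Return a keypad-style zone (1-9) for the center of a text box.

    box is (left, top, width, height) in pixels, measured from the
    top-left corner of a frame that is frame_w x frame_h pixels.
    Zones: 7 8 9 across the top, 4 5 6 in the middle, 1 2 3 at the bottom.
    """
    left, top, width, height = box
    cx = left + width / 2
    cy = top + height / 2
    # Split the frame into thirds; clamp so a point on the far edge
    # still lands in the last column or row.
    col = min(int(3 * cx / frame_w), 2)   # 0 = left, 1 = center, 2 = right
    row = min(int(3 * cy / frame_h), 2)   # 0 = top, 1 = middle, 2 = bottom
    return (2 - row) * 3 + col + 1


# Hypothetical boxes on a 720x480 DVD frame.
print(numpad_alignment((260, 420, 200, 40), 720, 480))  # ordinary dialogue at the bottom
print(numpad_alignment((20, 20, 100, 30), 720, 480))    # a sign in the top-left corner
```

A coarse zone like this is only a starting point; an exact pixel position would keep more of the original layout.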

The output subtitle would have to be .ass, since it's one of the more popular formats with positional awareness.
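As a rough sketch of what positioned .ass output might look like, the snippet below turns one recognized text box into an ASS Dialogue line using the \an5 (anchor at the text center) and \pos(x,y) override tags. It assumes the script's PlayResX/PlayResY match the frame size, so pixel coordinates can be used directly, and it assumes a style named "Default" exists in the script. The helper names, the timings and the box are made up for illustration.

```python
def ass_time(seconds):
    """Format seconds as an ASS timestamp, H:MM:SS.cc (centiseconds)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{c:02d}"


def ass_dialogue(text, start, end, box, style="Default"):
    """Build one Dialogue line that centers text on the given pixel box.

    box is (left, top, width, height) in frame pixels. Assumes the
    script resolution equals the frame resolution (an assumption of
    this sketch, not something the ticket specifies).
    """
    left, top, width, height = box
    cx = round(left + width / 2)
    cy = round(top + height / 2)
    # Field order: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    return (f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,"
            f"{{\\an5\\pos({cx},{cy})}}{text}")


print(ass_dialogue("SIGN", 1.5, 4.0, (20, 20, 100, 30)))
```

If the script resolution differs from the frame size, the coordinates would need to be scaled by the ratio of the two before being written out.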

Attachments (1)

subs.zip (1.7 MB ) - added by techguru 11 months ago.
sub files


Change History (4)

by techguru, 11 months ago

Attachment: subs.zip added

sub files

comment:1 by Balling, 11 months ago

That is all done in Subtitle Edit 3.6.12.

comment:2 by techguru, 11 months ago

No, it's not. It can only OCR the subs and put the text at the bottom;
it doesn't know about positioning.

comment:3 by techguru, 11 months ago

NISKE, the author of SE, WANTS to add this feature, but the frame needs to be processed by ffmpeg/libav first so that the data can be passed to Tesseract.
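For the Tesseract side of that hand-off, here is a minimal sketch of how per-line boxes could be pulled out of Tesseract's TSV output once a rendered frame has been OCRed. It assumes the usual TSV columns (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text) with word rows at level 5. The sample rows are invented for illustration and do not come from the attached subs, and the function name is hypothetical.

```python
import csv
import io


def line_boxes(tsv_text):
    """Merge word-level TSV rows into one (text, box) pair per text line.

    Each box is (left, top, width, height) in pixels of the OCRed image.
    """
    merged = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        word = (row.get("text") or "").strip()
        if row["level"] != "5" or not word:
            continue  # skip page/block/line rows and empty words
        key = (row["block_num"], row["par_num"], row["line_num"])
        l, t = int(row["left"]), int(row["top"])
        r, b = l + int(row["width"]), t + int(row["height"])
        if key in merged:
            words, (L, T, R, B) = merged[key]
            merged[key] = (words + [word],
                           (min(L, l), min(T, t), max(R, r), max(B, b)))
        else:
            merged[key] = ([word], (l, t, r, b))
    return [(" ".join(words), (L, T, R - L, B - T))
            for words, (L, T, R, B) in merged.values()]


# Invented sample rows in the assumed column layout.
header = ["level", "page_num", "block_num", "par_num", "line_num",
          "word_num", "left", "top", "width", "height", "conf", "text"]
rows = [
    ["5", "1", "1", "1", "1", "1", "40", "30", "60", "24", "91", "EXIT"],
    ["5", "1", "1", "1", "1", "2", "110", "30", "50", "24", "88", "ONLY"],
    ["5", "1", "2", "1", "1", "1", "300", "420", "120", "28", "93", "Hello"],
]
sample_tsv = "\n".join("\t".join(r) for r in [header] + rows)

for text, box in line_boxes(sample_tsv):
    print(text, box)
```

The resulting boxes are in the coordinate space of whatever image was OCRed, so the frame ffmpeg hands over would need to keep the subtitle bitmap at its original on-screen position for the numbers to be meaningful.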
