Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#4901 closed defect (invalid)

mjpeg codec ignores -threads 1

Reported by: jvd66
Owned by:
Priority: normal
Component: undetermined
Version: unspecified
Keywords:
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:
It seems to be impossible to get the mjpeg codec to use only 1 thread.
How to reproduce:

% strace -e trace=clone ffmpeg -loglevel debug -i a_video.avi -threads 1 -threads:1 1 -threads:v 1 -vf null -af null -f mjpeg -threads 1 -threads:p:mjpeg 1 -vframes 1 -ss 00:00:10.000 -threads 1 -threads:1 1 -threads:d 1 -threads:a 1 -threads:v 1 -threads:s 1 -threads:t 1 -threads:#0 1 -threads:p:mjpeg 1 -an -y a_video.jpg 2>&1 | egrep 'cores|clone' 
detected 8 logical cores
clone(child_stack=0x7f826370dfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f826370e9d0, tls=0x7f826370e700, child_tidptr=0x7f826370e9d0) = 4602
clone(child_stack=0x7f8262f0cfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8262f0d9d0, tls=0x7f8262f0d700, child_tidptr=0x7f8262f0d9d0) = 4603
clone(child_stack=0x7f826270bfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f826270c9d0, tls=0x7f826270c700, child_tidptr=0x7f826270c9d0) = 4604
clone(child_stack=0x7f8261f0afd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8261f0b9d0, tls=0x7f8261f0b700, child_tidptr=0x7f8261f0b9d0) = 4605
clone(child_stack=0x7f8261709fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f826170a9d0, tls=0x7f826170a700, child_tidptr=0x7f826170a9d0) = 4606
clone(child_stack=0x7f8260f08fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8260f099d0, tls=0x7f8260f09700, child_tidptr=0x7f8260f099d0) = 4607
clone(child_stack=0x7f8260707fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f82607089d0, tls=0x7f8260708700, child_tidptr=0x7f82607089d0) = 4608
clone(child_stack=0x7f825ff06fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f825ff079d0, tls=0x7f825ff07700, child_tidptr=0x7f825ff079d0) = 4609
clone(child_stack=0x7f825f705fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f825f7069d0, tls=0x7f825f706700, child_tidptr=0x7f825f7069d0) = 4610
%

FFMPEG VERSION :
ffmpeg 2.2.16 with no patches, built for RHEL-6.4 on an x86_64 with gcc-5.2.0
(for newer CPU optimization support, optimized for Haswell architecture, with
 CFLAGS: '-march=x86-64 -mtune=haswell -O3 -g').

This is a regression from ffmpeg 0.8.5, which uses only 1 thread given the
same effective arguments; 2.2.16 takes about 288% of the elapsed time that 0.8.5 does:

example ffmpeg-2.2.16 time measurement:

$ time_hi_res ffmpeg -i a_video.avi -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg
...
[elapsed]=0.192097 [cpu%]=119.34 [sys]=0.008992 [user]=0.220266 [rss]=30540 [csw]=378 [vcsw]=296 [fltmaj]=0 [fltmin]=7241 [inblk]=0 [outblk]=8 [exit]=0

example ffmpeg-0.8.5 time measurement:

$ time_hi_res ffmpeg-0.8.5 -i a_video.avi -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg
...
[elapsed]=0.057240 [cpu%]=99.83 [sys]=0.003985 [user]=0.053159 [rss]=15704 [csw]=1 [vcsw]=218 [fltmaj]=0 [fltmin]=3969 [inblk]=0 [outblk]=0 [exit]=0

(time_hi_res is a bash shell loadable built-in that is similar to the time
built-in but uses clock_gettime() to make high resolution time measurements,
and uses getrusage() to print out the 'struct rusage' fields shown above).

"Efficiency" calculations:

ffmpeg 2.2.16 / ffmpeg 0.8.5 :

elapsed time : 0.192097/0.066678 = 2.880965, or about 288%
user cpu time : 0.220266/0.053159 = 4.143531, or about 414%

These results are confirmed by the average of many runs of the mjpeg codec.
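
The ratios above are plain divisions of the 2.2.16 figure by the 0.8.5 figure. As a minimal sketch in C (the function names `slowdown_ratio` and `as_percent` are illustrative, not part of any tool used here), the arithmetic can be reproduced as follows. The elapsed-time divisor quoted above, 0.066678, does not appear in the single 0.8.5 run listed earlier (0.057240), so the function takes both values as parameters rather than fixing either one.

```c
#include <assert.h>

/* Ratio of a newer measurement to an older one; 1.0 means no change.
 * Both arguments are times in seconds and must be positive. */
double slowdown_ratio(double newer, double older)
{
    assert(newer > 0.0 && older > 0.0);
    return newer / older;
}

/* The same ratio expressed as a percentage of the older measurement. */
double as_percent(double ratio)
{
    return ratio * 100.0;
}
```

Calling `as_percent(slowdown_ratio(0.220266, 0.053159))` gives the user CPU figure of roughly 414% quoted above.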

Please, is there any way of getting the mjpeg codec to use only one thread?

It is very difficult to convince my company to move to using ffmpeg-2.2.16,
which we'd like to do for many obvious reasons, when confronted with
performance results such as those above.


Change History (5)

comment:1 Changed 4 years ago by cehoyos

  • Resolution set to invalid
  • Status changed from new to closed

Please post all usage questions on the user mailing list (this is a bug tracker, not a support forum) and please understand that version 2.2 is outdated and should not be used, especially not as a new version.

comment:2 Changed 4 years ago by cehoyos

  • Resolution invalid deleted
  • Status changed from closed to reopened

Sorry, I misunderstood the ticket: Please test current FFmpeg git head and please provide the command line that reproduces the issue together with the complete, uncut console output (this also makes sure that the ticket cannot be misunderstood).

comment:3 Changed 4 years ago by jvd66

We cannot use current git head because we have a large body of scripts that use the ffmpeg 2.2.x command line argument format, and the git head is too unstable for our use.

Inspecting the code with GDB shows that with the above -threads* options, which represent all
the possibly relevant combinations I could think of:

-threads 1 -threads:1 1 -threads:0 1 -threads:v 1 -threads:1 1 -threads:d 1 -threads:a 1
-threads:v 1 -threads:s 1 -threads:t 1 -threads:#0 1

only the OUTPUT codec 'thread_count' is getting set to 1:

@ffmpeg.c , line 2337:

codec = ost->st->codec;

(gdb) p ost->st->codec->thread_count
$51 = 1

but the INPUT codec still has a thread_count of 0, meaning that a number of threads equal to
the number of CPU cores found will be used:

(gdb) p ist->st->codec->thread_count
$52 = 0
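
To make the meaning of a zero thread_count concrete, here is a small self-contained model in C. It is a sketch under assumptions, not a quote of libavcodec: the function name `resolve_thread_count` and the cap `MAX_AUTO_THREADS_SKETCH` are hypothetical, and the "cores + 1" rule is chosen only because it matches the trace above (9 clone calls on 8 logical cores).

```c
#include <assert.h>

/* Hypothetical upper bound on automatically chosen threads (assumption). */
#define MAX_AUTO_THREADS_SKETCH 16

/*
 * Model of how a requested thread count might be resolved.
 * A positive value is used as given; 0 means "choose automatically".
 * The automatic choice of cores + 1 is an assumption that fits the
 * strace output in this ticket, nothing more.
 */
int resolve_thread_count(int requested, int logical_cores)
{
    assert(requested >= 0 && logical_cores >= 1);
    if (requested > 0)
        return requested;          /* explicit setting wins */
    if (logical_cores == 1)
        return 1;                  /* nothing to parallelise */
    int n = logical_cores + 1;
    return n > MAX_AUTO_THREADS_SKETCH ? MAX_AUTO_THREADS_SKETCH : n;
}
```

Under this model an input codec left at 0 on the 8-core machine resolves to 9 threads, while an explicit 1 stays at 1, which is the behaviour the -threads options were expected to produce.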

This seems nonsensical; why would one use 8 threads to extract one frame from the input stream
for capture to a JPEG file (the purpose of the transcode)?

There appears to be no way to set the number of threads used for the input codec with
command line arguments.

A quick fix I'm going to adopt until you FFMPEG developers release a better fix is:

if (ocodec->thread_count != 0 &&
    ocodec->codec_id == AV_CODEC_ID_MJPEG)
    icodec->thread_count = ocodec->thread_count;

I'll test with this patch and post the results back here shortly.

comment:4 Changed 4 years ago by cehoyos

  • Resolution set to invalid
  • Status changed from reopened to closed

As said, please feel free to post all usage questions on the user mailing list, and don't forget to post the command line and console output there. There is no way for ffmpeg to know that you want to use an output option for your input file.

Personally, I am interested which past changes make using current git head difficult for you, please feel free to post examples here, thank you!


comment:5 Changed 4 years ago by jvd66

Problem fixed temporarily with a patch to ffmpeg.c version 2.2.16:

--- ffmpeg.c~   2015-06-18 18:55:40.000000000 +0000
+++ ffmpeg.c    2015-10-02 14:58:55.153062041 +0000
@@ -2167 +2167,2 @@
-        if (!av_dict_get(ist->opts, "threads", NULL, 0))
+        if ((!ist->st->codec->thread_count) &&
+            !av_dict_get(ist->opts, "threads", NULL, 0))
@@ -2355,0 +2357,3 @@
+        if( codec && icodec && codec->thread_count && ! icodec->thread_count )
+            icodec->thread_count = codec->thread_count;
+
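
The second hunk of the patch can be read in isolation as a small rule: copy the output codec's thread count to the input codec only when the output has one set and the input does not. The following is a self-contained sketch of that rule in C. The struct `codec_ctx_sketch` is a hypothetical stand-in holding only the `thread_count` field, and `inherit_thread_count` is an illustrative name; neither exists in ffmpeg.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the single codec context field the patch touches. */
struct codec_ctx_sketch {
    int thread_count;   /* 0 = not set, the library chooses */
};

/*
 * Copy the output codec's thread count to the input codec, but only
 * when the output has a non-zero count and the input has none.
 * Returns 1 if the input context was changed, 0 otherwise.
 */
int inherit_thread_count(struct codec_ctx_sketch *icodec,
                         const struct codec_ctx_sketch *codec)
{
    if (codec && icodec && codec->thread_count && !icodec->thread_count) {
        icodec->thread_count = codec->thread_count;
        return 1;
    }
    return 0;
}
```

An input count that is already non-zero is left alone, so this rule never overrides a value the input side has received by other means.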

At least this now prevents ffmpeg from spawning 8 threads to process the input stream:

   $ strace -e trace=clone ./ffmpeg  -i a_video.avi -threads 1  -vf null -af null -f mjpeg  -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg 2>&1 | grep clone
   $
   # no output - no threads cloned.

But, alas, the performance is still terrible compared with 0.8.5:

   $ ./ffmpeg -i a_video.avi -threads 1 -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg 
   [elapsed]=0.206965 [cpu%]=99.72 [sys]=0.006971 [user]=0.199424 [rss]=20744 [csw]=1 [vcsw]=273 [fltmaj]=0 [fltmin]=4471 [inblk]=0 [outblk]=0 [exit]=0

Now
