#4901 closed defect (invalid)
mjpeg codec ignores -threads 1
Reported by: | Jason Vas Dias | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | unspecified | Keywords: | |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
It seems to be impossible to get the mjpeg codec to use only 1 thread:
How to reproduce:
% strace -e trace=clone ffmpeg -loglevel debug -i a_video.avi -threads 1 -threads:1 1 -threads:v 1 -vf null -af null -f mjpeg -threads 1 -threads:p:mjpeg 1 -vframes 1 -ss 00:00:10.000 -threads 1 -threads:1 1 -threads:d 1 -threads:a 1 -threads:v 1 -threads:s 1 -threads:t 1 -threads:#0 1 -threads:p:mjpeg 1 -an -y a_video.jpg 2>&1 | egrep 'cores|clone'
detected 8 logical cores
clone(child_stack=0x7f826370dfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f826370e9d0, tls=0x7f826370e700, child_tidptr=0x7f826370e9d0) = 4602
clone(child_stack=0x7f8262f0cfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8262f0d9d0, tls=0x7f8262f0d700, child_tidptr=0x7f8262f0d9d0) = 4603
clone(child_stack=0x7f826270bfd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f826270c9d0, tls=0x7f826270c700, child_tidptr=0x7f826270c9d0) = 4604
clone(child_stack=0x7f8261f0afd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8261f0b9d0, tls=0x7f8261f0b700, child_tidptr=0x7f8261f0b9d0) = 4605
clone(child_stack=0x7f8261709fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f826170a9d0, tls=0x7f826170a700, child_tidptr=0x7f826170a9d0) = 4606
clone(child_stack=0x7f8260f08fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f8260f099d0, tls=0x7f8260f09700, child_tidptr=0x7f8260f099d0) = 4607
clone(child_stack=0x7f8260707fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f82607089d0, tls=0x7f8260708700, child_tidptr=0x7f82607089d0) = 4608
clone(child_stack=0x7f825ff06fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f825ff079d0, tls=0x7f825ff07700, child_tidptr=0x7f825ff079d0) = 4609
clone(child_stack=0x7f825f705fd0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f825f7069d0, tls=0x7f825f706700, child_tidptr=0x7f825f7069d0) = 4610
%
FFMPEG VERSION : ffmpeg 2.2.16 with no patches, built for RHEL-6.4 on an x86_64 with gcc-5.2.0 (for newer CPU optimization support, optimized for Haswell architecture, with CFLAGS: '-march=x86-64 -mtune=haswell -O3 -g').
This is a regression from ffmpeg 0.8.5, which uses only 1 thread given the
same effective arguments; ffmpeg 2.2.16 takes about 2.88 times as long (elapsed time):
example ffmpeg-2.2.16 time measurement:
$ time_hi_res ffmpeg -i a_video.avi -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg
...
[elapsed]=0.192097 [cpu%]=119.34 [sys]=0.008992 [user]=0.220266 [rss]=30540 [csw]=378 [vcsw]=296 [fltmaj]=0 [fltmin]=7241 [inblk]=0 [outblk]=8 [exit]=0
example ffmpeg-0.8.5 time measurement:
$ time_hi_res ffmpeg-0.8.5 -i a_video.avi -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg
...
[elapsed]=0.057240 [cpu%]=99.83 [sys]=0.003985 [user]=0.053159 [rss]=15704 [csw]=1 [vcsw]=218 [fltmaj]=0 [fltmin]=3969 [inblk]=0 [outblk]=0 [exit]=0
(time_hi_res is a bash shell loadable built-in that is similar to the time
built-in but uses clock_gettime() to make high resolution time measurements,
and uses getrusage() to print out the 'struct rusage' fields shown above) .
"Efficiency" calculations:
ffmpeg 2.2.16 / ffmpeg 0.8.5 :
elapsed time : 0.192097/0.066678 = 2.880965, or @ 288%
user cpu time : 0.220266/0.053159 = 4.143531, or @ 414%
These results are confirmed by the average of many runs of the mjpeg codec.
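The ratios above are plain quotients of the 2.2.16 figure over the 0.8.5 figure. A minimal C sketch of that arithmetic follows; the names `slowdown_ratio` and `print_ticket_ratios` are hypothetical. Note that the single 0.8.5 run shown above reports an elapsed time of 0.057240 s, while the calculation uses 0.066678 s, which may come from the averaged runs; the sketch prints both without claiming which one is the right baseline.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many times larger the new measurement is than
 * the old one. A result of 2.0 means the new build needed twice as long. */
double slowdown_ratio(double new_value, double old_value)
{
    assert(old_value > 0.0);   /* a zero baseline would make the ratio meaningless */
    return new_value / old_value;
}

/* Prints the ratios for the figures quoted in this ticket. */
void print_ticket_ratios(void)
{
    printf("elapsed, baseline 0.066678 s: %.6f\n",
           slowdown_ratio(0.192097, 0.066678));
    printf("elapsed, baseline 0.057240 s: %.6f\n",
           slowdown_ratio(0.192097, 0.057240));
    printf("user cpu time:                %.6f\n",
           slowdown_ratio(0.220266, 0.053159));
}
```

Calling `print_ticket_ratios()` from a `main()` reproduces the two quotients listed above and adds the one based on the single-run elapsed time.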
Please, is there any way of getting the mjpeg codec to use only one thread?
It is very difficult to convince my company to move to ffmpeg-2.2.16,
which we'd like to do for many obvious reasons, when confronted with
performance results such as those above.
Change History (5)
comment:1 by , 9 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 9 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
Sorry, I misunderstood the ticket: please test current FFmpeg git head and provide the command line that reproduces the issue, together with the complete, uncut console output (this also makes sure that the ticket cannot be misunderstood).
comment:3 by , 9 years ago
We cannot use current git head because we have a large body of scripts that use the ffmpeg 2.2.x command-line argument format, and the git head is too unstable for our use.
Inspecting the code with GDB shows that with the above -threads* options, which represent all
the possibly relevant combinations I could think of:
-threads 1 -threads:1 1 -threads:0 1 -threads:v 1 -threads:1 1 -threads:d 1 -threads:a 1
-threads:v 1 -threads:s 1 -threads:t 1 -threads:#0 1
only the OUTPUT codec 'thread_count' is getting set to 1 :
@ffmpeg.c , line 2337:
codec = ost->st->codec;
(gdb) p ost->st->codec->thread_count
$51 = 1
but the INPUT codec still has a thread_count of 0, meaning that a number of threads equal to
the number of CPU cores found will be used :
(gdb) p ist->st->codec->thread_count
$52 = 0
This seems nonsensical: why would one use 8 threads to extract one frame from the input stream
for capture to a JPEG file (the purpose of the transcode)?
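The trace in the description shows nine clone() calls on a machine reporting 8 logical cores, which is consistent with a decoder that treats a thread_count of 0 as "choose automatically". The C sketch below models such a rule purely as an assumption, not as libavcodec's actual code: the function name `effective_thread_count`, the "cores plus one" rule, and the cap of 16 are illustrative guesses chosen to match the observed count.

```c
#include <assert.h>

#define MAX_AUTO_THREADS_GUESS 16   /* assumed upper bound, for illustration only */

/* Hypothetical model of how a requested thread_count could be resolved.
 * requested == 0 means "auto"; any positive value is used as given. */
int effective_thread_count(int requested, int logical_cores)
{
    assert(requested >= 0 && logical_cores >= 1);

    if (requested > 0)
        return requested;               /* explicit request, e.g. -threads 1 */
    if (logical_cores == 1)
        return 1;                       /* nothing to parallelise on one core */

    int automatic = logical_cores + 1;  /* assumed rule: cores + 1 */
    return automatic < MAX_AUTO_THREADS_GUESS ? automatic
                                              : MAX_AUTO_THREADS_GUESS;
}
```

Under these assumptions, `effective_thread_count(0, 8)` gives 9, matching the nine clone() calls, while `effective_thread_count(1, 8)` gives 1, which is what the GDB session shows for the output codec.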
There appears to be no way to set the number of threads used for the input codec with
command line arguments.
A quick fix I'm going to adopt until you FFMPEG developers release a better fix is:
if (ocodec->thread_count != 0
    && ocodec->codec_id == AV_CODEC_ID_MJPEG)
    icodec->thread_count = ocodec->thread_count;
I'll test with this patch and post the results back here shortly.
comment:4 by , 9 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
As said, please feel free to post all usage questions on the user mailing list, and don't forget to include the command line and console output there. There is no way for ffmpeg
to know that you want to use an output option for your input file.
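The distinction between output and input options can be illustrated with a toy model of how options attach to files: options accumulate and are applied to the next file named, where `-i NAME` names an input and a bare name names an output. This is a simplified sketch, not ffmpeg's parser; `struct FileThreads`, `threads_for_files`, and the restriction to `-threads`, `-i`, and bare file names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define THREADS_UNSET (-1)

struct FileThreads {
    int input_threads;    /* value given before -i, or THREADS_UNSET */
    int output_threads;   /* value given before the output name, or THREADS_UNSET */
};

/* Toy model: understands only "-threads N", "-i NAME", and bare output names. */
struct FileThreads threads_for_files(int argc, const char *const argv[])
{
    struct FileThreads result = { THREADS_UNSET, THREADS_UNSET };
    int pending = THREADS_UNSET;    /* option waiting for the next file */

    assert(argc >= 0);
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-threads") == 0 && i + 1 < argc) {
            pending = atoi(argv[++i]);
        } else if (strcmp(argv[i], "-i") == 0 && i + 1 < argc) {
            i++;                                /* skip the input file name */
            result.input_threads = pending;     /* options so far belong to the input */
            pending = THREADS_UNSET;
        } else {
            result.output_threads = pending;    /* bare name: an output file */
            pending = THREADS_UNSET;
        }
    }
    return result;
}
```

In this model, `-i a_video.avi -threads 1 a_video.jpg` leaves the input value unset, whereas `-threads 1 -i a_video.avi -threads 1 a_video.jpg` sets both. On a real command line this corresponds to placing `-threads 1` before `-i a_video.avi`, following the ffmpeg documentation's general rule that options apply to the next specified file; whether that alone closes the performance gap with 0.8.5 is not shown in this ticket.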
Personally, I am interested in which past changes make using current git head difficult for you; please feel free to post examples here, thank you!
comment:5 by , 9 years ago
Problem fixed temporarily with patch to ffmpeg.c version 2.2.16 :
--- ffmpeg.c~ 2015-06-18 18:55:40.000000000 +0000
+++ ffmpeg.c 2015-10-02 14:58:55.153062041 +0000
@@ -2167 +2167,2 @@
- if (!av_dict_get(ist->opts, "threads", NULL, 0))
+ if ((!ist->st->codec->thread_count) &&
+ !av_dict_get(ist->opts, "threads", NULL, 0))
@@ -2355,0 +2357,3 @@
+ if( codec && icodec && codec->thread_count && ! icodec->thread_count )
+ icodec->thread_count = codec->thread_count;
+
At least this now prevents ffmpeg spawning 8 threads to process input stream:
$ strace -e trace=clone ./ffmpeg -i a_video.avi -threads 1 -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg 2>&1 | grep clone
$ # no output - no threads cloned.
But alas, the performance is still terrible compared with 0.8.5:
$ ./ffmpeg -i a_video.avi -threads 1 -vf null -af null -f mjpeg -vframes 1 -ss 00:00:10.000 -an -y a_video.jpg
[elapsed]=0.206965 [cpu%]=99.72 [sys]=0.006971 [user]=0.199424 [rss]=20744 [csw]=1 [vcsw]=273 [fltmaj]=0 [fltmin]=4471 [inblk]=0 [outblk]=0 [exit]=0
Now
Please post all usage questions on the user mailing list (this is a bug tracker, not a support forum) and please understand that version 2.2 is outdated and should not be used, especially not as a new version.