Opened 5 years ago

Last modified 5 years ago

#1882 open enhancement

Multi-threading wmv encoder

Reported by: txspaderz
Owned by:
Priority: wish
Component: avcodec
Version: git-master
Keywords: wmv2
Cc:
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: no

Description

I'm having issues using multiple cores when using the wmv encoder. It appears to be locked to a single core only.

Any chance we could get support for multiple threads?

Please refer to:
http://forum.serviio.org/viewtopic.php?f=5&t=7698

Attachments (1)

ffmpeg.log.txt (1.6 MB) - added by cehoyos 5 years ago.

Change History (12)

comment:1 Changed 5 years ago by cehoyos

  • Component changed from FFmpeg to avcodec
  • Keywords wmv2 added
  • Priority changed from normal to wish
  • Reproduced by developer set
  • Status changed from new to open
  • Version changed from unspecified to git-master

Please post the command line from the forum here together with complete, uncut console output.
(I suspect there is either a work-around for your actual problem or it is actually a different problem than wmv2 encoding speed.)

comment:2 Changed 5 years ago by txspaderz

Here's the tee'd output at this link (it's quite large, in ASCII format):
http://houstondad.com/ffmpeg.log.txt

While the process is running, it does switch cores (the 39th field of /proc/pid/stat shows the core it last executed on). Here's a breakdown, in case it matters:

root@media-center ~ # while true; do cat /proc/12514/stat | awk '{print $39}'; done > /home/perk/ffmpeg_cores.log
root@media-center ~ # sort /home/perk/ffmpeg_cores.log | uniq -c | sort -k2 -n
 136778 0
 409369 1
 405346 2
 406885 3
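
The sampling above can also be done without relying on awk's whitespace splitting, which miscounts fields if the command name contains spaces. Below is a minimal sketch, assuming a POSIX shell and the Linux /proc/<pid>/stat layout; the function name last_cpu_from_stat is hypothetical and not part of any tool mentioned in this ticket. Note that field 39 only reports where the process last ran, so seeing several cores here does not by itself show that more than one thread is doing work.

```shell
# Hypothetical helper: print field 39 (CPU last executed on) from one
# line of /proc/<pid>/stat passed as the first argument.
last_cpu_from_stat() {
    line=$1
    # Drop everything up to the last ") " so that a command name containing
    # spaces or parentheses cannot shift the field positions.
    rest=${line##*') '}
    # $rest now starts at field 3 (state), so field 39 is word 37.
    set -- $rest
    [ "$#" -ge 37 ] || return 1
    shift 36
    printf '%s\n' "$1"
}

# Example use (PID taken from the transcript above):
#   last_cpu_from_stat "$(cat /proc/12514/stat)"
```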

However, during encoding the process consistently uses only one core's worth of cycles, and nothing more:

12514 root      20   0 91080  65m 4632 R  100  0.4  10:00.98 ffmpeg
12514 root      20   0 91548  65m 4632 R   97  0.4  10:01.96 ffmpeg
12514 root      20   0 90456  65m 4632 R  100  0.4  10:02.98 ffmpeg
12514 root      20   0 90456  65m 4632 R   98  0.4  10:03.97 ffmpeg
12514 root      20   0 90684  65m 4632 R  100  0.4  10:04.98 ffmpeg
12514 root      20   0 90684  65m 4632 R  100  0.4  10:06.00 ffmpeg
12514 root      20   0 90436  65m 4632 R   98  0.4  10:07.00 ffmpeg
12514 root      20   0 90436  65m 4632 R  100  0.4  10:08.01 ffmpeg
12514 root      20   0 90436  65m 4632 R   99  0.4  10:09.02 ffmpeg
12514 root      20   0 90436  65m 4632 R   99  0.4  10:10.03 ffmpeg
12514 root      20   0 90740  65m 4632 R   99  0.4  10:11.03 ffmpeg
12514 root      20   0 91176  65m 4632 R   99  0.4  10:12.03 ffmpeg
12514 root      20   0 91176  65m 4632 R  101  0.4  10:13.05 ffmpeg
12514 root      20   0 91176  65m 4632 R   96  0.4  10:14.03 ffmpeg
12514 root      20   0 91176  65m 4632 R  101  0.4  10:15.06 ffmpeg

Because of this, and because the source file's frame rate is 24 fps, the encoder can't keep up well enough to stream in real time.

comment:3 Changed 5 years ago by cehoyos

Please always post all necessary information here on the bug tracker. Do not rely on external resources; they may disappear!

There are two possibilities to fix your original problem (performance on transcoding from vc1 to wmv2 is too low):
The first is to implement wmv2 multi-threaded encoding. Given that the task is not trivial and this codec was deprecated by Microsoft years ago, I am not sure how likely this is to happen.
The second is to implement vc1 multi-threaded decoding. While this is probably not simpler, the vc1 decoder is an important part of libavcodec, so I suspect the chances are (very slightly) higher. Consider opening a second enhancement request (or wait for me to do it).

comment:4 follow-up: Changed 5 years ago by cehoyos

Out of curiosity: Could you explain why you are using -copyts ?
(Did you test if constant quantiser is faster?)

comment:5 in reply to: ↑ 4 Changed 5 years ago by txspaderz

Replying to cehoyos:

Out of curiosity: Could you explain why you are using -copyts ?
(Did you test if constant quantiser is faster?)

Thanks, if you could put in that request I'd greatly appreciate it.

I'm not sure why it's adding -copyts, the command line is being generated by a program that acts as a dlna server.

comment:6 Changed 5 years ago by cehoyos

Reading your forum post again:
Is this really a regression? Was the performance better for the same input file and an older FFmpeg version? That would be a serious bug, please report the previous FFmpeg version.

comment:7 follow-up: Changed 5 years ago by xnejp03

Hi, I'm the author of the software. wmv2 is the only wmv-based encoder that FFmpeg supports, and it is needed for on-the-fly transcoding for Xbox (among others). The encoder doesn't support -threads with a value other than 1. It's a major bottleneck, especially for HD videos. I'm not sure a multi-threaded VC1 decoder will help, as the same will happen when transcoding (for example) an MKV/H264 HD file.

I appreciate wmv2 is deprecated, but this would really help a lot, unless there is a chance of wmv3 encoder.

Regarding -copyts: it's an attempt to keep audio and video in sync, but it might be completely inappropriate; the documentation on this parameter is a bit scarce :-)

comment:8 in reply to: ↑ 7 Changed 5 years ago by cehoyos

Replying to xnejp03:

I'm not sure a multi-threaded VC1 decoder will help, as the same will happen when transcoding (for example) an MKV/H264 HD file.

But that is not what the OP reported, so please provide a sample and command line together with complete, uncut console output.

comment:9 follow-up: Changed 5 years ago by xnejp03

Actually it looks like the threads parameter is now not breaking the command any more. It also looks like, when it is not supplied, all cores are used (at least for the mpeg2video encoder), and the threads parameter is now used mostly to limit CPU usage (can you confirm?).

When run with -threads 1 I'm getting about 39 fps on my example file; when run without -threads it does about 80 fps, so it seems that some parallelism is implemented. It doesn't push all the cores to the maximum as mpeg2video does, though, so there may be room for improvement.

Also, does it make a difference whether -threads is given before -i or after it? Would that specify the number of CPUs separately for the decoder and the encoder? Or is just one definition (before -i) enough?

comment:10 in reply to: ↑ 9 Changed 5 years ago by cehoyos

Replying to xnejp03:

Actually it looks like the threads parameter is now not breaking the command any more.

?
(I am curious: Could you point me to the bug report?)

It also looks like, when it is not supplied, all cores are used (at least for the mpeg2video encoder), and the threads parameter is now used mostly to limit CPU usage (can you confirm?).

The threads parameter lets you specify the number of threads to use; the default is "0" (auto).

When run with -threads 1 I'm getting about 39 fps on my example file; when run without -threads it does about 80 fps, so it seems that some parallelism is implemented. It doesn't push all the cores to the maximum as mpeg2video does, though, so there may be room for improvement.

I am not sure if auto is always a good choice: It detects the number of cores, but in nearly all cases, you should specify a higher number for maximum performance.
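
Whether a higher thread count actually helps on a given machine is easiest to settle by measuring. The following is a rough sketch, assuming a POSIX shell with ffmpeg on the PATH; the function name bench_wmv2_threads and the input file name are hypothetical. It uses ffmpeg's -benchmark option and the null muxer so that nothing is written to disk, and -an so that only video encoding is timed.

```shell
# Hypothetical helper: encode INPUT to wmv2 once per thread count and let
# ffmpeg report timing for each run.
# Usage: bench_wmv2_threads INPUT COUNT...
bench_wmv2_threads() {
    input=$1
    shift
    for n in "$@"; do
        printf '== -threads %s ==\n' "$n"
        # -threads after -i is an output option, so it applies to the encoder.
        # "-f null -" discards the encoded result.
        ffmpeg -benchmark -i "$input" -c:v wmv2 -an -threads "$n" -f null -
    done
}

# Example use (0 means auto):
#   bench_wmv2_threads sample.mkv 1 0 4 8
```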

Also, does it make a difference whether -threads is given before -i or after it? Would that specify the number of CPUs separately for the decoder and the encoder? Or is just one definition (before -i) enough?

You can specify -threads for the decoder and the encoder.
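
As an illustration of the answer above: ffmpeg applies options placed before -i to that input (here, the decoder) and options placed after it to the next output (here, the encoder). A minimal sketch, assuming a POSIX shell with ffmpeg on the PATH; the function name transcode_wmv2 and the file names are hypothetical, and audio handling is left at ffmpeg's defaults.

```shell
# Hypothetical helper: transcode INPUT to wmv2 with separate thread counts
# for decoding and encoding.
# Usage: transcode_wmv2 DEC_THREADS ENC_THREADS INPUT OUTPUT
transcode_wmv2() {
    dec_threads=$1
    enc_threads=$2
    input=$3
    output=$4
    # First -threads: input option, applies to the decoder.
    # Second -threads: output option, applies to the encoder.
    ffmpeg -threads "$dec_threads" -i "$input" \
           -c:v wmv2 -threads "$enc_threads" "$output"
}

# Example use:
#   transcode_wmv2 2 4 input.mkv output.asf
```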
