Version 5 (modified by jakubk, 7 years ago) (diff)


Using ffmpeg in a VFX pipeline

Every VFX pipeline needs a way of converting still frames into motion sequences that can be played back on a large screen or projector (and/or directly on artists' workstations) for review, or just to see the resulting work in motion. It is perfectly possible to play back high-resolution frames directly, but such a setup requires an enormous amount of throughput bandwidth (a 2K sequence needs around 300MB/s for seamless playback) and an even larger amount of storage space (the average size of a 2K frame is about 12MB for a 10-bit DPX, while a 16-bit EXR is about 14MB). Encoding these frames into a single compressed video file provides a quick way to preview the work, makes it portable and is much better suited to quick daily reviews. Full-resolution frames should always be used for final reviews and color-correct final grades, but those are performed on specialty hardware/software such as a Mistika suite or a Nucoda Filmmaster attached to a high-performance SAN.
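The bandwidth figure above is easy to sanity-check with some back-of-the-envelope arithmetic (a rough sketch; the frame size and frame rate are the approximations from the paragraph above):

```shell
# rough playback bandwidth for an uncompressed 2K DPX sequence
frame_mb=12   # approximate size of one 10-bit 2K DPX frame, in MB
fps=24        # standard film frame rate
echo "$(( frame_mb * fps )) MB/s"   # roughly the 300MB/s quoted above
```

At 48 fps (as used for stereo or HFR work) the requirement doubles, which is why a SAN is usually involved for full-resolution playback.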

The basic requirements for generating movie clips in a VFX pipeline can be summed up into the following points:

  • Resulting clip should be playable on all three major OSes used in VFX: Mac OS X, Windows and, most of all, Linux
  • The codec and review player should allow for frame-by-frame scrubbing of the clip
  • There should be some level of compression. How much is obviously down to each studio's available storage space.
  • Portable devices such as iPads are becoming increasingly popular for film reviews. The clip-encoding part of the VFX pipeline should ideally be capable of producing videos playable on these devices
  • The color must be as close as possible to the original source frames
  • The clip should play back on a reasonable hardware setup (you should not need a SAN to do daily reviews)
  • There should be a possibility of creating a mono and a stereo (3D) version of the clip

Out of all the codecs supported by ffmpeg, only some are usable in a VFX pipeline and, at the time of writing, possibly none of them meet all of the criteria outlined above. Some areas will require compromise, but in general it is perfectly possible to successfully use ffmpeg in such an environment.


The default image file format for a high-end VFX workflow is ILM's OpenEXR (.exr). In addition to the standard .exr file, Weta Digital has developed a stereo (3D) extension to the EXR standard called SXR, which is essentially a container holding both the left and the right eye within one file (saving data managers from tracking two file sequences per stereo stream). Unfortunately, at the time of this writing, OpenEXR is not very well supported in ffmpeg (despite claims of OpenEXR support) and SXR is definitely not supported. This means that the first part of the workflow must be a file conversion from EXR to either DPX or TIFF (or lossless JPEG).

Describing this conversion process is outside the scope of this guide, but there are many ways to skin this cat. The easiest one is to use Nuke: create a reader node for the SXRs, attach a writer node to it and have it write out a sequence of converted frames. One thing to keep in mind is that many VFX workflows use a color LUT to manipulate the look of the resulting images. FFmpeg does not have a way of applying a text-based LUT to its inputs, so the LUT must be applied during the conversion process. Once we have a sequence of DPX/TIFF/JPEG/whatever images, we can proceed with encoding them into a movie clip.
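If a Nuke license is not at hand, OpenImageIO's oiiotool is one possible alternative for the mono case. This is only a sketch under the assumption that oiiotool is installed and the frames sit in the current directory; the file names and the 10-bit output depth are illustrative, and any LUT would still need to be applied here rather than in ffmpeg:

```shell
# convert every EXR frame in the current directory to a 10-bit DPX
# (oiiotool ships with OpenImageIO; -d sets the output data format)
for f in *.exr; do
    oiiotool "$f" -d uint10 -o "${f%.exr}.dpx"
done
```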

One thing to note is that the high dynamic range of EXR (and its 16 bits per channel) will be "flattened" once the frames are converted to DPX or another format, but this is acceptable because most video codecs top out at 10 bits per channel anyway.


FFmpeg takes an image sequence as input and outputs a movie clip file (usually in a .mov or .avi container). The VFX/film industry mostly uses the .mov container, especially for client deliveries, because it is the most common format in the industry. Making these clips is pretty straightforward, but let's look at the options of most interest to this specific industry.

Mono vs. Stereo

The current trend in filmmaking is geared towards producing 3D (stereo) movies. In terms of the file container, a stereo movie is simply a single movie file (like .mov) that contains two video tracks - one for each eye. When choosing an output format, make sure it supports multiple video tracks (QuickTime does; .avi is uncertain). Creating a stereo movie requires two extra steps. First, the correct input source must be specified for each eye, which basically means replicating the input path and any parameters for the second eye. Second, and most important, both input streams must be mapped into the single output file. This is done with ffmpeg's -map option, passing it the following parameters:

-map 0:0 -map 1:0 -metadata stereo_mode=left_right

This tells ffmpeg to take the first stream of input #0 and the first stream of input #1 and write them out as two video tracks. It is possible to control which eye gets assigned to which track by changing the order of the -map arguments. Once a movie like this is opened in QuickTime, it will show as having two video tracks. It is then entirely up to the player used for playback to determine how to display this movie in 3D. RV, for instance, handles it by default - all that needs to be done is to turn on stereo mode - but other players may require more tweaking. The metadata tag is potentially optional; I have not tested what happens if it is omitted.


Prores

Apple's Prores codec is a very good and efficient codec. Its main problem is that one of the industry's main Linux review tools (RV) does not support playback of Prores on any Linux platform; in fact, it only supports playback in the 32-bit version of RV on Mac OS X. Having said that, the main use of RV is its capability to do frame-by-frame scrubbing; if that is not a feature your workflow needs, mplayer and other players will happily play back Prores on Linux.

Prores is a 422 codec, with a 4444 variation. FFmpeg comes with 3 different prores encoders: "prores", "prores_kostya" and "prores_anatoliy". In our testing we used the "prores" and "prores_kostya" encoders and found "prores_kostya" to be the best one to use: it is the only one that supports the 4444 colorspace, although it may be slightly slower. The color quality of the videos produced by these two encoders was visually indistinguishable, so because of the 4444 support we decided to go with Kostya's version of prores.

There are 4 profiles within Prores: Proxy, LT, SQ and HQ (and then optionally 4444). In ffmpeg these profiles are assigned numbers (0 is Proxy and 3 is HQ). See Apple's official Prores whitepaper for details on the codec and information associated with the profiles. For quick reference, the basic difference is the bitrates: (TODO). The other option used with prores is -pix_fmt. This is normally set to yuv422p10le or similar, but if you want to use the 4444 prores you would set it to yuv444p10le (or yuva444p10le if an alpha channel is needed). (A list of possible pixel formats can be printed by running ffmpeg -pix_fmts. Note that not all of these formats are actually supported by prores.)

An example command line for generating a 2K mono clip with Prores is:

# 2k mono @ 48 fps (422) 
ffmpeg -y -probesize 5000000 -f image2 -r 48 -force_fps -i ${DPX_HERO} -c:v prores_kostya -profile:v 3 -qscale:v ${QSCALE} -vendor ap10 -pix_fmt yuv422p10le -s 2048x1152 -r 48 ${OUTPUT}

The options used here are standard and explained in other documents, but let's elaborate a little on the qscale parameter. This parameter determines the quality of the resulting prores movie - both its size and its bitrate. 0 is best and 32 is worst. From empirical testing we found that a qscale of 9-13 gives a good result without exploding the space needed; 11 is a good bet, or 9 if slightly better quality is required. When space is not a problem, go with a qscale of 5 or less, but as it approaches zero the resulting clip becomes extremely large and the bitrate so high that it stops being playable on normal equipment. The "vendor" argument, when set to "ap10", tricks QuickTime and Final Cut Pro into thinking the movie was generated using a QuickTime prores encoder.

An example for generating a 3D (Stereo) 2K movie is:

# 2k stereo @ 48 fps (422)
ffmpeg -y -probesize 5000000 -f image2 -r 48 -force_fps -i ${DPX_HERO} -probesize 5000000 -f image2 -r 48 -force_fps -i ${DPX_2ND} -c:v prores_kostya -profile:v 3 -qscale:v ${QSCALE} -vendor ap10 -pix_fmt yuv422p10le -s 2048x1152 -r 48 -map 0:0 -map 1:0 -metadata stereo_mode=left_right ${OUTPUT}

Photo JPEG

Photo JPEG is a reliable codec that produces movie clips readable on any architecture/OS. There may be problems with playback at high (2K and over) resolutions, and there are obvious file-size considerations with this codec: lossless JPEG does not compress very well. Based on empirical testing, resolutions up to 1K are perfectly fine with Photo JPEG, but 2K and above struggle quite a bit. When generating a Photo JPEG movie clip there is really only one relevant setting - qscale - and it should be set to 1. The command line for generating a Photo JPEG movie is as follows:

# 2k mono @ 48 fps (422)
ffmpeg -y -probesize 5000000 -f image2 -r 48 -force_fps -i ${DPX_HERO} -c:v mjpeg -qscale:v 1 -vendor ap10 -pix_fmt yuvj422p -s 2048x1152 -r 48 ${OUTPUT}


H264

H264 is the newest of the three codecs and seems to be the plumbing that powers most video on the internet. It is extremely efficient at compression - the resulting movie clips are easily 1/10th the size of the same clip made with prores - but it lacks in one critical area. Because of its heavy use of temporal compression, H264-encoded clips are very difficult to scrub frame by frame, especially going backwards: the decoder needs to reconstruct frames from other nearby frames, which is not an easy task. It is very likely that H264 cannot be used in reviews that require frame-by-frame scrubbing, but it is an excellent and space-efficient codec for any playback-only workflow. And of course for mobile devices.

H.264 support in ffmpeg is provided by VideoLAN's libx264, which is most likely the best H264 encoder out there. If compiling ffmpeg/libx264 manually, please see some of the excellent guides such as the UbuntuCompilationGuide. Reasonably detailed instructions on the plethora of H264 options can be found in the existing x264EncodingGuide. We will detail some of the missing information in this guide.

Like prores, H264 understands the concept of "profiles" - basically encoding presets grouped together under a convenient keyword. The existing profiles are: baseline, main, high, high10, high422 and high444. Apple's QuickTime only supports the *baseline* and *main* profiles, and it only supports the 420 colorspace. There are three ways to control quality in H264: bitrate, -qp and -crf. Bitrate is only useful for 2-pass encoding, which is not really the best encoding method for this kind of workflow. -qp and -crf are basically the same, with -crf resulting in a smaller file. This guide contains a good writeup and description of these options. Based on testing we found that -crf is definitely the way to go and recommend it as the main quality-control parameter for H264-encoded movies. A crf value of 0 produces lossless and very large movies which are unplayable at high (2K+) resolutions. The playback problem persists with increasing crf values, but becomes manageable at around crf 15 and higher. We found that a crf value of 19 produces very good movies with a very small file size that should generally play back on reasonable hardware (Apple laptops made in the last 2-3 years).

As stated before, the main problem with H264 is frame-by-frame scrubbing, but aside from that we found that it produces the most color-correct output and clips of very high quality. For purely playback applications it is definitely the codec to go with, as it plays on pretty much everything. One thing to keep in mind is that H264 takes the longest to encode of the three codecs we tested (but we haven't played with encoder optimization because we were after the best-quality clips, so your mileage may vary).

An example command line for using H264 as an encoder would be:
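Following the pattern of the Prores and Photo JPEG commands above, a reasonable starting point might look like the sketch below. This is an untested illustration, not a verified recipe: ${DPX_HERO} and ${OUTPUT} are placeholders as before, and the main profile with yuv420p is chosen to match the QuickTime limitations mentioned earlier.

```shell
# 2k mono @ 48 fps (420, QuickTime-compatible profile, crf 19 as discussed above)
ffmpeg -y -probesize 5000000 -f image2 -r 48 -force_fps -i ${DPX_HERO} -c:v libx264 -profile:v main -crf 19 -pix_fmt yuv420p -s 2048x1152 -r 48 ${OUTPUT}
```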