Version 3 (modified by jakubk, 5 years ago)

Using ffmpeg in a VFX pipeline

Every VFX pipeline needs a way of converting still frames into motion sequences that can be played back on a large screen or projector (and/or directly on artists' workstations), whether for a review or just to see the resulting work in motion. It is perfectly possible to play back high-resolution frames directly, but such a setup requires an enormous amount of bandwidth (a 2K sequence needs about 300MB/s for seamless playback) and a large amount of storage space (an average 2K frame is about 12MB as a 10-bit DPX and about 14MB as a 16-bit EXR). Encoding these frames into a single compressed video file makes it possible to preview the work quickly and makes it portable. It is also much better suited to quick daily reviews. Full-resolution frames should always be used for final reviews and final color grading, but that work is performed on specialty hardware/software such as a Mistika suite or a Nucoda Filmmaster attached to a high-performance SAN.
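The bandwidth figure above follows from simple arithmetic. The sketch below makes three assumptions that are not from the original text. It uses a 2048x1556 full-aperture 2K frame and 24 fps playback. It also assumes the common DPX packing of three 10-bit channels into one 32-bit word, which gives 4 bytes per pixel. The DPX header is ignored.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope storage/bandwidth estimate for 10-bit RGB DPX frames.
# Assumptions: 4 bytes per pixel (three 10-bit channels packed into 32 bits),
# header bytes ignored, sizes reported in decimal megabytes.

# Bytes needed for one uncompressed frame of the given width and height.
dpx_frame_bytes() {
  local width=$1 height=$2
  echo $(( width * height * 4 ))
}

# Bytes per second needed to play frames of the given size at the given rate.
playback_bytes_per_sec() {
  local frame_bytes=$1 fps=$2
  echo $(( frame_bytes * fps ))
}

frame=$(dpx_frame_bytes 2048 1556)
rate=$(playback_bytes_per_sec "$frame" 24)
echo "frame: $(( frame / 1000000 )) MB, playback: $(( rate / 1000000 )) MB/s"
# prints: frame: 12 MB, playback: 305 MB/s
```

Plugging in a different resolution or frame rate shows how quickly the requirements grow. For example, 48 fps doubles the bandwidth.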

The basic requirements for generating movie clips in a VFX pipeline can be summed up into the following points:

  • The resulting clip should be playable on all three major OSes used in VFX: Mac OS X, Windows and, most of all, Linux
  • The codec and review player should allow frame-by-frame scrubbing of the clip
  • There should be some level of compression. How much depends on each studio's available storage space.
  • Portable devices such as iPads are increasingly popular for film reviews. The clip encoding part of the VFX pipeline should ideally be able to produce videos playable on these devices
  • The color must be as close as possible to the original source frames
  • The clip should play back on reasonable hardware (you should not need a SAN to do daily reviews)
  • It should be possible to create both a mono and a stereo (3D) version of the clip

Only some of the codecs supported by ffmpeg are usable in a VFX pipeline. At the time of writing, possibly none of them meets all of the criteria outlined above. Some areas will require compromise, but in general it is perfectly possible to use ffmpeg successfully in such an environment.

Input

The default image file format for a high-end VFX workflow is Lucasfilm's OpenEXR (.exr). In addition to the standard .exr file, Weta Digital has developed a stereo (3D) extension to the EXR standard called SXR. It is basically a container for both the left and the right eye within one file, which saves data managers from having to keep track of two sequences of files per stereo stream. Unfortunately, at the time of this writing, OpenEXR is not very well supported in ffmpeg (despite claims of OpenEXR support), and SXR is not supported at all. This means that the first step of the workflow must be a file conversion from EXR to DPX, TIFF or lossless JPEG.

Describing this conversion process is outside the scope of this guide, but there are many ways to skin this cat. The easiest is to use Nuke: create a Read node for the SXRs, attach a Write node to it and have it write out a sequence of converted frames. Keep in mind that many VFX workflows apply a color LUT to manipulate the look of the resulting images. At the time of writing, FFmpeg had no way of applying a text-based LUT to its inputs (newer releases include a lut3d filter), so the LUT must be applied during the conversion process. Once we have a sequence of DPX, TIFF, JPEG or similar images, we can proceed with encoding them into a movie clip.

Note that the high dynamic range of EXR (with its 16 bits per channel) will be "flattened" once the frames are converted to DPX or another format. This is acceptable because most video codecs can store at most 10 bits per channel anyway.

Output

FFmpeg takes an image sequence as input and outputs a movie clip file, usually in a .mov or .avi container. The VFX/film industry mostly uses the .mov container, especially for client deliveries, because it is the format most commonly expected there. Making these clips is pretty straightforward, so we'll focus on the options that matter most to this industry.

Mono vs. Stereo

The current trend in filmmaking is towards producing 3D (stereo) movies. In terms of a file container, a stereo movie is simply a single movie file (such as a .mov) that contains two video tracks, one for each eye. When choosing an output format, make sure it supports multiple video tracks. QuickTime .mov does, but support in .avi is unverified.

Creating a stereo movie requires two extra steps. First, the correct input source must be specified for each eye, which means repeating the input path and its parameters for the second eye. Second, and most importantly, both input video streams must be mapped into the single output file. This is done with ffmpeg's -map option, using the following parameters:

-map 0:0 -map 1:0 -metadata stereo_mode=left_right

This tells ffmpeg to take the first stream of input #0 and the first stream of input #1 and write both to the output file as separate video streams (output streams #0 and #1). When a movie like this is opened in QuickTime, it shows two video tracks. It is then entirely up to the playback software to decide how to display the movie in 3D. RV, for instance, handles this out of the box: all you need to do is turn on stereo mode. Other players may require more tweaking. The stereo_mode metadata tag may be optional; I have not tested what happens if it is omitted.
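To confirm from the command line, rather than in QuickTime, that a stereo clip really contains two video tracks, you can use ffprobe to list its video streams. The sketch below assumes ffprobe is on the PATH. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the number of video streams in a movie file.
# A stereo clip made with "-map 0:0 -map 1:0" should report 2, a mono clip 1.
count_video_streams() {
  local movie=$1
  # -select_streams v: video streams only; csv=p=0 prints one bare index per line
  ffprobe -v error -select_streams v -show_entries stream=index \
    -of csv=p=0 "$movie" | wc -l | tr -d ' '
}
```

Usage: `count_video_streams output.mov`.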

ProRes

Apple's ProRes is a very good and efficient codec. Its main problem is that RV, one of the industry's main review tools on Linux, does not support ProRes playback on any Linux platform. In fact, RV only supports ProRes playback in its 32-bit version on Mac OS X. That said, RV is used mainly for its frame-by-frame scrubbing. If that feature is not necessary for your workflow, mplayer and other players will happily play back ProRes on Linux.

ProRes is a 422 codec, with a 4444 variant. FFmpeg comes with three different ProRes encoders: "prores", "prores_kostya" and "prores_anatoliy". Current FFmpeg releases name the latter two "prores_ks" and "prores_aw". In our testing we used the "prores" and "prores_kostya" encoders and found "prores_kostya" to be the better choice. It was the only one that supported the 4444 colorspace, although it may be slightly slower. The color quality of the videos produced by the two encoders was visually indistinguishable, so the 4444 support decided it: we went with Kostya's version of ProRes.

There are four ProRes profiles: Proxy, LT, SQ (standard) and HQ, plus the optional 4444. In ffmpeg these profiles are selected by number with -profile:v (0 is Proxy and 3 is HQ). See Apple's official ProRes white paper for details on the codec and its profiles; the main difference between the profiles is their bitrate. The other option commonly used with ProRes is -pix_fmt. For the 422 profiles this is normally set to yuv422p10le. For 4444 ProRes you would set it to yuv444p10le. The full list of pixel formats can be printed by running ffmpeg -pix_fmts. Not all of these formats are supported by the ProRes encoders, and ffmpeg -h encoder=NAME lists the ones a given encoder accepts.
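In a pipeline script, the profile name and pixel format usually travel together. The sketch below is a hypothetical pair of helpers. The values 0 (Proxy) and 3 (HQ) come from the text above, and 1 and 2 follow the profile order. The value 4 for 4444 is an assumption based on FFmpeg's ProRes encoder options, so check ffmpeg -h encoder=NAME on your build.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mapping a ProRes profile name to ffmpeg options.

# Echo the numeric value for -profile:v (fails on unknown names).
prores_profile() {
  case "$1" in
    proxy)       echo 0 ;;
    lt)          echo 1 ;;
    sq|standard) echo 2 ;;
    hq)          echo 3 ;;
    4444)        echo 4 ;;   # assumption: 4444 profile value in prores_ks
    *) echo "unknown ProRes profile: $1" >&2; return 1 ;;
  esac
}

# Echo a matching -pix_fmt: 4:4:4 for the 4444 profile, 4:2:2 otherwise.
prores_pix_fmt() {
  case "$1" in
    4444) echo yuv444p10le ;;
    *)    echo yuv422p10le ;;
  esac
}
```

Usage: `-profile:v "$(prores_profile hq)" -pix_fmt "$(prores_pix_fmt hq)"`.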

An example command line for generating a 2K mono clip with ProRes is:

# 2k mono @ 48 fps (422)
ffmpeg -y -probesize 5000000 -f image2 -r 48 -i "${DPX_HERO}" \
  -c:v prores_kostya -profile:v 3 -qscale:v "${QSCALE}" -vendor ap10 \
  -pix_fmt yuv422p10le -s 2048x1152 -r 48 -force_fps output.mov

The options used here are standard and are explained in other documents, but the qscale parameter deserves a closer look. It sets the quality of the resulting ProRes movie, and with it the file size and bitrate. 0 is the best quality and 32 the worst. From empirical testing we've found that a qscale of 9 to 13 gives a good result without inflating the file size too much: 11 is a good default, or 9 if slightly better quality is required. When space is not a problem, use a qscale of 5 or less. As the value approaches zero, however, the clip becomes extremely large and its bitrate so high that it is no longer playable on normal equipment.
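The qscale comparison is easy to repeat on your own material. The sketch below assumes the DPX_HERO input pattern from the command above. The function name, output file names and qscale values are illustrative. Current FFmpeg releases call Kostya's encoder prores_ks, while older builds used prores_kostya, so the encoder name is read from a variable.

```bash
#!/usr/bin/env bash
# Hypothetical qscale sweep: encode the same sequence at several qscale
# values and print "qscale<TAB>size in bytes" for each resulting clip.
PRORES_ENCODER="${PRORES_ENCODER:-prores_ks}"   # older builds: prores_kostya

qscale_sweep() {
  local input=$1; shift
  local q out
  for q in "$@"; do
    out="qscale_${q}.mov"
    ffmpeg -y -loglevel error -f image2 -framerate 48 -i "$input" \
      -c:v "$PRORES_ENCODER" -profile:v 3 -qscale:v "$q" -vendor ap10 \
      -pix_fmt yuv422p10le -s 2048x1152 "$out" || return 1
    printf '%s\t%s\n' "$q" "$(wc -c < "$out" | tr -d ' ')"
  done
}
```

Usage: `qscale_sweep "${DPX_HERO}" 5 9 11 13`. After checking the sizes, play the clips back on your review hardware to judge both quality and playability.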

The stereo version adds the second eye as another input and maps both into the output file:

# 2k stereo @ 48 fps (422)
ffmpeg -y \
  -probesize ${PROBESIZE} -f image2 -r 48 -i "${DPX_HERO}" \
  -probesize ${PROBESIZE} -f image2 -r 48 -i "${DPX_2ND}" \
  -c:v prores_kostya -profile:v 3 -qscale:v "${QSCALE}" -vendor ap10 \
  -pix_fmt yuv422p10le -s 2048x1152 -r 48 -force_fps \
  -map 0:0 -map 1:0 -metadata stereo_mode=left_right output.mov
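When both variants come from the same pipeline script, it can help to build the command in one place. The sketch below is a hypothetical wrapper, not part of any existing tool. It assembles the mono command and, when a second-eye pattern is given, adds the second input and the -map/-metadata options of the stereo command. It reads the encoder name from a variable and uses the image2 demuxer's -framerate option for the input rate. It also omits -probesize and -force_fps for brevity. Set DRY_RUN=1 to print the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: make_prores_clip QSCALE HERO_PATTERN OUTPUT [SECOND_EYE_PATTERN]
PRORES_ENCODER="${PRORES_ENCODER:-prores_ks}"   # older builds: prores_kostya

make_prores_clip() {
  local qscale=$1 hero=$2 out=$3 second=${4:-}
  local -a cmd=(ffmpeg -y -f image2 -framerate 48 -i "$hero")
  if [ -n "$second" ]; then
    # second eye becomes input #1
    cmd+=(-f image2 -framerate 48 -i "$second")
  fi
  cmd+=(-c:v "$PRORES_ENCODER" -profile:v 3 -qscale:v "$qscale" -vendor ap10
        -pix_fmt yuv422p10le -s 2048x1152 -r 48)
  if [ -n "$second" ]; then
    # one video track per eye in the same output file
    cmd+=(-map 0:0 -map 1:0 -metadata stereo_mode=left_right)
  fi
  cmd+=("$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show the command only
  else
    "${cmd[@]}"
  fi
}
```

Usage: `make_prores_clip 11 "${DPX_HERO}" mono.mov` for a mono clip, or `make_prores_clip 11 "${DPX_HERO}" stereo.mov "${DPX_2ND}"` for a stereo clip.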