Waveform


Version 1 (modified by lkiesow, 3 years ago)

Initial Version of How to Use FFmpeg and Gnuplot to Generate Waveform Images

Overview

Audio data is often represented by a waveform image. This guide explains how to easily create such an image using FFmpeg and Gnuplot.

Single Channel

Plotting a single-channel waveform is the simpler case, since the data can be piped to Gnuplot directly. The basic idea is to have FFmpeg produce the audio data in a raw binary format that Gnuplot can read and interpret without further conversion.

The following command generates a stream of raw binary data in which each sample is represented by two bytes. The input format does not matter as long as the file contains an audio stream. If the audio stream has multiple channels, they are mixed down into one ('-ac 1').

  ffmpeg -i in.mkv -ac 1 -map 0:a -c:a pcm_s16le -f data -
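
To see what this format looks like without running FFmpeg, the following sketch writes three hand-made samples in the same layout and decodes them again with od. The helper name decode_s16 and the file name are made up for illustration, and od decodes in the byte order of the host, so the sketch assumes a little-endian machine.

```shell
# Print the samples of a raw 16-bit PCM file, one value per line.
# Assumption: the host is little endian (od -t d2 uses host byte order).
decode_s16() {
  od -An -v -t d2 "$1" | awk '{ for (i = 1; i <= NF; i++) print $i }'
}

# Hypothetical demo file: the samples 0, 32767 and -32768,
# each stored as two bytes with the low byte first.
demo="${TMPDIR:-/tmp}/waveform-demo.raw"
printf '\000\000\377\177\000\200' > "$demo"

decode_s16 "$demo"
```

On a little-endian host this prints 0, 32767 and -32768, i.e. silence followed by the largest and the smallest value a 16-bit sample can take.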

Since audio usually comes with a lot of samples per second (e.g. 44100 samples per second for CD audio), it is usually a good idea to reduce the amount of data to increase speed and decrease memory consumption. Note that reducing the sample rate too much might distort the generated waveform. A value of 8000 samples per second usually works well. The modified command line looks like this:

  ffmpeg -i in.mkv -ac 1 -filter:a aresample=8000 -map 0:a -c:a pcm_s16le -f data -
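
The saving is easy to estimate, since the raw stream holds two bytes per sample for a single channel. The following is a rough sketch with hypothetical helper names; the 60 second duration and the 5000 pixel image width are only example values.

```shell
# Size in bytes of the raw stream: one channel, two bytes per sample.
# usage: raw_bytes <duration in seconds> <sample rate in Hz>
raw_bytes() {
  echo $(( $1 * $2 * 2 ))
}

# Number of samples that share one pixel column of the final image.
# usage: samples_per_pixel <duration in seconds> <sample rate in Hz> <width>
samples_per_pixel() {
  echo $(( $1 * $2 / $3 ))
}

raw_bytes 60 44100               # 5292000
raw_bytes 60 8000                # 960000
samples_per_pixel 60 8000 5000   # 96
```

Even at 8000 samples per second, one minute of audio still leaves many samples for every pixel column of a 5000 pixel wide image, which is why the lower rate rarely changes the look of the result.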

What is left is letting Gnuplot create the waveform image from this data. For that we need a plot command that can deal with the output from FFmpeg.

  plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;

This plot command reads from stdin (via 'cat'), where it expects a one-dimensional array of two-byte, little-endian, signed integers representing the PCM values. These are then plotted with lines.

We can already try this out by combining the two commands. Note that live rendering in the graphical user interface of Gnuplot is rather slow, so you don't want to throw too much data at it. Use a short audio file or limit the duration using the '-t' option to plot only a part of the file.

  ffmpeg -i in.wav -ac 1 -filter:a aresample=8000 -map 0:a -c:a pcm_s16le -f data - | \
    gnuplot -p -e "plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;"

This should result in something like this:

[Image: Gnuplot window containing a waveform plot]

This already shows the waveform, but usually we don't need axes, labels, scales etc. The aspect ratio is not optimal either. It should rather be something like 10:1 (width to height). Hence we need to extend the plot command.

  set terminal png size 5000,500;
  set output 'waveform.png';

  unset key;
  unset tics;
  unset border;
  set lmargin 0;
  set rmargin 0;
  set tmargin 0;
  set bmargin 0;

  plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;

This makes Gnuplot generate a PNG image of 5000x500 pixels and store it in a file named 'waveform.png'. It also removes all labels, axes and other non-data elements from the image and sets all margins to zero.

All this can still be specified using the command line, but it is much more convenient to put all the plot commands in a separate file and pass that one to Gnuplot. Assuming the plot commands are stored in 'waveform.gnuplot', a valid command line to generate a waveform image would then be:

  ffmpeg -i in.mp3 -ac 1 -filter:a aresample=8000 -map 0:a -c:a pcm_s16le -f data - | \
    gnuplot waveform.gnuplot
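
Because the image size and the output name are fixed inside the plot file, a small helper that writes the file can be handy when several sizes are needed. This is only a sketch: the function name write_plot_script, the file names and the 1000x100 size are assumptions, not part of the setup described here.

```shell
# Write a Gnuplot script for a single-channel waveform.
# usage: write_plot_script <script file> <width> <height> <output png>
write_plot_script() {
  cat > "$1" <<EOF
set terminal png size $2,$3;
set output '$4';

unset key;
unset tics;
unset border;
set lmargin 0;
set rmargin 0;
set tmargin 0;
set bmargin 0;

plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
EOF
}

# Hypothetical example: a smaller 1000x100 pixel image.
plot_script="${TMPDIR:-/tmp}/waveform-small.gnuplot"
write_plot_script "$plot_script" 1000 100 waveform-small.png
```

The generated file is then used exactly like 'waveform.gnuplot', i.e. as the argument to gnuplot at the end of the FFmpeg pipe.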

The result should then look somewhat like this:

[Image: Waveform image of a single audio channel]

Multiple Channels

While the basic idea and the command for plotting multiple channels remain the same, we cannot simply pipe the data into Gnuplot since the channels have to be plotted separately. Thus we first use FFmpeg to extract the data for all channels and then plot the data in a second step.
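
The reason is that raw PCM data with more than one channel is interleaved: the samples alternate between the channels (left, right, left, right, ...). The following sketch illustrates this with hand-made data instead of FFmpeg output. The file name, the helper name channel_of and the sample values are made up, and od is assumed to run on a little-endian host.

```shell
# Hypothetical interleaved stereo data: left = 1 2 3, right = -1 -2 -3,
# stored as 16-bit little-endian samples in the order L R L R L R.
stereo="${TMPDIR:-/tmp}/waveform-stereo-demo.raw"
printf '\001\000\377\377\002\000\376\377\003\000\375\377' > "$stereo"

# Print every second sample of a raw file, starting at the given offset.
# usage: channel_of <raw file> <0 for left | 1 for right>
channel_of() {
  od -An -v -t d2 "$1" \
    | awk '{ for (i = 1; i <= NF; i++) print $i }' \
    | awk -v ch="$2" '(NR - 1) % 2 == ch'
}

channel_of "$stereo" 0   # left channel
channel_of "$stereo" 1   # right channel
```

Plotting such a file as a single array would draw one line that jumps between the two channels, which is why each channel needs its own data set.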

An FFmpeg command line that extracts the audio channel data into separate files, prepared for Gnuplot, could look like this:

  ffmpeg -i in.mp4 -filter_complex '[0:a]aresample=8000,aformat=channel_layouts=stereo,channelsplit=channel_layout=stereo[l][r]' \
    -map '[l]' -c:a pcm_s16le -f data /tmp/plot-waveform-ac1 \
    -map '[r]' -c:a pcm_s16le -f data /tmp/plot-waveform-ac2

This again downsamples the data and forces a stereo layout. The channelsplit filter then separates the left and right channels into two mono streams, each of which is stored in its own file.

A plot command for this would then look like this:

  set terminal png size 5000,1000;
  set output 'waveform.png';

  unset key;
  unset tics;
  unset border;
  set lmargin 0;
  set rmargin 0;
  set tmargin 0;
  set bmargin 0;

  set multiplot layout 2,1;
  plot '/tmp/plot-waveform-ac1' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
  plot '/tmp/plot-waveform-ac2' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
  unset multiplot;

The resulting image would then look like this:

[Image: Waveform image of two audio channels]

Additional Hints

Sometimes there are a few loud peaks while the rest of the data is relatively quiet. These peaks cause the rest of the data to be scaled down, which might be unwanted. To keep the y-axis centered and cut off the peaks, add the following line before the plot command and adjust the values to your data (16-bit samples range from -32768 to 32767):

  set yrange [-600:600];
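
To pick sensible values, it helps to know how loud the data actually gets. The following sketch prints the largest absolute sample value of a raw single-channel file in the 16-bit format used throughout this guide. The helper name peak_of and the demo data are made up, and od is assumed to run on a little-endian host.

```shell
# Print the largest absolute sample value of a raw 16-bit mono file.
# usage: peak_of <raw file>
peak_of() {
  od -An -v -t d2 "$1" | awk '
    { for (i = 1; i <= NF; i++) { v = ($i < 0) ? -$i : $i; if (v > max) max = v } }
    END { print max + 0 }'
}

# Hypothetical demo data: the samples 100, -600 and 250.
demo="${TMPDIR:-/tmp}/waveform-peak-demo.raw"
printf '\144\000\250\375\372\000' > "$demo"

peak_of "$demo"   # 600
```

A yrange somewhat below the reported peak then cuts off only the loudest spikes, while a yrange equal to the peak shows everything.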
