Changes between Initial Version and Version 1 of Waveform


Timestamp: Jan 3, 2015, 1:49:15 AM
Author: lkiesow
Comment: Initial version of How to Use FFmpeg and Gnuplot to Generate Waveform Images

[[PageOutline(1-100,Contents)]]

= Overview =

Audio data is often represented by a waveform image. This guide explains how to
easily create such an image using FFmpeg and Gnuplot.

= Single Channel =

Plotting a single-channel waveform is the simpler case because the data can be
piped to Gnuplot directly. The basic idea is to have FFmpeg emit the audio data
in a specific binary format that Gnuplot can read and interpret without further
conversion.

The following command generates a stream of raw binary data with two bytes
representing each sample. The input format does not matter as long as the file
contains an audio stream. If it is a multi-channel audio stream, the channels
are mixed into one.

{{{
  ffmpeg -i in.mkv -ac 1 -map 0:a -c:a pcm_s16le -f data -
}}}
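
To make the output format concrete, here is a small Python sketch (not part of
the original workflow) that decodes such a stream into sample values. It assumes
the bytes come from the command above, i.e. signed 16-bit little-endian mono
PCM. The function name `decode_s16le` is made up for this illustration.

```python
import struct


def decode_s16le(raw: bytes) -> list:
    """Decode raw signed 16-bit little-endian PCM bytes into integers."""
    # Two bytes per sample; a trailing odd byte is ignored.
    count = len(raw) // 2
    # "<" selects little endian, "h" is a signed 16-bit integer.
    return list(struct.unpack(f"<{count}h", raw[:count * 2]))


# Four hand-written samples: zero, the maximum, the minimum and minus one.
samples = decode_s16le(b"\x00\x00\xff\x7f\x00\x80\xff\xff")
print(samples)  # [0, 32767, -32768, -1]
```

The value range of -32768 to 32767 is also the range Gnuplot will see on the
y-axis when it reads the data as '%int16'.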

Since audio usually comes with many samples per second (e.g. 44100 samples per
second for CD audio), it is usually a good idea to reduce the amount of data to
increase speed and decrease memory consumption. Note that reducing the sample
rate too much might distort the generated waveform. A good value is usually
8000 samples per second. The modified command line looks like this:

{{{
  ffmpeg -i in.mkv -ac 1 -filter:a aresample=8000 -map 0:a -c:a pcm_s16le -f data -
}}}
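
The effect of the lower sample rate can be estimated with a little arithmetic.
The following Python sketch is only an illustration: the three-minute duration
and the 5000-pixel image width are assumed example values, and both function
names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # pcm_s16le stores each sample in two bytes


def raw_size_bytes(duration_s: float, sample_rate: int) -> int:
    """Size of a mono pcm_s16le stream of the given duration."""
    return int(duration_s * sample_rate) * BYTES_PER_SAMPLE


def samples_per_pixel(duration_s: float, sample_rate: int, width_px: int) -> float:
    """How many samples end up in one pixel column of the image."""
    return duration_s * sample_rate / width_px


three_minutes = 180  # assumed example duration in seconds
print(raw_size_bytes(three_minutes, 44100))          # 15876000
print(raw_size_bytes(three_minutes, 8000))           # 2880000
print(samples_per_pixel(three_minutes, 8000, 5000))  # 288.0
```

Even at 8000 samples per second, each pixel column in this example still
summarizes a few hundred samples, which is why the reduced rate is usually
sufficient for a waveform image.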

What is left is letting Gnuplot create the waveform image from this data. For
that we need a plot command that can read the output from FFmpeg.

{{{
  plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
}}}

This plot command reads from stdin, where it expects a one-dimensional array of
two-byte, little-endian, signed integers representing the PCM values. These are
then plotted with lines.

We can already try this out by combining the two commands. Note that live
rendering in Gnuplot's graphical user interface is rather slow, so you don't
want to throw too much data at it. Use a short audio file or limit the duration
using the '-t' option to plot only a part of the file.

{{{
  ffmpeg -i in.wav -ac 1 -filter:a aresample=8000 -map 0:a -c:a pcm_s16le -f data - | \
    gnuplot -p -e "plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;"
}}}

This should result in something like this:
[[Image(gnuplot_window.png)]]
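
If the pipeline is assembled by a script, quoting file names and adding the
'-t' limit by hand becomes error-prone. The following Python sketch only builds
the command string and does not run anything. The helper names
`ffmpeg_command` and `pipeline` and the file name are assumptions made for
this example. The FFmpeg and Gnuplot options are the ones used above.

```python
import shlex

# The Gnuplot command from above, passed to "gnuplot -e".
PLOT = ("plot '<cat' binary filetype=bin format='%int16' "
        "endian=little array=1:0 with lines;")


def ffmpeg_command(source, sample_rate=8000, duration=None):
    """Build the argument list for the FFmpeg call used above."""
    cmd = ["ffmpeg", "-i", source, "-ac", "1",
           "-filter:a", f"aresample={sample_rate}", "-map", "0:a"]
    if duration is not None:
        # As an output option, -t stops writing after this many seconds.
        cmd += ["-t", str(duration)]
    cmd += ["-c:a", "pcm_s16le", "-f", "data", "-"]
    return cmd


def pipeline(source, duration=None):
    """Return the whole shell pipeline as one safely quoted string."""
    ffmpeg = shlex.join(ffmpeg_command(source, duration=duration))
    gnuplot = shlex.join(["gnuplot", "-p", "-e", PLOT])
    return f"{ffmpeg} | {gnuplot}"


# Print a pipeline that plots only the first 30 seconds of the file.
print(pipeline("my recording.wav", duration=30))
```

The printed line can be pasted into a shell. `shlex.join` takes care of file
names that contain spaces.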

This already gives us the waveform, but usually we don't need axes, labels,
scales etc. The aspect ratio is not optimal either. It should rather be
something like '1:10' (height to width). Hence we need to extend the plot
command.

{{{
  # Render to a 5000x500 pixel PNG file instead of a window.
  set terminal png size 5000,500;
  set output 'waveform.png';

  # Remove the legend, tick marks, border and all margins.
  unset key;
  unset tics;
  unset border;
  set lmargin 0;
  set rmargin 0;
  set tmargin 0;
  set bmargin 0;

  # Read signed 16-bit little-endian samples from stdin and draw them.
  plot '<cat' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
}}}

This makes Gnuplot generate a PNG image with dimensions of 5000x500 pixels and
store it in a file named 'waveform.png'. It also removes all labels, axes and
other non-data elements from the image and sets all margins to zero.

All of this can still be specified on the command line, but it is much more
convenient to put the plot commands in a separate file and pass that file to
Gnuplot. Assuming the plot commands are stored in 'waveform.gnuplot', a valid
command line to generate a waveform image would then be:

{{{
  ffmpeg -i in.mp3 -ac 1 -filter:a aresample=8000 -map 0:a -c:a pcm_s16le -f data - | \
    gnuplot waveform.gnuplot
}}}

The result should then look somewhat like this:

[[Image(ac1.png)]]
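
The plot script hard-codes the image size and the output file name. If several
sizes are needed, the script text can be generated instead. The following
Python sketch shows one possible way to do that. `waveform_script` is a
hypothetical helper, and it assumes that the output name contains no single
quote.

```python
def waveform_script(width=5000, height=500, output="waveform.png"):
    """Return the single-channel Gnuplot script with the given settings."""
    lines = [
        f"set terminal png size {width},{height};",
        f"set output '{output}';",  # assumes no single quote in the name
        "unset key;",
        "unset tics;",
        "unset border;",
        "set lmargin 0;",
        "set rmargin 0;",
        "set tmargin 0;",
        "set bmargin 0;",
        "plot '<cat' binary filetype=bin format='%int16' "
        "endian=little array=1:0 with lines;",
    ]
    return "\n".join(lines) + "\n"


# Print a script for a smaller 2000x200 pixel image.
print(waveform_script(width=2000, height=200), end="")
```

Writing the returned text to 'waveform.gnuplot' gives a file that can be passed
to Gnuplot exactly as in the command line above.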


= Multiple Channels =

While the basic idea and the plot command remain the same for multiple
channels, we cannot simply pipe the data into Gnuplot, since the channels have
to be plotted separately. Thus we first use FFmpeg to extract the data for all
channels into files and then plot the data in a second step.

An FFmpeg command line that extracts the audio channel data into separate
files, prepared for Gnuplot, could look like this (the 'channelsplit' filter
emits one mono stream per channel of the stereo input):

{{{
  ffmpeg -i in.mp4 -filter_complex \
      '[0:a]aresample=8000,aformat=channel_layouts=stereo,channelsplit=channel_layout=stereo[l][r]' \
    -map '[l]' -c:a pcm_s16le -f data /tmp/plot-waveform-ac1 \
    -map '[r]' -c:a pcm_s16le -f data /tmp/plot-waveform-ac2
}}}

This again downsamples the data, but then splits the audio channels into
separate streams and stores them in two files, one for each channel.

A plot script for this would then look like this:

{{{
  # Twice the height of the single-channel image: one row per channel.
  set terminal png size 5000,1000;
  set output 'waveform.png';

  unset key;
  unset tics;
  unset border;
  set lmargin 0;
  set rmargin 0;
  set tmargin 0;
  set bmargin 0;

  # Stack the two channel plots vertically in a single image.
  set multiplot layout 2,1;
  plot '/tmp/plot-waveform-ac1' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
  plot '/tmp/plot-waveform-ac2' binary filetype=bin format='%int16' endian=little array=1:0 with lines;
  unset multiplot;
}}}

The resulting image would then look like this:

[[Image(ac2.png)]]
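
To illustrate what one file per channel means at the sample level: raw stereo
PCM stores its samples alternately (left, right, left, right and so on), so
each per-channel file corresponds to every second sample of such a stream. The
following Python sketch assumes interleaved signed 16-bit little-endian stereo
input. `split_stereo` is a hypothetical helper, not something FFmpeg or
Gnuplot provides.

```python
import struct


def split_stereo(raw: bytes):
    """Split interleaved s16le stereo bytes into (left, right) sample lists."""
    # One frame holds two channels of two bytes each; drop incomplete frames.
    frames = len(raw) // 4
    samples = struct.unpack(f"<{frames * 2}h", raw[:frames * 4])
    # Even positions belong to the left channel, odd ones to the right.
    return list(samples[0::2]), list(samples[1::2])


# Three stereo frames: left carries 1, 2, 3 and right carries -1, -2, -3.
raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = split_stereo(raw)
print(left)   # [1, 2, 3]
print(right)  # [-1, -2, -3]
```

Feeding interleaved data to the single-channel plot command would draw both
channels mixed into one jagged line, which is why the channels are separated
before plotting.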


= Additional Hints =

Sometimes there are a few loud peaks while the rest of the audio is relatively
quiet. These peaks cause the rest of the data to be scaled down, which might be
unwanted. To keep the y-axis centered and cut off the peaks, add the following
line (adjust the values) to the plot script before the plot command:

{{{
  set yrange [-600:600];
}}}
     150}}}