Opened 14 months ago
Last modified 5 days ago
#6156 reopened defect
Issue with FFMPEG ALSA CPU usage
Reported by: | mushm0m | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avdevice |
Version: | git-master | Keywords: | alsa |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description (last modified by llogan)
I am on a Raspberry Pi Zero with a Pi Camera and an I2S MEMS mic, running the latest Raspbian Jessie. I am attempting to livestream audio and video. I've discovered that when I record ALSA audio with FFMPEG, that process takes 95% of my CPU and produces stuttering, unusable audio in the stream.
I isolated the issue to the following:
- ffmpeg -f alsa -acodec pcm_s32le -i mic_sv temp.wav
This single command takes 95% of my CPU.
- arecord -Dmic_sv -c2 -r48000 -fS32_LE -twav temp.wav
This single command takes <5% of my CPU.
I have added a custom hook to ffmpeg so we can see the PCM settings. I have ensured that settings (buffer size, period size, etc.) are identical between the two commands.
Here is the full console output of FFMPEG:
$ ALSA_CONFIG_PATH=/usr/share/alsa/alsa.conf:./dumpasoundrc /home/pi/special/ffmpeg/ffmpeg -v 9 -loglevel 99 -f alsa -acodec pcm_s32le -i michooks -f null - ffmpeg version git-2017-02-05-e57fd92 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Raspbian 4.9.2-10) configuration: --extra-cflags=-I/opt/vc/include/IL --enable-nonfree --enable-omx-rpi --logfile=CONFIG.TXT libavutil 55. 46.100 / 55. 46.100 libavcodec 57. 75.100 / 57. 75.100 libavformat 57. 66.101 / 57. 66.101 libavdevice 57. 2.100 / 57. 2.100 libavfilter 6. 73.100 / 6. 73.100 libswscale 4. 3.101 / 4. 3.101 libswresample 2. 4.100 / 2. 4.100 Splitting the commandline. Reading option '-v' ... matched as option 'v' (set logging level) with argument '9'. Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument '99'. Reading option '-f' ... matched as option 'f' (force format) with argument 'alsa'. Reading option '-acodec' ... matched as option 'acodec' (force audio codec ('copy' to copy stream)) with argument 'pcm_s32le'. Reading option '-i' ... matched as input url with argument 'michooks'. Reading option '-f' ... matched as option 'f' (force format) with argument 'null'. Reading option '-' ... matched as output url. Finished splitting the commandline. Parsing a group of options: global . Applying option v (set logging level) with argument 9. Successfully parsed a group of options. Parsing a group of options: input url michooks. Applying option f (force format) with argument alsa. Applying option acodec (force audio codec ('copy' to copy stream)) with argument pcm_s32le. Successfully parsed a group of options. Opening an input file: michooks. [ALSA Debug Hook] Installing hooks. Hooks PCM Slave: Soft volume PCM Control: Boost Capture Volume min_dB: -3 max_dB: 50 resolution: 256 Slave: Route conversion PCM Transformation table: 0 <- 0 1 <- 0 Slave: Mmap emulation PCM Slave: Hardware PCM card 1 'mems-mic' device 0 subdevice 0 [ALSA Debug Hook] hw_params: Hooks PCM Slave: Soft volume PCM Control: Boost Capture Volume min_dB: -3 max_dB: 50 resolution: 256 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Route conversion PCM Transformation table: 0 <- 0 1 <- 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Mmap emulation PCM Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Hardware PCM card 1 'mems-mic' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1073741824 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 appl_ptr : 0 hw_ptr : 0 [alsa @ 0x31ea450] All info found [alsa @ 0x31ea450] stream 0: start_time: 1487382765.290 duration: -9223372036854.775 [alsa @ 0x31ea450] format: start_time: 1487382765.290 duration: -9223372036854.775 bitrate=3072 kb/s Guessed Channel Layout for Input Stream #0.0 : stereo Input #0, alsa, from 'michooks': Duration: N/A, start: 1487382765.289814, bitrate: 3072 kb/s Stream #0:0, 1, 1/1000000: Audio: pcm_s32le, 48000 Hz, stereo, s32, 3072 kb/s Successfully opened the file. Parsing a group of options: output url -. Applying option f (force format) with argument null. Successfully parsed a group of options. Opening an output file: -. Successfully opened the file. detected 1 logical cores [graph_0_in_0_0 @ 0x31f9310] Setting 'time_base' to value '1/48000' [graph_0_in_0_0 @ 0x31f9310] Setting 'sample_rate' to value '48000' [graph_0_in_0_0 @ 0x31f9310] Setting 'sample_fmt' to value 's32' [graph_0_in_0_0 @ 0x31f9310] Setting 'channel_layout' to value '0x3' [graph_0_in_0_0 @ 0x31f9310] tb:1/48000 samplefmt:s32 samplerate:48000 chlayout:0x3 [format_out_0_0 @ 0x31f9f10] Setting 'sample_fmts' to value 's16' [format_out_0_0 @ 0x31f9f10] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0' [AVFilterGraph @ 0x31f9980] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed [auto_resampler_0 @ 0x31faf10] [SWR @ 0x321e130] Using fltp internally between filters [auto_resampler_0 @ 0x31faf10] ch:2 chl:stereo fmt:s32 r:48000Hz -> ch:2 chl:stereo fmt:s16 r:48000Hz Output #0, null, to 'pipe:': Metadata: encoder : Lavf57.66.101 Stream #0:0, 0, 1/48000: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s Metadata: encoder : Lavc57.75.100 pcm_s16le Stream mapping: Stream #0:0 -> #0:0 (pcm_s32le (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) size=N/A time=00:00:17.84 bitrate=N/A speed= 1x video:0kB audio:3345kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (michooks): Input stream #0:0 (audio): 26372 packets read (6851424 bytes); 26372 frames decoded (856428 samples); Total: 26372 packets (6851424 bytes) demuxed Output file #0 (pipe:): Output stream #0:0 (audio): 26372 frames encoded (856428 samples); 26372 packets muxed (3425712 bytes); Total: 26372 packets (3425712 bytes) muxed 26372 frames successfully decoded, 0 decoding errors Exiting normally, received signal 2.
Here is the console output of Arecord:
$ arecord -v -r48000 -c2 -fS32_LE --period-size=16384 --buffer-size=65536 -Dmic_sv -twav /dev/null Recording WAVE '/dev/null' : Signed 32 bit Little Endian, Rate 48000 Hz, Stereo Soft volume PCM Control: Boost Capture Volume min_dB: -3 max_dB: 50 resolution: 256 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Route conversion PCM Transformation table: 0 <- 0 1 <- 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Mmap emulation PCM Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Hardware PCM card 1 'mems-mic' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1073741824 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 appl_ptr : 0 hw_ptr : 0 ^CAborted by signal Interrupt...
After a long chat in the FFMPEG IRC channel, we believe there may be a bug in the way FFMPEG waits for ALSA packets. Arecord uses snd_pcm_wait() to wait for packets, and this command is not used in FFMPEG sources. Inserting -re before -i into my FFMPEG command reduces CPU usage to <5%, but is not the right approach, and causes other problems (bitrate is almost 0kbps, audio file generated is <5 kilobytes).
Change History (10)
comment:1 Changed 14 months ago by mushm0m
comment:2 Changed 14 months ago by llogan
- Component changed from avcodec to undetermined
- Description modified (diff)
- Keywords alsa added; ALSA removed
- Version changed from unspecified to git-master
comment:3 Changed 14 months ago by mushm0m
Two things I've noticed, unsure if significant, from the FFMPEG debug output I've shared:
- ALSA duration is reported as negative value - why?
- ALSA bitrate is reported as 3072kbps, which seems high (I only have a Pi Zero). Is this configurable? How do I figure out the ALSA bitrate my Arecord command is using, to see if they are different?
comment:4 Changed 5 months ago by cehoyos
- Resolution set to invalid
- Status changed from new to closed
Please reopen this ticket if the issue is also reproducible with the following command:
$ ffmpeg -f alsa -i mic_sv -acodec pcm_s32le temp.wav
comment:5 follow-up: ↓ 7 Changed 8 days ago by kainz
- Resolution invalid deleted
- Status changed from closed to reopened
So on a pi0w using a similar setup, but with a PCM line mic that i've got rigged to show up as a alsa PCM capture device @48000/s16le, I can absolutely reproduce the CPU usage issue when running ffmpeg -f alsa -i default -acodec pcm_s16le out.wav, which produces the following output:
`
ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
ffmpeg version N-90737-gb3b3a3e3ce Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 6.3.0 (Raspbian 6.3.0-18+rpi1+deb9u1) 20170516
configuration: --prefix=/usr/local --arch=armhf --target-os=linux --enable-gpl --enable-mmal --enable-omx --enable-omx-rpi --enable-nonfree --enable-libopus
libavutil 56. 15.100 / 56. 15.100
libavcodec 58. 19.100 / 58. 19.100
libavformat 58. 13.100 / 58. 13.100
libavdevice 58. 4.100 / 58. 4.100
libavfilter 7. 17.100 / 7. 17.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, alsa, from 'default':
Duration: N/A, start: 1524040707.062083, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, ? for help
Output #0, wav, to 'out.wav':
Metadata:
ISFT : Lavf58.13.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Metadata:
encoder : Lavc58.19.100 pcm_s16le
size= 893kB time=00:00:04.76 bitrate=1536.5kbits/s speed=1.01x
video:0kB audio:893kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008531%
`
Looking at the general in/filter/out process, libavdevice/alsa_dec.c (http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavdevice/alsa_dec.c;h=c50ce715064ad20f21c7d3c41d1e3003c69fef3f;hb=refs/heads/master#l116) returns AVERROR(EAGAIN) when snd_pcm_readi does not have ready data. This triggers the root cause of this issue, which is that in most places in fftools/ffmpeg.c, when a loop receives EAGAIN, it immediately loops. This means that if you are running faster than the sample rate, you are going to be pegging the CPU looping in and out of audio_read_packet.
I suppose this behavior is not unique to alsa_dec.c. In fact, if we take a quick survey of other libavdevice implementations, we can see a difference of implementation choices --
- jack.c - here, a jack thread runs to deal with jackd's realtime callbacks to fill a FIFO that then gets 'polled' by audio_read_packet. If a packet is not available, the jack 'frontend' waits up to two seconds and then hardfails.
- oss.c - the lower layer file read error is returned directly. I presume this includes -EAGAIN from the driver interface.
- iec61883.c - raw1394 polls its file descriptor to wait.
- openal-dec.c - AVERROR(EAGAIN) pattern is used here.
- pulse_audio_dec.c - here we call into the pulseaudio main loop wait-on-the-event-thread and let it deal with things. Unless pulseaudio dies or the connection fails somehow (or the stream is terminated/deleted), this loop will not terminate.
Given a lack of consensus in the code base on how this loop is handled, I'd propose implementing use of snd_pcm_wait. That said, there's also a possible API shortcoming in general here -- should libavdevices/AVFormats implement a wait/ready/whats-my-pollfd callback and push this handling out somewhere more generic?
Barring the above, I'm going to drop a snd_pcm_wait(s->h, 16); into my alsa_dec.c and report back. That should induce a roughly 1/60th second wait if data is not yet available.
Presuming the above would work, any reccomendations/ideas on implementing a proper fix?
comment:6 Changed 8 days ago by kainz
So something about snd_pcm_wait seems to not be a solution on its own... For comparison though, using ffmpeg with the JACK input only ate about 2-7% CPU for each of jackd and ffmpeg, running jackd in 48000/s16LE mode, and using ffmpeg to write a 48000/S16LE PCM wav file. Still investigating...
comment:7 in reply to: ↑ 5 ; follow-up: ↓ 8 Changed 8 days ago by cehoyos
Replying to kainz:
So on a pi0w using a similar setup, but with a PCM line mic that i've got rigged to show up as a alsa PCM capture device @48000/s16le, I can absolutely reproduce the CPU usage issue when running ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
Please confirm that the issue is not reproducible with arecord.
comment:8 in reply to: ↑ 7 Changed 7 days ago by kainz
Replying to cehoyos:
Replying to kainz:
So on a pi0w using a similar setup, but with a PCM line mic that i've got rigged to show up as a alsa PCM capture device @48000/s16le, I can absolutely reproduce the CPU usage issue when running ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
Please confirm that the issue is not reproducible with arecord.
arecord -D hw:0 -c 2 -f S16_LE -r 48000 out.wav uses approximately 0.7-1% cpu on the pi0w, as does using -D default.
I get the pegged CPU on ffmpeg using either -i default or -i hw:0 on ffmpeg.
So, I just tested with pulseaudio as well now, ensuring that I have pulseaudio daemon's default sample rate set to 48k. This gives me ffmpeg -y -f pulse -i default -r 48000 out.wav results of 1.6% CPU use on ffmpeg and 0.7% cpu use on pulseaudio, as well.
comment:9 Changed 7 days ago by cehoyos
- Component changed from undetermined to avdevice
comment:10 Changed 5 days ago by kainz
Still digging... as a further comparison, using gstreamer, the following pipeline produces about 8% cpu use for me on the pi0:
gst-launch-1.0 -v alsasrc device=hw:0 ! audio/x-raw, format=S16LE, rate=48000 ! queue ! wavenc ! filesink location=out.wav
Apologies for any incorrect formatting, it's my first time submitting a ticket. Please let me know if I can provide additional info!