Opened 7 years ago
Last modified 3 years ago
#6156 reopened defect
Issue with FFMPEG ALSA CPU usage
Reported by: | mushm0m | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | ffmpeg |
Version: | git-master | Keywords: | alsa |
Cc: | Blocked By: | ||
Blocking: | Reproduced by developer: | no | |
Analyzed by developer: | no |
Description (last modified by )
I am on a Raspberry Pi Zero with a Pi Camera and an I2S MEMS mic, running the latest Raspbian Jessie. I am attempting to livestream audio and video. I've discovered that when I record ALSA audio with FFMPEG, that process takes 95% of my CPU and produces stuttering, unusable audio in the stream.
I isolated the issue to the following:
ffmpeg -f alsa -acodec pcm_s32le -i mic_sv temp.wav
This single command takes 95% of my CPU.
arecord -Dmic_sv -c2 -r48000 -fS32_LE -twav temp.wav
This single command takes <5% of my CPU.
I have added a custom hook to ffmpeg so we can see the PCM settings. I have ensured that settings (buffer size, period size, etc.) are identical between the two commands.
Here is the full console output of FFMPEG:
$ ALSA_CONFIG_PATH=/usr/share/alsa/alsa.conf:./dumpasoundrc /home/pi/special/ffmpeg/ffmpeg -v 9 -loglevel 99 -f alsa -acodec pcm_s32le -i michooks -f null - ffmpeg version git-2017-02-05-e57fd92 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Raspbian 4.9.2-10) configuration: --extra-cflags=-I/opt/vc/include/IL --enable-nonfree --enable-omx-rpi --logfile=CONFIG.TXT libavutil 55. 46.100 / 55. 46.100 libavcodec 57. 75.100 / 57. 75.100 libavformat 57. 66.101 / 57. 66.101 libavdevice 57. 2.100 / 57. 2.100 libavfilter 6. 73.100 / 6. 73.100 libswscale 4. 3.101 / 4. 3.101 libswresample 2. 4.100 / 2. 4.100 Splitting the commandline. Reading option '-v' ... matched as option 'v' (set logging level) with argument '9'. Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument '99'. Reading option '-f' ... matched as option 'f' (force format) with argument 'alsa'. Reading option '-acodec' ... matched as option 'acodec' (force audio codec ('copy' to copy stream)) with argument 'pcm_s32le'. Reading option '-i' ... matched as input url with argument 'michooks'. Reading option '-f' ... matched as option 'f' (force format) with argument 'null'. Reading option '-' ... matched as output url. Finished splitting the commandline. Parsing a group of options: global . Applying option v (set logging level) with argument 9. Successfully parsed a group of options. Parsing a group of options: input url michooks. Applying option f (force format) with argument alsa. Applying option acodec (force audio codec ('copy' to copy stream)) with argument pcm_s32le. Successfully parsed a group of options. Opening an input file: michooks. [ALSA Debug Hook] Installing hooks. Hooks PCM Slave: Soft volume PCM Control: Boost Capture Volume min_dB: -3 max_dB: 50 resolution: 256 Slave: Route conversion PCM Transformation table: 0 <- 0 1 <- 0 Slave: Mmap emulation PCM Slave: Hardware PCM card 1 'mems-mic' device 0 subdevice 0 [ALSA Debug Hook] hw_params: Hooks PCM Slave: Soft volume PCM Control: Boost Capture Volume min_dB: -3 max_dB: 50 resolution: 256 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Route conversion PCM Transformation table: 0 <- 0 1 <- 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Mmap emulation PCM Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Hardware PCM card 1 'mems-mic' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1073741824 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 appl_ptr : 0 hw_ptr : 0 [alsa @ 0x31ea450] All info found [alsa @ 0x31ea450] stream 0: start_time: 1487382765.290 duration: -9223372036854.775 [alsa @ 0x31ea450] format: start_time: 1487382765.290 duration: -9223372036854.775 bitrate=3072 kb/s Guessed Channel Layout for Input Stream #0.0 : stereo Input #0, alsa, from 'michooks': Duration: N/A, start: 1487382765.289814, bitrate: 3072 kb/s Stream #0:0, 1, 1/1000000: Audio: pcm_s32le, 48000 Hz, stereo, s32, 3072 kb/s Successfully opened the file. Parsing a group of options: output url -. Applying option f (force format) with argument null. Successfully parsed a group of options. Opening an output file: -. Successfully opened the file. detected 1 logical cores [graph_0_in_0_0 @ 0x31f9310] Setting 'time_base' to value '1/48000' [graph_0_in_0_0 @ 0x31f9310] Setting 'sample_rate' to value '48000' [graph_0_in_0_0 @ 0x31f9310] Setting 'sample_fmt' to value 's32' [graph_0_in_0_0 @ 0x31f9310] Setting 'channel_layout' to value '0x3' [graph_0_in_0_0 @ 0x31f9310] tb:1/48000 samplefmt:s32 samplerate:48000 chlayout:0x3 [format_out_0_0 @ 0x31f9f10] Setting 'sample_fmts' to value 's16' [format_out_0_0 @ 0x31f9f10] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0' [AVFilterGraph @ 0x31f9980] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed [auto_resampler_0 @ 0x31faf10] [SWR @ 0x321e130] Using fltp internally between filters [auto_resampler_0 @ 0x31faf10] ch:2 chl:stereo fmt:s32 r:48000Hz -> ch:2 chl:stereo fmt:s16 r:48000Hz Output #0, null, to 'pipe:': Metadata: encoder : Lavf57.66.101 Stream #0:0, 0, 1/48000: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s Metadata: encoder : Lavc57.75.100 pcm_s16le Stream mapping: Stream #0:0 -> #0:0 (pcm_s32le (native) -> pcm_s16le (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) size=N/A time=00:00:17.84 bitrate=N/A speed= 1x video:0kB audio:3345kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (michooks): Input stream #0:0 (audio): 26372 packets read (6851424 bytes); 26372 frames decoded (856428 samples); Total: 26372 packets (6851424 bytes) demuxed Output file #0 (pipe:): Output stream #0:0 (audio): 26372 frames encoded (856428 samples); 26372 packets muxed (3425712 bytes); Total: 26372 packets (3425712 bytes) muxed 26372 frames successfully decoded, 0 decoding errors Exiting normally, received signal 2.
Here is the console output of Arecord:
$ arecord -v -r48000 -c2 -fS32_LE --period-size=16384 --buffer-size=65536 -Dmic_sv -twav /dev/null Recording WAVE '/dev/null' : Signed 32 bit Little Endian, Rate 48000 Hz, Stereo Soft volume PCM Control: Boost Capture Volume min_dB: -3 max_dB: 50 resolution: 256 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Route conversion PCM Transformation table: 0 <- 0 1 <- 0 Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Mmap emulation PCM Its setup is: stream : CAPTURE access : MMAP_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 Slave: Hardware PCM card 1 'mems-mic' device 0 subdevice 0 Its setup is: stream : CAPTURE access : RW_INTERLEAVED format : S32_LE subformat : STD channels : 2 rate : 48000 exact rate : 48000 (48000/1) msbits : 32 buffer_size : 65536 period_size : 16384 period_time : 341333 tstamp_mode : NONE period_step : 1 avail_min : 16384 period_event : 0 start_threshold : 1073741824 stop_threshold : 65536 silence_threshold: 0 silence_size : 0 boundary : 1073741824 appl_ptr : 0 hw_ptr : 0 ^CAborted by signal Interrupt...
After a long chat in the FFMPEG IRC channel, we believe there may be a bug in the way FFMPEG waits for ALSA packets. Arecord uses snd_pcm_wait()
to wait for packets, and this command is not used in FFMPEG sources. Inserting -re
before -i
into my FFMPEG command reduces CPU usage to <5%, but is not the right approach, and causes other problems (bitrate is almost 0kbps, audio file generated is <5 kilobytes).
Change History (15)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
Component: | avcodec → undetermined |
---|---|
Description: | modified (diff) |
Keywords: | alsa added; ALSA removed |
Version: | unspecified → git-master |
comment:3 by , 7 years ago
Two things I've noticed, unsure if significant, from the FFMPEG debug output I've shared:
- ALSA duration is reported as negative value - why?
- ALSA bitrate is reported as 3072kbps, which seems high (I only have a Pi Zero). Is this configurable? How do I figure out the ALSA bitrate my Arecord command is using, to see if they are different?
comment:4 by , 6 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Please reopen this ticket if the issue is also reproducible with the following command:
$ ffmpeg -f alsa -i mic_sv -acodec pcm_s32le temp.wav
follow-up: 7 comment:5 by , 5 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
So on a pi0w using a similar setup, but with a PCM line mic that i've got rigged to show up as a alsa PCM capture device @48000/s16le, I can absolutely reproduce the CPU usage issue when running ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
, which produces the following output:
`
ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
ffmpeg version N-90737-gb3b3a3e3ce Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 6.3.0 (Raspbian 6.3.0-18+rpi1+deb9u1) 20170516
configuration: --prefix=/usr/local --arch=armhf --target-os=linux --enable-gpl --enable-mmal --enable-omx --enable-omx-rpi --enable-nonfree --enable-libopus
libavutil 56. 15.100 / 56. 15.100
libavcodec 58. 19.100 / 58. 19.100
libavformat 58. 13.100 / 58. 13.100
libavdevice 58. 4.100 / 58. 4.100
libavfilter 7. 17.100 / 7. 17.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, alsa, from 'default':
Duration: N/A, start: 1524040707.062083, bitrate: 1536 kb/s
Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, ? for help
Output #0, wav, to 'out.wav':
Metadata:
ISFT : Lavf58.13.100
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s
Metadata:
encoder : Lavc58.19.100 pcm_s16le
size= 893kB time=00:00:04.76 bitrate=1536.5kbits/s speed=1.01x
video:0kB audio:893kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.008531%
`
Looking at the general in/filter/out process, libavdevice/alsa_dec.c
(http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavdevice/alsa_dec.c;h=c50ce715064ad20f21c7d3c41d1e3003c69fef3f;hb=refs/heads/master#l116) returns AVERROR(EAGAIN)
when snd_pcm_readi
does not have ready data. This triggers the root cause of this issue, which is that in most places in fftools/ffmpeg.c
, when a loop receives EAGAIN, it immediately loops. This means that if you are running faster than the sample rate, you are going to be pegging the CPU looping in and out of audio_read_packet.
I suppose this behavior is not unique to alsa_dec.c
. In fact, if we take a quick survey of other libavdevice implementations, we can see a difference of implementation choices --
jack.c
- here, a jack thread runs to deal with jackd's realtime callbacks to fill a FIFO that then gets 'polled' by audio_read_packet. If a packet is not available, the jack 'frontend' waits up to two seconds and then hardfails.oss.c
- the lower layer file read error is returned directly. I presume this includes -EAGAIN from the driver interface.iec61883.c
- raw1394 polls its file descriptor to wait.openal-dec.c
-AVERROR(EAGAIN)
pattern is used here.pulse_audio_dec.c
- here we call into the pulseaudio main loop wait-on-the-event-thread and let it deal with things. Unless pulseaudio dies or the connection fails somehow (or the stream is terminated/deleted), this loop will not terminate.
Given a lack of consensus in the code base on how this loop is handled, I'd propose implementing use of snd_pcm_wait
. That said, there's also a possible API shortcoming in general here -- should libavdevices/AVFormats implement a wait/ready/whats-my-pollfd callback and push this handling out somewhere more generic?
Barring the above, I'm going to drop a snd_pcm_wait(s->h, 16);
into my alsa_dec.c and report back. That should induce a roughly 1/60th second wait if data is not yet available.
Presuming the above would work, any reccomendations/ideas on implementing a proper fix?
comment:6 by , 5 years ago
So something about snd_pcm_wait seems to not be a solution on its own... For comparison though, using ffmpeg with the JACK input only ate about 2-7% CPU for each of jackd and ffmpeg, running jackd in 48000/s16LE mode, and using ffmpeg to write a 48000/S16LE PCM wav file. Still investigating...
follow-up: 8 comment:7 by , 5 years ago
Replying to kainz:
So on a pi0w using a similar setup, but with a PCM line mic that i've got rigged to show up as a alsa PCM capture device @48000/s16le, I can absolutely reproduce the CPU usage issue when running
ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
Please confirm that the issue is not reproducible with arecord
.
comment:8 by , 5 years ago
Replying to cehoyos:
Replying to kainz:
So on a pi0w using a similar setup, but with a PCM line mic that i've got rigged to show up as a alsa PCM capture device @48000/s16le, I can absolutely reproduce the CPU usage issue when running
ffmpeg -f alsa -i default -acodec pcm_s16le out.wav
Please confirm that the issue is not reproducible with
arecord
.
arecord -D hw:0 -c 2 -f S16_LE -r 48000 out.wav
uses approximately 0.7-1% cpu on the pi0w, as does using -D default
.
I get the pegged CPU on ffmpeg using either -i default
or -i hw:0
on ffmpeg.
So, I just tested with pulseaudio as well now, ensuring that I have pulseaudio daemon's default sample rate set to 48k. This gives me ffmpeg -y -f pulse -i default -r 48000 out.wav
results of 1.6% CPU use on ffmpeg and 0.7% cpu use on pulseaudio, as well.
comment:9 by , 5 years ago
Component: | undetermined → avdevice |
---|
comment:10 by , 5 years ago
Still digging... as a further comparison, using gstreamer, the following pipeline produces about 8% cpu use for me on the pi0:
gst-launch-1.0 -v alsasrc device=hw:0 ! audio/x-raw, format=S16LE, rate=48000 ! queue ! wavenc ! filesink location=out.wav
follow-up: 12 comment:11 by , 5 years ago
So after having done some profiling runs on ffmpeg, I think it's just ffmpeg's processing loop causing most of the cpu use in this case, via smaller buffers (but large enough to make malloc/free expensive) and an accordingly larger amount of allocations.
I added some debugging code to print the selected ALSA pcm hw buffer size, and compared profiling runs between ffmpeg with alsa-direct and ffmpeg with alsa-jack. What happens is on the raspberrypi hardware, the i2s/pcm input has a 128KB buffer, and ffmpeg ends up allocating/deallocating this on every frame. For comparison, the alsapulse plugin ends up using a 1MB buffer, but provides larger frames, so the allocations dont happen as often.
Given all of the interactions above, I'm not really sure how best to mitigate this, and my usecase was aggravated by trying to combine the above with a concurrent v4l2 capture, which has proven challenging for other people with ffmpeg in the past.
If the above behavior is considered a design bug, I suppose I could help find and/or test a solution to it, but I think the fix would end up going deep into the ffmpeg/libavcodec/transcoding code base, probably more than is comfortable. Otherwise, I suppose this is really an unfortunate sideeffect of the hardware implementation.
An easy mitigation could be to add a commandline argument(s?) to tell the alsa demuxer to limit the buffer(or periods etc) sizes to better match the platform vs what the drivers report as the maximum permissible values.
Thoughts?
follow-up: 13 comment:12 by , 5 years ago
Replying to kainz:
So after having done some profiling runs on ffmpeg, I think it's just ffmpeg's processing loop causing most of the cpu use in this case, via smaller buffers (but large enough to make malloc/free expensive) and an accordingly larger amount of allocations.
Does increasing the buffer size fixes the issue for you?
comment:13 by , 5 years ago
Replying to cehoyos:
Does increasing the buffer size fixes the issue for you?
Oddly enough, I was seeing the opposite. I needed to reduce the size of the buffer to fix the issue. I'm not where I can test easily right now, but after 6/3 I'll have access to what i'd need to test with various buffer sizes to show its effect on CPU.
comment:14 by , 4 years ago
Hi Cehoyos,
I am experiencing the same issue on a normal Ubuntu 18 install when capturing from an ALSA capture device. I have multiple MIC inputs and for each FFMPEG process it consumes 100% CPU using the latest FFMpeg 4 build. where as arecord consumes less than 5% per process.
Could I request if it would be possible to add a small bounty towards finding a solution to this bug? If so please advise which platform.
comment:15 by , 3 years ago
Component: | avdevice → ffmpeg |
---|
Hello,
I have the same issue with no audio by this command on my raspberry pi Zero W:
ffmpeg -t 00:01:00 -f alsa -ar 44100 -ac 2 -acodec pcm_s32le -i plughw:1,0 -y output.aac
Apologies for any incorrect formatting, it's my first time submitting a ticket. Please let me know if I can provide additional info!