Opened 3 years ago

Last modified 3 years ago

#4984 new defect

ffmpeg amerge and amix filter delay when working with RTSP

Reported by: leogsa Owned by:
Priority: normal Component: undetermined
Version: unspecified Keywords: RTSP
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

ffmpeg amerge and amix filter delay

I need to take audio streams from several IP cameras and merge them into
one stream, so that they sound simultaneously.

I tried the "amix" filter (for testing purposes I take the audio stream
twice from the same camera; yes, I also tried two cameras and the result
is the same):

ffmpeg -i rtsp://user:pass@172.22.5.202 -i rtsp://user:pass@172.22.5.202
-map 0:a -map 1:a -filter_complex
amix=inputs=2:duration=first:dropout_transition=3 -ar 22050 -vn -f flv
rtmp://172.22.45.38:1935/live/stream1

result: I say "hello" and hear the first "hello" in the speakers, then
the second "hello" one second later, instead of hearing both "hello"s
simultaneously.

I also tried the "amerge" filter:

ffmpeg -i rtsp://user:pass@172.22.5.202 -i rtsp://user:pass@172.22.5.202
-map 0:a -map 1:a -filter_complex amerge -ar 22050 -vn -f flv
rtmp://172.22.45.38:1935/live/stream1

result: the same as in the first example, but now I hear the first "hello"
in the left speaker and, one second later, the second "hello" in the right
speaker, instead of hearing both "hello"s in both speakers simultaneously.

Here is the full command-line output for both variants. amix:

ffmpeg -i rtsp://admin:12345@172.22.5.202 -i rtsp://admin:12345@172.22.5.202
-map 0:a -map 1:a -filter_complex
amix=inputs=2:duration=longest:dropout_transition=0 -vn -ar 22050 -f flv
rtmp://172.22.45.38:1935/live/stream1

ffmpeg version N-76031-g9099079 Copyright (c) 2000-2015 the FFmpeg developers

built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-16)
configuration: --enable-gpl --enable-libx264 --enable-libmp3lame --enable-nonfree --enable-version3

libavutil 55. 4.100 / 55. 4.100
libavcodec 57. 6.100 / 57. 6.100
libavformat 57. 4.100 / 57. 4.100
libavdevice 57. 0.100 / 57. 0.100
libavfilter 6. 11.100 / 6. 11.100
libswscale 4. 0.100 / 4. 0.100
libswresample 2. 0.100 / 2. 0.100
libpostproc 54. 0.100 / 54. 0.100

Input #0, rtsp, from 'rtsp://admin:12345@172.22.5.202':
  Metadata:
    title : Media Presentation
  Duration: N/A, start: 0.032000, bitrate: N/A
    Stream #0:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25 tbr, 90k tbn, 40 tbc
    Stream #0:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
    Stream #0:2: Data: none

Input #1, rtsp, from 'rtsp://admin:12345@172.22.5.202':
  Metadata:
    title : Media Presentation
  Duration: N/A, start: 0.032000, bitrate: N/A
    Stream #1:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25 tbr, 90k tbn, 40 tbc
    Stream #1:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
    Stream #1:2: Data: none

Output #0, flv, to 'rtmp://172.22.45.38:1935/live/stream1':
  Metadata:
    title : Media Presentation
    encoder : Lavf57.4.100
    Stream #0:0: Audio: mp3 (libmp3lame) ([2][0][0][0] / 0x0002), 22050 Hz, mono, fltp (default)
    Metadata:
      encoder : Lavc57.6.100 libmp3lame

Stream mapping:
  Stream #0:1 (g726) -> amix:input0
  Stream #1:1 (g726) -> amix:input1
  amix -> Stream #0:0 (libmp3lame)

Press [q] to stop, ? for help
[rtsp @ 0x2689600] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[rtsp @ 0x2727c60] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[rtsp @ 0x2689600] max delay reached. need to consume packet
[NULL @ 0x268c500] RTP: missed 38 packets
[rtsp @ 0x2689600] max delay reached. need to consume packet
[NULL @ 0x268d460] RTP: missed 4 packets
[flv @ 0x2958360] Failed to update header with correct duration.
[flv @ 0x2958360] Failed to update header with correct filesize.
size= 28kB time=00:00:06.18 bitrate= 36.7kbits/s
video:0kB audio:24kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 16.331224%

and amerge:

# ffmpeg -i rtsp://admin:12345@172.22.5.202 -i rtsp://admin:12345@172.22.5.202
-map 0:a -map 1:a -filter_complex amerge -vn -ar 22050 -f flv
rtmp://172.22.45.38:1935/live/stream1

ffmpeg version N-76031-g9099079 Copyright (c) 2000-2015 the FFmpeg developers

built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-16)
configuration: --enable-gpl --enable-libx264 --enable-libmp3lame --enable-nonfree --enable-version3

libavutil 55. 4.100 / 55. 4.100
libavcodec 57. 6.100 / 57. 6.100
libavformat 57. 4.100 / 57. 4.100
libavdevice 57. 0.100 / 57. 0.100
libavfilter 6. 11.100 / 6. 11.100
libswscale 4. 0.100 / 4. 0.100
libswresample 2. 0.100 / 2. 0.100
libpostproc 54. 0.100 / 54. 0.100

Input #0, rtsp, from 'rtsp://admin:12345@172.22.5.202':
  Metadata:
    title : Media Presentation
  Duration: N/A, start: 0.064000, bitrate: N/A
    Stream #0:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25 tbr, 90k tbn, 40 tbc
    Stream #0:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
    Stream #0:2: Data: none

Input #1, rtsp, from 'rtsp://admin:12345@172.22.5.202':
  Metadata:
    title : Media Presentation
  Duration: N/A, start: 0.032000, bitrate: N/A
    Stream #1:0: Video: h264 (Baseline), yuv420p, 1280x720, 20 fps, 25 tbr, 90k tbn, 40 tbc
    Stream #1:1: Audio: adpcm_g726, 8000 Hz, mono, s16, 16 kb/s
    Stream #1:2: Data: none

[Parsed_amerge_0 @ 0x3069cc0] No channel layout for input 1
[Parsed_amerge_0 @ 0x3069cc0] Input channel layouts overlap: output layout will be determined by the number of distinct input channels

Output #0, flv, to 'rtmp://172.22.45.38:1935/live/stream1':
  Metadata:
    title : Media Presentation
    encoder : Lavf57.4.100
    Stream #0:0: Audio: mp3 (libmp3lame) ([2][0][0][0] / 0x0002), 22050 Hz, stereo, s16p (default)
    Metadata:
      encoder : Lavc57.6.100 libmp3lame

Stream mapping:
  Stream #0:1 (g726) -> amerge:in0
  Stream #1:1 (g726) -> amerge:in1
  amerge -> Stream #0:0 (libmp3lame)

Press [q] to stop, ? for help
[rtsp @ 0x2f71640] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[rtsp @ 0x300fb40] Thread message queue blocking; consider raising the thread_queue_size option (current value: 8)
[rtsp @ 0x2f71640] max delay reached. need to consume packet
[NULL @ 0x2f744a0] RTP: missed 18 packets
[flv @ 0x3058b00] Failed to update header with correct duration.
[flv @ 0x3058b00] Failed to update header with correct filesize.
size= 39kB time=00:00:04.54 bitrate= 70.2kbits/s
video:0kB audio:36kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 8.330614%

UPDATE 30 Oct 2015: I found an interesting detail when connecting two
cameras (they have different microphones, so I can hear the difference
between them): the order of the "hello"s from the different cameras
depends on the ORDER OF THE INPUTS.

with command

ffmpeg -i rtsp://cam2 -i rtsp://cam1 -map 0:a -map 1:a -filter_complex
amix=inputs=2:duration=longest:dropout_transition=0 -vn -ar 22050 -f flv
rtmp://172.22.45.38:1935/live/stream1

I hear "hello" from the 1st cam and then, one second later, "hello" from the 2nd cam.


with command

ffmpeg -i rtsp://cam1 -i rtsp://cam2 -map 0:a -map 1:a -filter_complex
amix=inputs=2:duration=longest:dropout_transition=0 -vn -ar 22050 -f flv
rtmp://172.22.45.38:1935/live/stream1

I hear "hello" from the 2nd cam and then, one second later, "hello" from the 1st cam.

So, as I understand it, ffmpeg does not read the inputs simultaneously,
but in the order they are given.

P.S. FILES are mixed and merged perfectly with the same commands.

Change History (3)

comment:1 Changed 3 years ago by Cigaes

I believe this is expected. Neither amix nor amerge (preferred) takes the input timestamps into account. Furthermore, the command line you use subtracts the initial timestamp of each stream, and since the captures do not start at exactly the same time, there is a shift.

I suspect that to get this working, you would need to use the -copyts option, then find a way to subtract the same initial timestamp from both streams, and finally use aresample to sync the audio to its timestamps.
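A rough, untested sketch of that idea (keep the original RTSP timestamps with -copyts, and let aresample's async mode pad or trim each stream so it lines up with its own timestamps before mixing); the camera URLs are placeholders, and the exact option combination may need adjusting per setup:

```shell
# Untested sketch: -copyts keeps input timestamps instead of shifting
# each input to start at 0; aresample=async=1 stretches/pads audio so
# samples match their timestamps; amix then mixes the aligned streams.
ffmpeg -i rtsp://user:pass@cam1 -i rtsp://user:pass@cam2 \
  -copyts \
  -filter_complex "[0:a]aresample=async=1[a0];[1:a]aresample=async=1[a1];[a0][a1]amix=inputs=2:duration=longest[out]" \
  -map "[out]" -vn -ar 22050 -f flv rtmp://172.22.45.38:1935/live/stream1
```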

comment:2 Changed 3 years ago by leogsa

Cigaes, thank you for the answer.

Why then does this command work perfectly with files? And why is the difference so big (1 second)? Does it mean that ffmpeg begins taking data from the 1st camera and only one second later from the 2nd one?

comment:3 Changed 3 years ago by Cigaes

I cannot observe your files; I can only assume that they start at the same instant.

As for recording from the cameras, this depends on the cameras themselves on top of the probing performed by ffmpeg. You can try to see the delay when you run ffmpeg to read from a single camera: the time between the instant you confirm the command and the instant it starts printing progress.
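One way to measure that per-camera startup delay (a sketch; the URL is a placeholder for one of your cameras):

```shell
# Time how long ffmpeg needs to connect, probe the RTSP stream, and
# process one second of audio; -f null discards the output. Comparing
# the wall-clock time for each camera shows their relative startup lag.
time ffmpeg -i rtsp://user:pass@172.22.5.202 -t 1 -vn -f null -
```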
