Opened 3 years ago

Last modified 6 months ago

#9009 new defect

SIGINT ignored while awaiting RTP data

Reported by: Juan Navarro Owned by:
Priority: normal Component: ffmpeg
Version: git-master Keywords: sigint, exit
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

I'm writing a recorder based on FFmpeg (on Ubuntu linux), which is controlled with system signals, so it gets sent SIGINT when it must stop and close the output file. In a working scenario, FFmpeg helpfully processes SIGINT (and SIGTERM) as a request to perform a clean exit.

However, if the RTP sender stops before the FFmpeg process has been stopped, then FFmpeg does not process the SIGINT or SIGTERM signals.

I'll put here both examples for working scenario with expected behavior, and failing scenario with wrong behavior.

Working scenario

STEP 1 - Terminal 1: simulate an RTP sender.

$ ffmpeg -re -i test_video -an -c:v vp8 -f rtp -sdp_file test.sdp "rtp://127.0.0.1:5004?rtcpport=5005"

Command output:

ffmpeg version git-2020-11-17-bb7bb44 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --prefix=/opt/ffmpeg --enable-gpl --enable-version3 --enable-nonfree --enable-static --disable-shared --enable-ffmpeg --enable-ffplay --enable-sdl2 --disable-libxcb --disable-sndio --disable-xlib --enable-ffprobe --disable-doc --enable-libfdk-aac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopus --disable-librtmp --disable-libv4l2 --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --disable-lzma
  libavutil      56. 60.100 / 56. 60.100
  libavcodec     58.112.103 / 58.112.103
  libavformat    58. 64.100 / 58. 64.100
  libavdevice    58. 11.102 / 58. 11.102
  libavfilter     7. 90.100 /  7. 90.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
Input #0, matroska,webm, from '/home/juan/videos_NOBACKUP/video.webm':
  Metadata:
    encoder         : Lavf56.40.101
  Duration: 00:04:32.44, start: -0.007000, bitrate: 672 kb/s
    Stream #0:0: Video: vp8, yuv420p(progressive), 640x360, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 1k tbc (default)
    Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Stream mapping:
  Stream #0:0 -> #0:0 (vp8 (native) -> vp8 (libvpx))
Press [q] to stop, [?] for help
[libvpx @ 0x360cb00] v1.5.0
[libvpx @ 0x360cb00] Neither bitrate nor constrained quality specified, using default CRF of 32 and bitrate of 256kbit/sec
Output #0, rtp, to 'rtp://127.0.0.1:5004?rtcpport=5005':
  Metadata:
    encoder         : Lavf58.64.100
    Stream #0:0: Video: vp8 (libvpx), yuv420p(tv, bt470bg/unknown/unknown, progressive), 640x360 [SAR 1:1 DAR 16:9], q=-1--1, 256 kb/s, 23.98 fps, 90k tbn, 23.98 tbc (default)
    Metadata:
      encoder         : Lavc58.112.103 libvpx
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=  392 fps= 24 q=0.0 Lsize=     433kB time=00:00:16.30 bitrate= 217.6kbits/s speed=0.986x

Contents of test.sdp:

$ cat test.sdp
v=0
o=- 0 0 IN IP4 127.0.0.1
s=No Name
c=IN IP4 127.0.0.1
t=0 0
a=tool:libavformat 58.64.100
m=video 5004 RTP/AVP 96
b=AS:256
a=rtpmap:96 VP8/90000

(Note that test.sdp does not contain a=rtcp:5005, which makes it impossible to use any other custom port; this is a related bug: https://trac.ffmpeg.org/ticket/8474)

STEP 2 - Terminal 2: FFmpeg RTP recorder.

$ ffmpeg -nostdin -protocol_whitelist file,rtp,udp -i test.sdp -map 0:v:0 -c:v copy -f webm -flags +global_header -y test.webm

Command output:

ffmpeg version git-2020-11-17-bb7bb44 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --prefix=/opt/ffmpeg --enable-gpl --enable-version3 --enable-nonfree --enable-static --disable-shared --enable-ffmpeg --enable-ffplay --enable-sdl2 --disable-libxcb --disable-sndio --disable-xlib --enable-ffprobe --disable-doc --enable-libfdk-aac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopus --disable-librtmp --disable-libv4l2 --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --disable-lzma
  libavutil      56. 60.100 / 56. 60.100
  libavcodec     58.112.103 / 58.112.103
  libavformat    58. 64.100 / 58. 64.100
  libavdevice    58. 11.102 / 58. 11.102
  libavfilter     7. 90.100 /  7. 90.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
[sdp @ 0x308e680] Keyframe missing
Input #0, sdp, from 'test.sdp':
  Metadata:
    title           : No Name
  Duration: N/A, start: 0.458789, bitrate: N/A
    Stream #0:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown), 640x360, 23.98 fps, 23.98 tbr, 90k tbn, 90k tbc
Output #0, webm, to 'test.webm':
  Metadata:
    title           : No Name
    encoder         : Lavf58.64.100
    Stream #0:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown), 640x360, q=2-31, 23.98 fps, 23.98 tbr, 1k tbn, 90k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
frame=  138 fps= 28 q=-1.0 Lsize=     162kB time=00:00:05.71 bitrate= 232.4kbits/s speed=1.17x

STEP 3 - Terminal 3: send SIGINT to FFmpeg recorder.

$ pkill -SIGINT -f 'ffmpeg -nostdin'

This causes the expected behavior of an immediate clean exit in Terminal 2:

video:161kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.948426%
Exiting normally, received signal 2.

Failing scenario

Following these steps is how to reproduce the bug:

STEP 1 - Terminal 1: simulate an RTP sender (same as before, so no command output included because it is the same).

$ ffmpeg -re -i test_video -an -c:v vp8 -f rtp -sdp_file test.sdp "rtp://127.0.0.1:5004?rtcpport=5005"

STEP 2 - Terminal 2: FFmpeg RTP recorder. This time -report is added because this command is the one showing the issue.
-listen_timeout is also used to avoid timeoputs due to no RTP activity.

$ ffmpeg -nostdin -report -protocol_whitelist file,rtp,udp -listen_timeout 999 -i test.sdp -map 0:v:0 -c:v copy -f webm -flags +global_header -y test.webm

Command output:

ffmpeg started on 2020-11-25 at 13:33:50
Report written to "ffmpeg-20201125-134337.log"
Log level: 48
ffmpeg version git-2020-11-17-bb7bb44 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12) 20160609
  configuration: --prefix=/opt/ffmpeg --enable-gpl --enable-version3 --enable-nonfree --enable-static --disable-shared --enable-ffmpeg --enable-ffplay --enable-sdl2 --disable-libxcb --disable-sndio --disable-xlib --enable-ffprobe --disable-doc --enable-libfdk-aac --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopus --disable-librtmp --disable-libv4l2 --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --disable-lzma
  libavutil      56. 60.100 / 56. 60.100
  libavcodec     58.112.103 / 58.112.103
  libavformat    58. 64.100 / 58. 64.100
  libavdevice    58. 11.102 / 58. 11.102
  libavfilter     7. 90.100 /  7. 90.100
  libswscale      5.  8.100 /  5.  8.100
  libswresample   3.  8.100 /  3.  8.100
  libpostproc    55.  8.100 / 55.  8.100
[sdp @ 0x3b9d000] Keyframe missing
Input #0, sdp, from 'test.sdp':
  Metadata:
    title           : No Name
  Duration: N/A, start: 2.544211, bitrate: N/A
    Stream #0:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown), 640x360, 23.98 fps, 23.98 tbr, 90k tbn, 90k tbc
Output #0, webm, to 'test.webm':
  Metadata:
    title           : No Name
    encoder         : Lavf58.64.100
    Stream #0:0: Video: vp8, yuv420p(tv, bt470bg/unknown/unknown), 640x360, q=2-31, 23.98 fps, 23.98 tbr, 1k tbn, 90k tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
frame=  288 fps= 26 q=-1.0 size=     256kB time=00:00:11.97 bitrate= 175.2kbits/s speed=1.08x

STEP 3 - Terminal 1: Stop the RTP sender (Ctrl+C)

video:618kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.820318%
Exiting normally, received signal 2.

STEP 4 - Terminal 3: send SIGINT to FFmpeg recorder.

$ pkill -SIGINT -f 'ffmpeg -nostdin'

This causes the wrong behavior of the FFmpeg RTP Recorder ignoring the signal and not exiting cleanly.

The FFmpeg recorder will ignore the SIGINT and will stay there until the listen_timeout is elapsed.

Tested also with removing -nostdin from the FFmpeg recorder, and instead trying to stop the process by sending a q through stdin, but it doesn't work either, the "q" also gets ignored like the system signals.

Report file ffmpeg-20201125-134337.log attached here (although it doesn't seem to contain useful data, the sdp demuxer seems to be simply waiting for incoming data until the listen_timeout is elapsed)

Attachments (1)

ffmpeg-20201125-134337.log (53.6 KB ) - added by Juan Navarro 3 years ago.
Full, uncut console output

Download all attachments as: .zip

Change History (14)

by Juan Navarro, 3 years ago

Attachment: ffmpeg-20201125-134337.log added

Full, uncut console output

comment:1 by Juan Navarro, 3 years ago

This can also be reproduced with latest development branch as of right now:
ffmpeg version git-2020-11-25-3de3d2f

comment:2 by Balling, 3 years ago

In that case triple Ctrl-C or maybe Ctrl + \ (SIGQUIT) should work, no?

Just saying.

in reply to:  2 comment:3 by Juan Navarro, 3 years ago

Replying to Balling:

In that case triple Ctrl-C or maybe Ctrl + \ (SIGQUIT) should work, no?

It does; at least on the manual test, the second Ctrl-C kills the FFmpeg process. But I see doing that a bit (a lot) like sweeping the underlying problem under a carpet... a big difference between letting FFmpeg know it should close, so it has time to adequately flush the output file, write the ending metadata (e.g. MP4 moov), etc; and simply killing it, which could render an unreadable output file.

Also note the commands I provided are a simplified, minimal, reproducible example, not the actual use case. In reality there is a Node.js process in control of FFmpeg, among other things; it sends SIGINT to external processes, expecting them to exit and doing some additional cleanup on the process' exit events. But the event never arrives from the FFmpeg process, because it got itself into a soft-lock, and that's probably a bug caused by some internal control loop not acting as intended.

comment:4 by Andriy Gelman, 3 years ago

@juannm, do you want to try this branch?
https://github.com/talih0/FFmpeg/tree/signal

There are two commit:

  • first replaces signal() by sigaction() on linux
  • second commit is just an idea to exit demuxers after the first SIGINT. Otherwise, two SIGINTs will be needed.. (which also caused errors writing muxer's trailer)

in reply to:  4 comment:5 by Juan Navarro, 3 years ago

Replying to taliho:

@juannm, do you want to try this branch?
https://github.com/talih0/FFmpeg/tree/signal

There are two commit:

  • first replaces signal() by sigaction() on linux
  • second commit is just an idea to exit demuxers after the first SIGINT. Otherwise, two SIGINTs will be needed.. (which also caused errors writing muxer's trailer)

Yes, of course I can/want to test :)

With the changes on the "signal" branch, I can confirm the issue does not happen anymore. In both the "Working scenario" and the "Failing scenario", as well as the "real" more complex scenario of my application, the FFmpeg recorder process correctly catches the SIGINT and exits with an output such as this:

test.sdp: Immediate exit requested
frame=  189 fps= 27 q=-1.0 Lsize=     224kB time=00:00:07.84 bitrate= 234.3kbits/s speed=1.12x    
video:222kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.868649%
Exiting normally, received signal 2.

(I'm including only the final portion of the output, because all the previous output seems to be identical to already provided command output logs)

comment:6 by Andriy Gelman, 3 years ago

thanks for testing. I'll send the patch to the ML

in reply to:  6 ; comment:7 by Juan Navarro, 3 years ago

Replying to Andriy Gelman:

thanks for testing. I'll send the patch to the ML

Hi, was a patch sent to the ML in the end? I'm resuming work on the project that got affected by this bug, so I'd like to know if this report can be closed, but I haven't been able to find any related discussion in the mailing list.

I have re-tested with latest release as of today (version 4.4) as well as with the latest git code (version "git-2021-05-11-d5e575e"), but the problem persists. Doesn't look like the fix got merged.

If this was discussed indeed, could you please comment with a link to the conversation?

in reply to:  7 ; comment:8 by Andriy Gelman, 3 years ago

If this was discussed indeed, could you please comment with a link to the conversation?

Sorry I missed your message. Here's a link to patchwork:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20201128194654.33756-2-andriy.gelman@gmail.com/

in reply to:  8 ; comment:9 by Juan Navarro, 3 years ago

Replying to Andriy Gelman:

If this was discussed indeed, could you please comment with a link to the conversation?

Sorry I missed your message. Here's a link to patchwork:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/20201128194654.33756-2-andriy.gelman@gmail.com/

Thank you. I read the answer by Michael Niedermayer, but didn't understand the implications of what they say. Is it a rejection of the patch?

Also according to the comment, do you think the oficial method to stop a FFmpeg process is to always send two signals, and that's guaranteed to work? As far as user experience goes, it is a bit weird, but an easy workaround nonetheless.

in reply to:  9 ; comment:10 by Andriy Gelman, 3 years ago

Thank you. I read the answer by Michael Niedermayer, but didn't understand the implications of what they say. Is it a rejection of the patch?

Also according to the comment, do you think the oficial method to stop a FFmpeg process is to always send two signals, and that's guaranteed to work? As far as user experience goes, it is a bit weird, but an easy workaround nonetheless.

The way it's done now is that we exit with one SIGINT during transcode() function (ffmpeg.c:
4803) - assuming we are not stuck somewhere.

If the patch is added, the exit can actually be triggered in the muxer somewhere during the ff_check_interrupt() call, even if we are not in an infinite loop. This behavior is a bit different, and I guess you can potentially lose some data on the input side (at least that's how I understood Michael's answer).

I think what Michael was proposing was to add a smart way to detect if the muxer is actually stuck and decide whether to exit accordingly.

Last edited 3 years ago by Andriy Gelman (previous) (diff)

in reply to:  10 comment:11 by Michael Niedermayer, 3 years ago

Replying to Andriy Gelman:

If the patch is added, the exit can actually be triggered in the muxer somewhere during the ff_check_interrupt() call, even if we are not in an infinite loop. This behavior is a bit different, and I guess you can potentially lose some data on the input side (at least that's how I understood Michael's answer).

I think what Michael was proposing was to add a smart way to detect if the muxer is actually stuck and decide whether to exit accordingly.

yes

comment:12 by Anh Trinh, 6 months ago

I encountered the same issue as well.

My usecase is: ffmpeg process receive packets sent from browser MediaRecorder via websocket, recording as HLS .ts 6s segments.

However, when I stop MediaRecorder, and then send and SIGTERM to ffmpeg, it just hang there. Then I wait about couple of seconds and send SIGKILL, but then the last segments failed to be written.

Would be nice if this can be fixed.

comment:13 by Juan Navarro, 6 months ago

Bug reporter here. I'd just want to mention that for the internal project that needed it, I've been successfully using the patched branch provided in comment:4, without issues, for the last 2 years.

My real-world use case has been a program that records incoming RTP, with a command that looks like this:

ffmpeg \
    -protocol_whitelist file,rtp,udp \
    -fflags +discardcorrupt+genpts \
    -use_wallclock_as_timestamps 1 \
    -i input.sdp \
    -map 0:a:0 -c:a copy \
    -map 0:v:0 -c:v copy \
    -f webm -flags +global_header \
    -y recording.webm

So I'd like to endorse the changes proposed in the mentioned branch.

The issue seemed to be, from what I understood in the patchwork conversation (comment:8), that the current code design of FFmpeg around signal handling does make a difference between handling signals for the main loop, and handling them within nested loops (and that's where the need for two signals came from, and the motivation for this report)

I'd agree that the user experience of requiring two signals is already strange, but requiring them only some times, makes it even more confusing. IMO it's all due to an internal implementation detail, which should not leak to the user.

Note: See TracTickets for help on using tickets.