Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#8466 closed defect (fixed)

av_url_split violates RFC2396, fails to parse URLs

Reported by: Jacob Owned by:
Priority: normal Component: avformat
Version: unspecified Keywords: http
Cc: Marton Balint Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

(Slightly related ticket: #7871)

Summary of the bug:

This took a long time to track down, but the exact file and problem has now been identified.

The bug is in "libavformat/utils.c", function "av_url_split".

When given a URL that conforms to RFC2398, the 1998 URL standard, FFmpeg incorrectly parses the URL and fails to download the file.

This is because FFmpeg's parser looks for the earliest slash after the domain name, by assuming that there must be a slash after the domain name, before the query string, which is not true. The standard says that the slash is totally optional (https://stackoverflow.com/a/42193734/8874388).

How to reproduce:

ffmpeg version 4.2.1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.1.1 (GCC) 20190807

INCORRECTLY PARSED BY FFMPEG:

% ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key="
[https @ 000002308f499b00] HTTP error 400 Bad Request

CORRECTLY PARSED (VIA USER MANUALLY ADDING A SLASH):

% ffmpeg -i "https://key.vscdns.com/?key_id=2&model_id=991011&&access_key="
[https @ 00000262520c9b00] HTTP error 403 Forbidden

It affects any URL where the query string is immediate. Which means that it affects direct media links, M3U8/HLS playlists (where the user has no control over the URLs and no way to fix it themselves), etc.

This is problematic because the ".com?" format is entirely up to the streaming media provider, and cannot be affected by the user. Any streaming playlist that uses a field such as "EXT-X-KEY" with such a URL (as in my example URL) will fail completely in FFmpeg and there's nothing the user can do about it. So it is imperative that FFmpeg understands RFC2396 and allows the slash to be optional.

I looked at the parsing code and it seems quite simple to fix. It currently looks for the first "/" slash and "?" question mark. The fix would probably be to say "If a question mark was found, and it is earlier than the first slash, or no slash was found at all" then set the pointers so that it treats the "?" as the start of the query and the character immediately before that as the end of the domain name.

PS: I have set the same priority as the earlier URL parsing error ticket I referred to above. Feel free to lower the priority.

Change History (11)

comment:1 by Carl Eugen Hoyos, 4 years ago

Keywords: RFC2396 parsing removed
Priority: importantnormal
Version: 4.2unspecified

Please test current FFmpeg git head and provide the command line you tested together with the complete, uncut console output to make this a valid ticket.

comment:2 by Jacob, 4 years ago

I am looking at the bugged code in question:

http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/utils.c

(In function "av_url_split".)

It uses "strchr" to scan for a slash and a question mark. The strchr function returns NULL if the character is not found, else it returns a pointer to where the character was found.

4806     /* separate path from hostname */
4807     ls = strchr(p, '/');
4808     ls2 = strchr(p, '?');
4809     if (!ls)
4810         ls = ls2;
4811     else if (ls && ls2)
4812         ls = FFMIN(ls, ls2);

"ls" is slash and "ls2" is question mark.

As you can see, it works as follows:

  • if "ls" not found, set "ls" to "ls2" (query location)
  • else if "ls" and "ls2" both found, set "ls" to the lowest of "ls" or "ls2"

This seems to be correct and to properly understand that URLs can start with a query string.

So I assume that the bug is further down in the function, where it splits the URI string based on the pointers it found...

There are many potential reasons for the bug:

  • Perhaps "ls" must be set to the character BEFORE "ls2" whenever "ls" was not found. Currently it's setting "ls" to the same position as "ls2" (the question mark).
  • Perhaps the splitting code further down needs to understand that the path can be completely empty.
  • Perhaps the splitting code needs to generate a "/" path when the path is empty, so that the generated URL in the HTTP GET request that is sent to the server will be "/?key=..." instead of "?key=..."

The latter is the most likely cause. It probably generates absolutely nothing as path when given an URL such as this, and then sends a HTTP/1.1 GET "?key=..." to the server, which the server in turn fails to understand, since the HTTP/1.1 GET protocol probably demands a leading slash, whereas the URL standard does not.

If that's the cause, then there are two potential areas to fix this:

  • Either in the function "av_url_split", where we can make it generate a default "/" path whenever no path was given.
  • Or in the HTTP connection library (the one generating the HTTP/1.1 GET request), where we can make it request "/..." whenever the given request path "..." doesn't begin with a slash

comment:3 by Jacob, 4 years ago

Alright. Here's the latest Git build's output:

INCORRECTLY PARSED BY FFMPEG:

$ ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key="
ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20191125
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 65.102 / 58. 65.102
  libavformat    58. 35.101 / 58. 35.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 70.101 /  7. 70.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
[https @ 00000229894b92c0] HTTP error 400 Bad Request
https://key.vscdns.com?key_id=2&model_id=991011&&access_key=: Server returned 400 Bad Request


MANUALLY REWRITTEN TO MAKE FFMPEG UNDERSTAND:

$ ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key="
ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20191125
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 65.102 / 58. 65.102
  libavformat    58. 35.101 / 58. 35.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 70.101 /  7. 70.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
[https @ 00000205c0b592c0] HTTP error 403 Forbidden
https://key.vscdns.com/?key_id=2&model_id=991011&&access_key=: Server returned 403 Forbidden (access denied)

My previous reply above has located the bug to both the URL parser (it can be fixed in there), and the HTTP GET connection library (it can be fixed there too). Where it's fixed doesn't matter, and it would be most trivial to fix it via the URL parser (av_url_split).

comment:4 by Jacob, 4 years ago

The 2nd (manually rewritten) example above accidentally copied the command line part of the 1st example. That's what happens when you do a quick copy... sigh.

Here's the used commands, yet again:

INCORRECTLY PARSED BY FFMPEG:

$ ffmpeg -i "https://key.vscdns.com?key_id=2&model_id=991011&&access_key="
ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20191125
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 65.102 / 58. 65.102
  libavformat    58. 35.101 / 58. 35.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 70.101 /  7. 70.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
[https @ 00000229894b92c0] HTTP error 400 Bad Request
https://key.vscdns.com?key_id=2&model_id=991011&&access_key=: Server returned 400 Bad Request


MANUALLY REWRITTEN TO MAKE FFMPEG UNDERSTAND:

$ ffmpeg -i "https://key.vscdns.com/?key_id=2&model_id=991011&&access_key="
ffmpeg version git-2020-01-10-3ea7057 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20191125
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 65.102 / 58. 65.102
  libavformat    58. 35.101 / 58. 35.101
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 70.101 /  7. 70.101
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
[https @ 00000205c0b592c0] HTTP error 403 Forbidden
https://key.vscdns.com/?key_id=2&model_id=991011&&access_key=: Server returned 403 Forbidden (access denied)

comment:5 by Marton Balint, 4 years ago

Cc: Marton Balint added
Resolution: fixed
Status: newclosed

in reply to:  5 comment:6 by Jacob, 4 years ago

Replying to cus:

Fixed in a3d8875df1f71e62a36c583cee638ab1d51a71d3.

Hi Marton, I saw your patches below:

https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=365b817b51630447305f49a4e2f79ab8ad842473

https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=554576b6cfe79a91d37e14d3617ca417562085db

https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b

https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=a3d8875df1f71e62a36c583cee638ab1d51a71d3

I've also reviewed them now and have to say that you have done a brilliant job! The new URL parser code is so much cleaner and safer, and you noticed that hashmark support was also needed (slash, question mark and hashmark all being supported by RFC3986).

Really great work, thank you so much! :-)

Best Regards,

comment:7 by Jacob, 4 years ago

Resolution: fixed
Status: closedreopened

Oh and it was great to see that you even made a sanitizer for the connection algorithm ("http_open_cnx_internal") which auto-adds a slash if there was no slash in the input URL given to the function.

I'm actually re-opening this to ask a question, though, for other reviewers to consider:

https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=a3d8875df1f71e62a36c583cee638ab1d51a71d3

Note how the code is adding a "/" slash if the first character of the path is a question mark. Is there any problem if the URL is "https://example.com#foo"?

It looks to me like the hashmark check a few lines earlier ( https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b ) puts a null byte at the hashmark position to "erase the hashmark and everything after it" already, so I guess there is no problem! Just asking to be sure. :-)

in reply to:  7 ; comment:8 by Marton Balint, 4 years ago

Resolution: fixed
Status: reopenedclosed

Replying to aitte:

Note how the code is adding a "/" slash if the first character of the path is a question mark. Is there any problem if the URL is "https://example.com#foo"?

It looks to me like the hashmark check a few lines earlier ( https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b ) puts a null byte at the hashmark position to "erase the hashmark and everything after it" already, so I guess there is no problem! Just asking to be sure. :-)

Yeah, that should be fine as well, empty path (it becomes empty after removing the hashmark) was converted to / even before this patchset.

in reply to:  8 comment:9 by Jacob, 4 years ago

Replying to cus:

Replying to aitte:

Note how the code is adding a "/" slash if the first character of the path is a question mark. Is there any problem if the URL is "https://example.com#foo"?

It looks to me like the hashmark check a few lines earlier ( https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=987ce96d41d21a120e6109f9bb1e3a9dcddbf30b ) puts a null byte at the hashmark position to "erase the hashmark and everything after it" already, so I guess there is no problem! Just asking to be sure. :-)

Yeah, that should be fine as well, empty path (it becomes empty after removing the hashmark) was converted to / even before this patchset.

Hey, thanks again, I really almost can't believe how much cleaner the new URL code is. The old one you replaced (ls, ls2 system) was a convoluted mess. Hehe. This change is truly brilliant: https://git.videolan.org/?p=ffmpeg.git;a=commitdiff;h=554576b6cfe79a91d37e14d3617ca417562085db

And ahh yeah I see what you mean now about the empty path converting to "/", at this area of the code:
https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/http.c;h=d0df1b6d585da422aef14eb6128eb319ddd9ac98;hb=a3d8875df1f71e62a36c583cee638ab1d51a71d3#l220

So "http://example.com#tag" would be split via "av_url_split()" into "http", "example.com" and "#tag" (path1). Then path1 variable gets truncated to "\0tag", and then it says "if (path1[0] == '\0') { path = "/"; }" which means it should properly be converting the path to "/" when there was just a hashmark after the domain name.

And if the path is non-empty but starts with "?", your new code formats it as "/?" properly. And lastly, if the path is normal and starts with "/" already then it's used as-is.

Alright this looks perfect.

You've done a fantastic job!

  • Cleaned up the buggy, messy av_url_split, wohoo!
  • Added support for "example.com?x" and "example.com#x" URLs in the URL splitter.
  • Properly removing hashmark fragments from paths when connecting to target URLs. That would have caused problems in the old code which kept them. Great catch!
  • Added support for auto-inserting "/" when connecting to target URLs that didn't begin their path with a slash already.

This looks like a good implementation of RFC3986. Thank you so much for the great work!

:-)

comment:11 by Jacob, 4 years ago

This is now live in the latest stable release! FFmpeg 4.3 Stable brought this fix to the masses. Thank you so much again, for your great work @cus!

Note: See TracTickets for help on using tickets.