Opened 3 years ago

Last modified 19 months ago

#9351 open defect

bug in several windows builds when reencoding h.264 to h.265 with GPU

Reported by: John Dury Owned by: r.arzumanyan@visionlabs.ai
Priority: normal Component: undetermined
Version: unspecified Keywords: ffmpeg gpu buffer
Cc: Russell Morris Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description (last modified by John Dury)

Summary of the bug:
How to reproduce:

% ffmpeg -report -n -loglevel error -stats -hwaccel cuvid -hwaccel_output_format cuda -threads 2 -i test_H_264_AAC_1080p.mkv -c:s copy -c:a copy -map 0:a -map 0:s? -map 0:v -c:v hevc_nvenc -preset slow -qmax 20 hevc-test_H_264_AAC_1080p.mkv
The error is:
[hevc_nvenc @ 000000c920fbb940] Failed locking bitstream buffer: not enough buffer (14): .19x
video encoding failed: Buffer too small

This happens on several windows builds but not all. I tested every windows build with the exception of the lgpl and gpl since licensing should be the only difference. This is true for both the gyan.dev and BtbN builds. I created logs for each test from each build. I used the exact same ffmpeg commands on different windows builds with the exact same MKV test file. That file name is test_H_264_AAC_1080p.mkv and will be uploaded to the FTP with all of the logs from each build, successful and failed. I have tried this on two different machines with two different NVIDIA cards with no other GPU activity occurring simultaneously. The successes and failures look like this:
ffmpeg-2021-07-27-git-0068b3d0f0-essentials_build(failed)
ffmpeg-2021-07-27-git-0068b3d0f0-full_build(failed)
ffmpeg-4.4-essentials_build(successful)
ffmpeg-4.4-full_build(successful)
ffmpeg-4.4-full_build-shared(successful)
ffmpeg-N-103083-g0068b3d0f0-win64-gpl(failed)
ffmpeg-N-103083-g0068b3d0f0-win64-gpl-shared(failed)
ffmpeg-n4.4-79-gde1132a891-win64-gpl-4.4(successful)
ffmpeg-n4.4-79-gde1132a891-win64-gpl-shared-4.4(successful)

I am trying to connect to the FTP server but it keeps timing out. I hope to upload the sample MKV and its logs there soon. In the meantime, both were uploaded to www.filehosting.org as:
sample MKV file:
https://www.filehosting.org/file/details/955546/test_H_264_AAC_1080p.mkv
log files from each windows build: https://www.filehosting.org/file/details/955547/test_H_264_AAC_1080p_logs.rar

Change History (42)

comment:1 by John Dury, 3 years ago

Both the sample MKV file and the logs for each of the windows builds were uploaded to https://www.filehosting.org/ as:
sample MKV file:
https://www.filehosting.org/file/details/955546/test_H_264_AAC_1080p.mkv
log files from each windows build: https://www.filehosting.org/file/details/955547/test_H_264_AAC_1080p_logs.rar

comment:2 by John Dury, 3 years ago

Description: modified (diff)

comment:3 by Gyan, 3 years ago

Your hoster requires an email address to which it will send the link. That will have few takers.

Paste your logs at an open service like pastebin.

comment:4 by John Dury, 3 years ago

FWIW, I reverted to an older build (05/05/2021 4.40 gyan.dev) and it does not seem to have this issue. I spoke with one of the source hosts (BtbN) and he said none of the x.265 code had changed recently so it must be something else.
I also tried pasting the logs one at a time but they are too large for pastebin. Honestly, I would be a little surprised if anyone looked at this since it is so odd, especially across builds, and it appears to occur only with some, not all, MKV files. I can consistently reproduce it with the uploaded MKV and several of the newer ffmpeg windows binaries.

comment:5 by John Dury, 3 years ago

I will gladly upload the sample MKV and logs to another host. Any suggestions, since the ffmpeg FTP appears to be nonexistent or not working?

comment:6 by John Dury, 3 years ago

Don't know if it helps, but I went back and tried the same command on archived gyan.dev versions of ffmpeg, and it fails on all the archived versions I was able to find:
ffmpeg-2021-07-18-git-694545b6d5-full_build
ffmpeg-2021-07-21-git-f614390ecc-full_build
ffmpeg-2021-07-25-git-a2a7547b2f-full_build
ffmpeg-2021-07-27-git-0068b3d0f0-full_build

comment:7 by John Dury, 3 years ago

Since BtbN seems to have more archived versions, I grabbed several, and it looks like the bug was introduced between 102631 and 102809. My testing of BtbN versions is as follows:
ffmpeg-N-102186-g8b83a4a885-win64-gpl(successful)
ffmpeg-N-102631-gbaf5cc5b7a-win64-gpl(successful)
ffmpeg-N-102809-gde8e6e67e7-win64-gpl(failed)
ffmpeg-N-102981-gcf12a478b2-win64-gpl(failed)
Hopefully someone is following all of this as I am not sure what more testing I can do. (willing to try though)
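
The narrowing-down described above can be sketched as a binary search over an ordered list of builds. This is only an illustration: the function name `first_failing` is hypothetical, the pass/fail table simply replays the four results reported in this comment instead of running ffmpeg, and it assumes that once a build fails, every later build fails too.

```python
from typing import Callable, Sequence


def first_failing(builds: Sequence[str], fails: Callable[[str], bool]) -> str:
    """Return the first failing build.

    Assumes builds are ordered oldest to newest, the oldest one passes,
    the newest one fails, and a failure never goes away in later builds.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] passes, builds[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


# Results reported in this comment, used as a stand-in for real test runs.
REPORTED = {
    "ffmpeg-N-102186-g8b83a4a885-win64-gpl": False,
    "ffmpeg-N-102631-gbaf5cc5b7a-win64-gpl": False,
    "ffmpeg-N-102809-gde8e6e67e7-win64-gpl": True,
    "ffmpeg-N-102981-gcf12a478b2-win64-gpl": True,
}
builds = list(REPORTED)
print(first_failing(builds, REPORTED.__getitem__))
```

With more archived builds in the list, the same search would need only a handful of test runs to isolate the first failing one.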

comment:8 by Gyan, 3 years ago

So far, you haven't described what the bug is.

Your command is using NVENC so x265 is not in the picture.

You can find copies of all my releases at the Github mirror.


comment:9 by Balling, 3 years ago

BTW, -hwaccel cuvid -hwaccel_output_format cuda: the second is still forced by the first one.

comment:10 by John Dury, 3 years ago

I thought my original post had all of the relevant information including the failure but it kept timing out when trying to post so maybe it did not go through correctly.
The error I am getting is:
[hevc_nvenc @ 0000001d5df021c0] Failed locking bitstream buffer: not enough buffer (14): .39x
video encoding failed: Buffer too small
This is easily reproduced when using the same MKV file. It has happened on several others also. There is no other GPU activity happening when the error occurs. I have tried this on two different machines with two different NVIDIA GPUs, both with their latest drivers. With older builds it does not appear to happen. I am testing older versions of gyan.dev builds now and will post results once I narrow down where the bug started.
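
When comparing many -report logs across builds, a small script can flag the runs that hit this error. A minimal sketch, assuming the error line always has the shape quoted above; the pattern and the function name `find_lock_errors` are illustrative and not part of ffmpeg.

```python
import re
from typing import Dict, List

# Matches the encoder error line quoted in this ticket.
# The hexadecimal address after '@' differs from run to run.
NVENC_LOCK_ERROR = re.compile(
    r"\[(?P<encoder>\w+_nvenc) @ (?P<addr>[0-9a-fA-F]+)\] "
    r"Failed locking bitstream buffer: (?P<reason>.+?) \((?P<code>\d+)\)"
)


def find_lock_errors(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per 'Failed locking bitstream buffer' line in the log."""
    return [m.groupdict() for m in NVENC_LOCK_ERROR.finditer(log_text)]


sample = (
    "[hevc_nvenc @ 0000001d5df021c0] Failed locking bitstream buffer: "
    "not enough buffer (14): .39x\n"
    "video encoding failed: Buffer too small\n"
)
errors = find_lock_errors(sample)
print(errors)
```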

in reply to:  3 comment:11 by Balling, 3 years ago

Replying to Gyan:

Your hoster requires an email address to which it will send the link. That will have few takers.

Paste your logs at an open service like pastebin.

Just use https://temp-mail.org/ Really??

comment:12 by Balling, 3 years ago

Works without -hwaccel cuvid -hwaccel_output_format cuda. Please bisect, thanks. I will update once I have checked the old version.

comment:13 by John Dury, 3 years ago

I went back and tried on several archive versions (thanks for pointing me there!) and the results are as follows with gyan versions:
ffmpeg-2021-05-30-git-51f1194eda-full_build(successful)
ffmpeg-2021-06-06-git-43295ae6a9-full_build(failed)
ffmpeg-2021-06-09-git-e01bf559df-full_build(failed)
ffmpeg-2021-06-13-git-3ce272a9da-full_build(failed)
ffmpeg-2021-06-16-git-604924a069-full_build(failed)
ffmpeg-2021-06-19-git-2cf95f2dd9-full_build(failed)
ffmpeg-2021-06-23-git-947122f111-full_build(failed)
ffmpeg-2021-06-27-git-49e3a8165c-full_build(failed)
ffmpeg-2021-06-30-git-de8e6e67e7-full_build(failed)
ffmpeg-2021-07-04-git-301d275301-full_build(failed)
ffmpeg-2021-07-11-git-79ebdbb9b9-full_build(failed)

in reply to:  12 comment:14 by John Dury, 3 years ago

Replying to Balling:

Works without -hwaccel cuvid -hwaccel_output_format cuda. Please bisect, thanks. I will update once I have checked the old version.

Yes. I thought all of that was in my original post which seems to be missing.

comment:15 by Balling, 3 years ago

I do not understand why you are using -hwaccel cuvid. There is no such hwaccel unless you actually use such a CUDA codec, and you DO NOT. You should use -hwaccel cuda, IMHO.

comment:17 by Balling, 3 years ago

Owner: set to Timo R.
Status: new → open

And again, your code broke something, please look. A workaround is to set -extra_hw_frames 5 in the command:

ffmpeg -extra_hw_frames 5 -y -report  -loglevel error -stats -hwaccel cuvid -hwaccel_output_format cuda -threads 2 -i C:\Users\ZAQU\Downloads\test_H_264_AAC_1080p.mkv -c:s copy -c:a copy -map 0:a -map 0:s? -map 0:v -c:v hevc_nvenc -preset slow -qmax 20 hevc-test_H_264_AAC_1080p.mkv


comment:18 by John Dury, 3 years ago

Tried some more archive versions just for completeness.
ffmpeg-2021-05-05-git-7c451b609c-full_build(Successful)
ffmpeg-2021-05-09-git-8649f5dca6-full_build(Successful)
ffmpeg-2021-05-12-git-175f675f7b-full_build(Successful)
ffmpeg-2021-05-16-git-f53414a038-full_build(Successful)
ffmpeg-2021-05-19-git-2261cc6d8a-full_build(Successful)
ffmpeg-2021-05-23-git-4c0d6c91f6-full_build(Successful)
ffmpeg-2021-05-26-git-7a879cce37-full_build(Successful)

comment:19 by John Dury, 3 years ago

I tried adding "-extra_hw_frames 5" and it still fails on latest build (07/27) of gyan.

in reply to:  19 ; comment:20 by Balling, 3 years ago

Replying to John Dury:

I tried adding "-extra_hw_frames 5" and it still fails on latest build (07/27) of gyan.

Try 6 or more. Again, that is the bug here, since it breaks after the 10th message when using -v debug:

[h264 @ 000002b1809d5fc0] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0

BTW, crazy that there is no space there before bracket. See: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavcodec/h2645_parse.c#L307 (and line 324).

BTW, Failed locking bitstream buffer: not enough buffer (14): happens because of Ctrl-C here. Not because of code. Though, IMHO, also a bug.


in reply to:  20 comment:21 by John Dury, 3 years ago

I tried -extra_hw_frames 5 all the way to 15. They all fail with the same error:
[hevc_nvenc @ 000000c920fbb940] Failed locking bitstream buffer: not enough buffer (14): .19x
video encoding failed: Buffer too small

Replying to Balling:

Replying to John Dury:

I tried adding "-extra_hw_frames 5" and it still fails on latest build (07/27) of gyan.

Try 6 or more. Again, that is the bug here, since it breaks after the 10th message (-v debug):

[h264 @ 000002b1809d5fc0] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 0

BTW, crazy that there is no space there before bracket. See: https://github.com/FFmpeg/FFmpeg/blob/870bfe16a12bf09dca3a4ae27ef6f81a2de80c40/libavcodec/h2645_parse.c#L307 (and line 324).

BTW, Failed locking bitstream buffer: not enough buffer (14): happens because of Ctrl-C here. Not because of code. Though, IMHO, also a bug.

comment:22 by John Dury, 3 years ago

Description: modified (diff)

comment:23 by Balling, 3 years ago

Real workaround is -bf 0. Sigh, just like it was in #9130. [Or of course just do not use -hwaccel_output_format cuda.]

ffmpeg-2021-05-30-git-51f1194eda-full_build also has this problem, since it also has a0949d0bcb0eee2f3fffcf9a4810c0295d14c0dc, which led to #9130.

Still, it is strange that -extra_hw_frames 5 does not work for you. Maybe your GPU cannot do it?

in reply to:  23 comment:24 by John Dury, 3 years ago

I tried extra frames all the way up to 15 and it still fails. Also, I have tried this on two different machines with two different GPUs and it consistently fails on both. So I guess it is possible both GPUs can't handle it. Has anyone else tried this with the uploaded MKV with the same ffmpeg commands? (FWIW, I did change -hwaccel to cuda)

Also, I tried -bf0 and got:
Codec AVOption bf (set maximum number of B-frames between non-B-frames) specified for input file #0 (z:\temp\test_H_264_AAC_1080p.mkv) is not a decoding option.
Completely unfamiliar with -bf0.
If I disable the GPU options, it definitely works. Hopefully this isn't a permanent workaround though, as I do a tremendous amount of x.265 encoding. I could also stay with older versions of FFMPEG as a workaround, since this bug appears only in the newer versions.

Replying to Balling:

Real workaround is -bf 0. Sigh, just like it was in #9130. [Or of course just do not use -hwaccel_output_format cuda.]

ffmpeg-2021-05-30-git-51f1194eda-full_build also has this problem, since it also has a0949d0bcb0eee2f3fffcf9a4810c0295d14c0dc, which led to #9130.

Still, it is strange that -extra_hw_frames 5 does not work for you. Maybe your GPU cannot do it?

comment:25 by Balling, 3 years ago

It is -c:v hevc_nvenc -bf 0. And yes, it is a permanent workaround, since that was the default before a0949d0bcb0eee2f3fffcf9a4810c0295d14c0dc.
"Did change -hwaccel to cuda": that is the same as nvdec and will behave the same as cuvid (for now cuvid also implies format cuda).
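
The "is not a decoding option" message quoted earlier is what ffmpeg prints when an encoder option is placed before -i, because options apply to the next file named on the command line. A minimal sketch of keeping -bf 0 on the output side, using the file names from this ticket; the helper `build_args` is hypothetical and only assembles the argument list, it does not run ffmpeg.

```python
from typing import List


def build_args(src: str, dst: str, bf: int = 0) -> List[str]:
    """Assemble an ffmpeg argument list with -bf after the input file."""
    # Options placed before -i apply to the input (decoding side).
    input_opts = ["-hwaccel", "cuda", "-hwaccel_output_format", "cuda"]
    # Options placed after the input and before the output name apply to
    # the output, so -bf sits next to the encoder it configures.
    output_opts = [
        "-map", "0:a", "-map", "0:s?", "-map", "0:v",
        "-c:a", "copy", "-c:s", "copy",
        "-c:v", "hevc_nvenc", "-bf", str(bf),
        "-preset", "slow", "-qmax", "20",
    ]
    return ["ffmpeg", *input_opts, "-i", src, *output_opts, dst]


args = build_args("test_H_264_AAC_1080p.mkv", "hevc-test_H_264_AAC_1080p.mkv")
print(" ".join(args))
```

This only shows where the option belongs; whether -bf 0 avoids the failure depends on which of the problems discussed in this ticket is being hit.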


in reply to:  25 comment:26 by John Dury, 3 years ago

I added -qf 0 and tried again on both machines with different GPUs and I get the same error using the latest (08/01) gyan binaries.
[hevc_nvenc @ 0000017fefe4afc0] Failed locking bitstream buffer: not enough buffer (14): .95x
video encoding failed: Buffer too small error.
The two GPUs are a GTX 1050 Ti and a GTX 980M if it matters.

Replying to Balling:

It is -c:v hevc_nvenc -bf 0. And yes, it is a permanent workaround, since that was the default before a0949d0bcb0eee2f3fffcf9a4810c0295d14c0dc.
"Did change -hwaccel to cuda": that is the same as nvdec and will behave the same as cuvid (for now cuvid also implies format cuda).

comment:27 by Balling, 3 years ago

It is -bf 0, not qf.

in reply to:  27 comment:28 by John Dury, 3 years ago

Sorry. Typo. Retried on both GPUs just to be sure with -bf 0 and still get:
[hevc_nvenc @ 0000026f2690afc0] Failed locking bitstream buffer: not enough buffer (14): .95x
video encoding failed: Buffer too small

Replying to Balling:

It is -bf 0, not qf.

comment:29 by Timo R., 3 years ago

There is unfortunately no way to automatically increase the number of allocated hw frames when somewhere down the chain something needs a higher buffer.
That information is simply not available anywhere for the decoder.

I have no idea why the driver fails with that specific error though, and I don't think this is fixable in ffmpeg, but rather is some kind of weird behaviour of the nvidia driver.

If in doubt, throw a scale_cuda=passthrough=0 filter into the chain to take pressure off the decoder surface pool, at the expense of more VRAM usage.
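
For reference, one way the filter suggestion could be wired into the reporter's command. This is a sketch under assumptions: the helper name `with_cuda_copy` is made up, the filter string is taken verbatim from the comment above, and passing it through -vf as an output option is my reading of the suggestion, not something confirmed in this ticket.

```python
from typing import List


def with_cuda_copy(src: str, dst: str) -> List[str]:
    """Assemble the ticket's command with scale_cuda=passthrough=0 added."""
    return [
        "ffmpeg",
        "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
        "-i", src,
        # The filter runs between decoder and encoder; with passthrough
        # disabled it hands the encoder its own frames instead of the
        # decoder's surfaces, which costs additional VRAM.
        "-vf", "scale_cuda=passthrough=0",
        "-map", "0:a", "-map", "0:s?", "-map", "0:v",
        "-c:a", "copy", "-c:s", "copy",
        "-c:v", "hevc_nvenc", "-preset", "slow", "-qmax", "20",
        dst,
    ]


cmd = with_cuda_copy("test_H_264_AAC_1080p.mkv", "hevc-test_H_264_AAC_1080p.mkv")
print(" ".join(cmd))
```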


comment:30 by Timo R., 3 years ago

The "Failed locking bitstream buffer: not enough buffer" thing seems to be another fallout of a bug in the Nvidia-Driver regarding SEI data.
Passing additional SEI data seemingly causes random memory corruption and can cause all kinds of explosions when the input has any additional SEI data (a53 subs, s12m timestamps, other user sei data, ...).

Nvidia has acknowledged that bug and fixed it in Driver 495.
There is no way for nvenc/CUDA to check the driver version, so there is no sane way to conditionally disable that feature on older drivers.

I added the "-extra_sei" option for that purpose. Setting it to 0 will prevent it from writing any extra SEI data, avoiding the issue.
On 4.4 you will have to manually turn off s12m_tc and a53cc, since the option to turn all of it off does not exist there.
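
The two cases above can be summarized as a small selection of encoder options. A sketch only: the function name and the `has_extra_sei` flag are hypothetical, and spelling the 4.4 variant as "-a53cc 0 -s12m_tc 0" is an assumption based on the option names mentioned in this comment.

```python
from typing import List


def sei_workaround_opts(has_extra_sei: bool) -> List[str]:
    """Return encoder options that stop nvenc from writing extra SEI data.

    has_extra_sei: True for builds that already know the -extra_sei option,
    False for 4.4, where the individual features must be switched off.
    """
    if has_extra_sei:
        return ["-extra_sei", "0"]
    # Assumed spelling of the per-feature switches on 4.4.
    return ["-a53cc", "0", "-s12m_tc", "0"]


encoder = ["-c:v", "hevc_nvenc", "-preset", "slow", "-qmax", "20"]
print(" ".join(encoder + sei_workaround_opts(True)))
print(" ".join(encoder + sei_workaround_opts(False)))
```

Like any encoder option, these go after the input file and before the output file name.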

in reply to:  30 comment:31 by John Dury, 3 years ago

Thanks. I will try "-extra_sei 0" and wait for the 495 driver when it goes GA. The driver I have installed is 471.41.

Replying to Timo R.:

The "Failed locking bitstream buffer: not enough buffer" thing seems to be another fallout of a bug in the Nvidia-Driver regarding SEI data.
Passing additional SEI data seemingly causes random memory corruption and can cause all kinds of explosions when the input has any additional SEI data (a53 subs, s12m timestamps, other user sei data, ...).

Nvidia has acknowledged that bug and fixed it in Driver 495.
There is no way for nvenc/CUDA to check the driver version, so there is no sane way to conditionally disable that feature on older drivers.

I added the "-extra_sei" option for that purpose. Setting it to 0 will prevent it from writing any extra SEI data, avoiding the issue.
On 4.4 you will have to manually turn off s12m_tc and a53cc, since the option to turn all of it off does not exist there.

in reply to:  30 ; comment:32 by Balling, 3 years ago

Replying to Timo R.:

The "Failed locking bitstream buffer: not enough buffer" thing seems to be another fallout of a bug in the Nvidia-Driver regarding SEI data.

Yes, that is the case with the logs in the .rar. The GPU stops encoding after a while. But on the other hand we bisected it (if you did not get it, my problem is different from his, it is a -bf -1 regression due to some strange SEI at the very start of the sample): it is either https://github.com/FFmpeg/FFmpeg/commit/cee9f9628fb983ad9e1c84fb17570f297bc542d2 or https://github.com/FFmpeg/FFmpeg/commit/63948a61700d2530f13b7d32df03060f0e7fbb94


in reply to:  32 comment:33 by John Dury, 3 years ago

I added "-extra_sei 0" and tested with the gyan 08/01/2021 binaries and it works! So happy. Once the 495 driver goes GA, I will try it and update, but at least this is a working workaround for now with the latest windows binaries. So helpful. Thanks much.

Replying to Balling:

Replying to Timo R.:

The "Failed locking bitstream buffer: not enough buffer" thing seems to be another fallout of a bug in the Nvidia-Driver regarding SEI data.

Yes, that is the case with the logs in the .rar. The GPU stops encoding after a while. But on the other hand we bisected it (if you did not get it, my problem is different from his, it is a -bf -1 regression due to some strange SEI at the very start of the sample): it is either https://github.com/FFmpeg/FFmpeg/commit/cee9f9628fb983ad9e1c84fb17570f297bc542d2 or https://github.com/FFmpeg/FFmpeg/commit/63948a61700d2530f13b7d32df03060f0e7fbb94

comment:34 by Balling, 3 years ago

Apparently (this is my part of the issue) it works with -bf 1 and two threads, yet -bf 2 already makes it fail even with 2 threads. The same goes for -b_ref_mode each. There is also the problem that the sample is VFR. I am already TIRED of all those VFR bugs.

ffmpeg -hwaccel cuda -an -i test_H_264_AAC_1080p.mkv -vf vfrdet -an -f null -
frame=38289 fps=652 q=-0.0 Lsize=N/A time=00:21:17.57 bitrate=N/A speed=21.8x
video:16452kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_vfrdet_0 @ 0000021eff7a2000] VFR:0.733337 (28078/10210) min: 33 max: 34 avg: 33
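
To read the vfrdet line: the numbers in this output are consistent with the reported ratio being the first count divided by the sum of both counts, i.e. about 73% of the frame intervals differ from the preceding one. A small sketch that parses the line and rechecks the arithmetic; the pattern and the name `parse_vfrdet` are illustrative, and the interpretation of the two counts is my assumption.

```python
import re
from typing import Dict, Optional

VFRDET_LINE = re.compile(r"VFR:(?P<ratio>[0-9.]+) \((?P<vfr>\d+)/(?P<cfr>\d+)\)")


def parse_vfrdet(line: str) -> Optional[Dict[str, float]]:
    """Extract the counts from a vfrdet summary line and recompute the ratio."""
    m = VFRDET_LINE.search(line)
    if m is None:
        return None
    vfr, cfr = int(m["vfr"]), int(m["cfr"])
    return {
        "reported": float(m["ratio"]),
        "vfr": vfr,
        "cfr": cfr,
        # Assumed meaning: share of intervals counted as variable.
        "recomputed": vfr / (vfr + cfr),
    }


line = ("[Parsed_vfrdet_0 @ 0000021eff7a2000] VFR:0.733337 (28078/10210) "
        "min: 33 max: 34 avg: 33")
stats = parse_vfrdet(line)
print(stats)
```

The two counts add up to 38288, one less than the 38289 frames in the progress line, which fits one interval per pair of consecutive frames.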


comment:35 by Balling, 3 years ago

Do you think it may be a good idea to revert a0949d0bcb0eee2f3fffcf9a4810c0295d14c0dc? See again: https://www.reddit.com/r/ffmpeg/comments/pq8xbf/comment/hdn006n/

Please compare the amount of b frames...


comment:36 by Balling, 3 years ago

This should fix my part of the issue (the patch in comment from Timo): https://patchwork.ffmpeg.org/project/ffmpeg/patch/BN9PR12MB52745632705BBB311428177ED2A89@BN9PR12MB5274.namprd12.prod.outlook.com/#67249

Not that I like the approach much.


comment:37 by Balling, 3 years ago

Okay, so 496.13 driver from branch 495_86 was released.


comment:38 by Russell Morris, 2 years ago

Cc: Russell Morris added

FYI, I have been having issues using ffmpeg with my (Nvidia) GPU, specifically with hevc_nvenc. The workaround above (-extra_sei 0) also addresses my issue, so it seems this bug is still present. A few items that may help:

  • I don't need the workaround with h264_nvenc
  • I can reproduce the issue with a (stored) TS file
  • I can (often) reproduce the issue with specific channels, streaming from my tuner card.
  • I am running the latest ffmpeg code (i.e. from the repo, local build)
  • I have the latest Nvidia driver installed

Thanks!


comment:39 by Balling, 19 months ago

Important commit revert: ac7c265b33b52f914ebe05e581bbe9343eca1186 (that fixed #10332).

Some magic in 402d98c9d467dff6931d906ebb732b9a00334e0b but that caused #10409.

comment:40 by Balling, 19 months ago

Also this may be again https://trac.ffmpeg.org/ticket/7562#comment:6

    /*
     * We add two extra frames to the pool to account for deinterlacing filters
     * holding onto their frames.
     */
    frames_ctx->initial_pool_size = dpb_size + 2;

comment:41 by Balling, 19 months ago

Let's wrap things up here: "-extra_sei 0" is no longer needed, since Nvidia fixed the problem with "Driver 495". As for the

If in doubt, throw a scale_cuda=passthrough=0 filter into the chain to take pressure off the decoder surface pool, at the expense of more VRAM usage.

That should be fixed by this work in progress patch (two patches, first one just makes it work like scale_cuda=passthrough=0): https://patchwork.ffmpeg.org/project/ffmpeg/patch/BN9PR12MB52745632705BBB311428177ED2A89@BN9PR12MB5274.namprd12.prod.outlook.com/
But then again 402d98c9d467dff6931d906ebb732b9a00334e0b is by the same nvidia Roman guy and it introduced #10409, assigning to him.


comment:42 by Balling, 19 months ago

Owner: changed from Timo R. to r.arzumanyan@visionlabs.ai