Opened 17 months ago

Last modified 16 months ago

#5822 new defect

filter "scale_npp" fails to select correct algorithm (Nvidia CUDA/NPP scaler)

Reported by: sdack Owned by:
Priority: normal Component: avfilter
Version: git-master Keywords: npp
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Summary of the bug:

The Nvidia CUDA/NPP filter for scaling images on the GPU fails to select the requested algorithm in some cases.

While scaling an UltraHD video from 3840x2160p down to 1280x720p (33% size reduction with no change in aspect ratio) do several filters produce the exact same result.

This behaviour shows most severely with a 33% reduction in scale, but can also be observed at 50% and 25%. In other cases such as a 67% and 40% reduction of scale does it appear to be working as expected.

When the aspect ratio is changed with the size then it behaves as expected again.

One can see the effect here in a montage of various combinations of hardware and software scalers and encoders. Each picture shows the name of the algorithm, an encoder setting and the resulting file size of the video. In the cases where scale_npp fails to select the algorithm do the images look identical and the resulting file sizes are identical, too:

http://i.imgur.com/5qfPCSU.png

If I had to guess I'd say there is an optimization going wrong or the scaler could be running into a hardware limitation.

The issue can be observed with CUDA 7.5 and 8.0rc1.

How to reproduce:

The following script can be used to detect the issue. It produces a 1 second uhd2160 video with 50 frames, scales it down with scale_npp and runs a checksum (md5sum) on the raw video. Where the algorithms produced identical output for all 50 frames do they show identical checksums.

--- test.sh ---
#!/bin/bash
function runtests() {

w="$1" ; h="$2" ; fmt="nv12"
for alg in nn linear cubic cubic2p_bspline \

cubic2p_catmullrom cubic2p_b05c03 super lanczos; do

ffmpeg -v error -f lavfi \

-i testsrc2=duration=1:size=uhd2160:rate=50 \
-pix_fmt $fmt \
-filter:v "hwupload_cuda,scale_npp=w=$w:h=$h:format=$fmt:interp_algo=$alg,hwdownload" \
-f rawvideo -y - | md5sum -b | sed "s/-/$alg/"

done

}

echo "3840x2160 -> 2560x1440" ; runtests 2560 1440 | sort # 67% - ok
echo "3840x2160 -> 1920x1080" ; runtests 1920 1080 | sort # 50% - linear=super
echo "3840x2160 -> 1536x864" ; runtests 1536 864 | sort # 40% - ok
echo "3840x2160 -> 1280x720" ; runtests 1280 720 | sort # 33% - nn=linear=cubic=catmulrom=lanczos
echo "3840x2160 -> 960x540" ; runtests 960 540 | sort # 25% - catmulrom=cubic

echo "3840x2160 -> 2561x1441" ; runtests 2561 1441 | sort # 67% - ok
echo "3840x2160 -> 1921x1081" ; runtests 1921 1081 | sort # 50% - ok
echo "3840x2160 -> 1537x865" ; runtests 1537 865 | sort # 40% - ok
echo "3840x2160 -> 1281x721" ; runtests 1281 721 | sort # 33% - ok
echo "3840x2160 -> 961x541" ; runtests 961 541 | sort # 25% - ok
--- EOF ---

A work-around is not to use the module, but to fallback to the software scaler.

Change History (10)

comment:1 Changed 17 months ago by sdack

  • Version changed from unspecified to git-master

comment:2 Changed 17 months ago by oromit

You'll have to complain to Nvidia about that.
All the code in ffmpeg does it passing the interpolation-method on to libnpp.

comment:3 Changed 17 months ago by sdack

I may have found something. It appears to lie with the parameters to the function call at line 474 of vf_scale_npp.c:

err = nppiResizeSqrPixel_8u_C1R(in->data[i], (NppiSize?){ iw, ih },

in->linesize[i], (NppiRect?){ 0, 0, iw, ih },
out->data[i], out->linesize[i],
(NppiRect?){ 0, 0, ow, oh },
(double)ow / iw, (double)oh / ih,
0.0, 0.0, s->interp_algo);

The 2nd-last and 3rd-last parameter are specified as 0.0. However by looking at one of Nvidia's programming samples do they use 0.5 for both.

According to their documentation:

--- NPP_Library_Image_Geometry.pdf ---
7.5.3.28
NppStatus? nppiResizeSqrPixel_8u_C1R (const Npp8u ∗ pSrc, NppiSize? oSrcSize, int
nSrcStep, NppiRect? oSrcROI, Npp8u ∗ pDst, int nDstStep, NppiRect? oDstROI, double
nXFactor, double nYFactor, double nXShift, double nYShift, int eInterpolation)
1 channel 8-bit unsigned image resize.
Parameters:
pSrc Source-Image Pointer.
nSrcStep Source-Image Line Step.
oSrcSize Size in pixels of the source image.
oSrcROI Region of interest in the source image.
pDst Destination-Image Pointer.
nDstStep Destination-Image Line Step.
oDstROI Region of interest in the destination image.
nXFactor Factor by which x dimension is changed.
nYFactor Factor by which y dimension is changed.
nXShift Source pixel shift in x-direction.
nYShift Source pixel shift in y-direction.
eInterpolation The type of eInterpolation to perform resampling.
---

These two values describe a pixel shift. Further does it say:

---
ResizeSqrPixel? attempts to choose source pixels that would approximately represent the center of the des-
tination pixels. It does so by using the following scaling formula to select source pixels for interpolation:
nAdjustedXFactor = 1.0 / nXFactor;
nAdjustedYFactor = 1.0 / nYFactor;
nAdjustedXShift = nXShift * nAdjustedXFactor + ((1.0 - nAdjustedXFactor) * 0.5);
nAdjustedYShift = nYShift * nAdjustedYFactor + ((1.0 - nAdjustedYFactor) * 0.5);
nSrcX = nAdjustedXFactor * nDstX - nAdjustedXShift;
nSrcY = nAdjustedYFactor * nDstY - nAdjustedYShift;
---

For a destination coordinate of 10,10 and at a 33% size reduction would it pick 31,31 as source for a shift of 0.0.
If the shift is 0.5 for both x and y then 10,10 of the destination falls onto source position 29.5,29.5 or rounded 30,30.
And if the shift was 1.0 for both then 10,10 of the destination would fall onto 28,28.

My guess here is that it should be 0.5 for both shift values just as it is done in the example and that the two values describe center of the square pixel in a range of 0.0 and 1.0.

I don't know yet how this affects the algorithms, but a first test with the shifts changed to 0.5 looks promising. There are no more identical outputs. I'll do some more tests with real footage and see how this affects the output.

comment:4 Changed 17 months ago by sdack

Although one can influence the result with a different pixel shift and thereby produce distinguishable images from the algorithms does this also cause a minor shift in the image itself, which isn't acceptable.

After reading the Nvidia forums did I notice a dev saying there were bugs...
"We (NVIDIA) recently found a bug in the rotation code which at some scale factors can cause the last source pixel in a row or column to not be sampled. ResizeSqrPixel? contains a similar test which probably also needs fixed. It would be great if you could send us an example of a failure case. In the meantime, a possible work around would be to increase oSrcROI.width and oSrcROI.height by 1. To be safe in all cases however, this may require that you increase the memory allocated for your source image by 1 in both width and height." (Posted 07/26/2016)

This doesn't sound quite like the issue here, but I am hoping an Nvidia dev will take a look at this since there is already one known bug. I have posted the problem on the Nvidia forums.

https://devtalk.nvidia.com/default/topic/962808/gpu-accelerated-libraries/nppiresizesqrpixel_8u_c1r-produces-low-quality-images/

comment:5 Changed 17 months ago by cehoyos

  • Keywords npp added; scale_npp interp_algo Nvidia CUDA NPP removed

So there is no bug that can be fixed in FFmpeg?

comment:6 Changed 17 months ago by sdack

No, there is more than one bug. The one confirmed by Nvidia is unrelated to this.

It is my hope to get a response from them and telling me FFmpeg is doing it wrong and how to do it right, which means it can be fixed easily. If it turns out to be with Nvidia then who knows when or if this gets fixed.

To fix the issue in FFmpeg might require using the 16-bit or floating-point implementation of this function.

I'd like to wait for a response by Nvidia.

comment:7 Changed 17 months ago by sdack

After getting some info from the Nvidia forums and further reading is this the situation as it presents itself to me:

  1. Nvidia's CUDA NPP functions are copies of Intel's IPP functions (NPP stands for Nvidia Performance Primitives and IPP stands for Intel Performance Primitives). Nvidia uses this fact to point to Intel's documentation when developers have questions about it.
  1. Intel have marked the corresponding function (and variations) as deprecated as of IPP v7. The current release is IPP v9.
  1. Intel have provided replacement functions with IPP v7, which users should be using instead. These allow to specify filter matrices, which I interpret as a sign of quality improvement and a confession on the poor quality of the ResizeSqrPixel? function. (I see a light at the horizon...)
  1. Sadly (... and the light is gone) does Nvidia not provide any of the replacement functions even when Intel have announced this change several versions ago. The replacements cannot be found in either CUDA 7.5 (current) or CUDA 8.0rc1 (future). CUDA 8.0rc1 stills contain copies of the old functions, which Intel have now removed for the IPP libraries.

In short, this function is a sinking ship. Sadly, is it the central part of the scale_npp module, too.

Last edited 17 months ago by sdack (previous) (diff)

comment:8 Changed 16 months ago by sdack

So far the only response I got was to send in a feature request for Nvidia to provide the new functions, which I've done.

CUDA 8.0 has been released and still provides no replacement functions. They have even abandoned the use of some of the algorithms for this function.

cubic2p_bspline, cubic2p_catmullrom and cubic2p_b05c03 now only produce an error message (-22, Interpolation Error).

The npp_scale filter may need to be declared as deprecated.

comment:9 Changed 16 months ago by oromit

I don't see a reason to deprecate it. Specially as there is no replacement.
It's an upstream bug, and it still gets the job done, just not with the correct scaling type.

comment:10 Changed 16 months ago by sdack

You may be confusing "deprecated" with "removed". I'm not saying it should be removed. It may only be the filter will get removed due to this lack of support, for having a low image quality and being bound to a specific hardware and an external library.

It's then better to give users a "heads up" by declaring it as deprecated, not to make it a secret, and to hope it's going to change in the future. One can always undeclare it.

Note: See TracTickets for help on using tickets.