Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#9781 closed defect (fixed)

zscale adds distortion when resampling video data

Reported by: Llyw Owned by:
Priority: normal Component: avfilter
Version: git-master Keywords: zscale distortion
Cc: Llyw Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description (last modified by Llyw)

I noticed that the zscale filter can introduce distortion (by either cutting or crushing of the image, I have not been able to pinpoint the exact type of distortion yet) when resampling video input using FFmpeg@30e2bb0f64, but not when using FFmpeg 5.0.1:

Take, for example, the 720p VP9 WebM version of Blender Foundation's Elephants Dream at https://upload.wikimedia.org/wikipedia/commons/transcoded/2/28/Elephants_Dream_%282006%29_1080p24.webm/Elephants_Dream_%282006%29_1080p24.webm.720p.vp9.webm.

At roughly 7 minutes, 2 seconds a stills image of this video looks like this in VLC 3.0.17.4:
https://trac.ffmpeg.org/attachment/ticket/9781/Original.png

Then, resample to 480x272 pixels with a display aspect ratio of 16:9 and a sample aspect ratio of 136:135:

ffmpeg -i 'Elephants_Dream_(2006)_1080p24.webm.720p.vp9.webm' -vf 'zscale=w=480:h=272:f=lanczos,setdar=dar=16/9' -pix_fmt yuv420p Output_Distorted.mp4

The corresponding stills image will look like this when resizing the video port to show a 720p resolution:
https://trac.ffmpeg.org/attachment/ticket/9781/Output_Distorted.png

I understand that there is some movement in this scene but if you look at the background you should clearly see the distortion I was talking about.

Compare this with the same operation done using FFmpeg's scale filter:

ffmpeg -i 'Elephants_Dream_(2006)_1080p24.webm.720p.vp9.webm' -vf 'scale=w=480:h=272:flags=lanczos+accurate_rnd,setdar=dar=16/9' -pix_fmt yuv420p Output_Undistorted.mp4

Here, the corresponding stills image looks like this and does not show significant distortion when comparing with the stills image of the input video (some minor mismatch is obviously expected due to the strong resampling and movement in the scene):
https://trac.ffmpeg.org/attachment/ticket/9781/Output_Undistorted.png

I am assuming that this might have something to do with the recent-ish efforts to parallelize the zscale filter and would appreciate it, if this behavior is investigated.

Attachments (5)

Original.png (1.1 MB ) - added by Llyw 2 years ago.
Stills image of original input
Output_Distorted.png (909.8 KB ) - added by Llyw 2 years ago.
Stills image when using zscale
Output_Undistorted.png (916.5 KB ) - added by Llyw 2 years ago.
Stills image when using scale
Output_zscale_5.0.1.png (916.2 KB ) - added by Llyw 2 years ago.
Stills image using zscale of FFmpeg 5.0.1
Output_zscale_git-e3580f6077.png (918.8 KB ) - added by Llyw 2 years ago.
Stills image using zscale as of FFmpeg@e3580f6077 (git-master)

Change History (19)

by Llyw, 2 years ago

Attachment: Original.png added

Stills image of original input

by Llyw, 2 years ago

Attachment: Output_Distorted.png added

Stills image when using zscale

by Llyw, 2 years ago

Attachment: Output_Undistorted.png added

Stills image when using scale

comment:1 by Llyw, 2 years ago

Cc: Llyw added
Description: modified (diff)

comment:2 by Elon Musk, 2 years ago

Resolution: invalid
Status: newclosed

Taking pngs from Lossy encodes is invalid.

comment:3 by Llyw, 2 years ago

Resolution: invalid
Status: closedreopened

I do not understand how that is invalid. The PNGs are screenshots of VLC, where I used the advance-by-1-frame option to forward to the same frame in all three files before taking the screenshot.

If you don't feel like taking the screenshots seriously, at least try my reproducer command using the zscale filter and compare the git-master and 5.0.1 output. The difference will be immediately apparent. You are also free to try any other input, but it did take me several hours to come up with an easy enough reproducer so I must ask you to put at least some effort into this.

comment:4 by Balling, 2 years ago

Why cannot you use mpv and jxl? VLC is very buggy. Also you did not tag the output and that leads to a bug, different color between original and undistorted.

I understand that there is some movement in this scene but if you look at the background you should clearly see the distortion I was talking about.

This sounds very vague. I see that it looks like aspect ratio bug somehwere. Or that pixels are not squere. Looks like setdar=dar=16/9' is wrong.

Last edited 2 years ago by Balling (previous) (diff)

comment:5 by Elon Musk, 2 years ago

Resolution: fixed
Status: reopenedclosed

Sort of improved output with 52a14b8505923116ed6acc5e691c0c7c44e6f708

For best output use even number of threads option: zscale=....:threads=2 for example.

comment:6 by Llyw, 2 years ago

Thanks for taking the time to take a look at this, Paul!

I took a look at your changes and I think I am beginning to understand the issue with the previous implementation: Essentially, the previous implementation computed the active input area based on the job index and number of jobs. Ignoring the adjustment to make the slice offsets divisible by four and denoting the input height with ih, the output height with oh, and the number of threads with n, the previous implementation would thus assign an implicit resample ratio of (ih % floor(ih/n))/(oh % floor(oh/n)) to the last slice, and a resample ratio of floor(ih/n)/floor(oh/n) to all others if oh was not divisible by n and n>1. These ratios can differ significantly, leading to observed distortion, and are bad approximations to the actual resample ratio ih/oh.

Your commit improves this by computing the active input area based on the actual resample ratio ih/oh instead. This will reduce the distortion when oh is not divisible by n and n > 1, since the resample ratio of the last slice floor(((oh % ceil(oh/n)) * ih)/oh)/(oh % ceil(oh/n)) will be closer to floor((ceil(oh/n) * ih)/oh)/ceil(oh/n), with both expressions being closer to ih/oh than the previous approximation. (When ceil(oh/n) is odd, you need to replace it with ceil(oh/n)+1 in the above expressions.)

To reduce distortion, one should therefore at least ensure that oh is divisible by s where s is equal to floor((ceil(oh/n) + 1)/2) or choose n=1. For n>1, having oh being divisible by s is still no guarantee for no distortion, however, you would also need s * ih to be divisible by oh for that.

I did take a look at the z.lib/zimg API and it seems that the active area does not have to be specified as an integer but is stored as floating-point values, so a future improvement would be to use double precision floating-point values when computing the active area of the input for each output slice. Until then the filter may still show some distortion depending on the input dimensions, output dimensions, and the thread count. (I would honestly appreciate it if this defect would remain open until this improvement is realized but am hesitant to re-open it myself.)

comment:7 by Balling, 2 years ago

Well, if the output is not bitperfect with what was before the Intel commit it should be reopened. It is not even the first bug that commit brought.

Last edited 2 years ago by Balling (previous) (diff)

comment:8 by Llyw, 2 years ago

Description: modified (diff)
Resolution: fixed
Status: closedreopened

comment:9 by Elon Musk, 2 years ago

Resolution: fixed
Status: reopenedclosed

Fixed fully.

comment:10 by Llyw, 2 years ago

Description: modified (diff)

Agreed. (I can understand a desire to close new issues as fast as possible, but I don't think that it's the right way forward in this case. The relevant code in the z.lib/zimg API showing that it is possible to compute the input slice's active area precisely is at ​https://github.com/sekrit-twc/zimg/blob/release-3.0.4/src/zimg/api/zimg.h#L568)

comment:11 by Llyw, 2 years ago

Thank you for your swift help, Paul! (There was some race-condition between my previous and your last comment, sorry if that caused any confusion. I also accidentally mangled the ticket's description, that should have been rectified as well.)

comment:12 by Balling, 2 years ago

710dce131fdb6c1ebec1f26a7a4173d82d828330 makes it bitperfect then?

comment:13 by Llyw, 2 years ago

That's an easy question to ask, but a difficult one to answer ;)

While I won't be able to answer your question to the full extent, I tried to gain a rough understanding on how the use of multiple slices might cause floating-point deviations: Internally, z.lib/zimg seems to convert from the input to the output coordinate system, see https://github.com/sekrit-twc/zimg/blob/release-3.0.4/src/zimg/resize/filter.cpp#L250, while the parallelization scheme is doing it the other way around. Since oh/iw is not necessarily equal to 1/(oh/ih) in floating-point arithmetics that could lead to some minor deviations due to round-off. In relative terms, this deviation should be in the same order as the unit round-off for double precision floating-point arithmetic, so I do not think that this should be a reason to reopen this defect. I furthermore suspect that there might be no way around a minor loss of accuracy when parallelizing the execution of the zscale filter.

It might make sense, however, to ask the question whether you will get identical results when using only a single slice. As ih and oh approach INT_MAX (not that this will happen any time soon) you could improve the following assignment, at least from a theoretical/academic point of view:

        s->in_slice_start[i] = s->out_slice_start[i] * in_h / (double)out_h;
        s->in_slice_end[i]   = s->out_slice_end[i]   * in_h / (double)out_h;

When rewriting this as

        s->in_slice_start[i] = in_h * (s->out_slice_start[i] / (double)out_h);
        s->in_slice_end[i]   = in_h * (s->out_slice_end[i]   / (double)out_h);

you can be certain (not withstanding my comment in the final paragraph down below) that this will reduce precisely to

        s->in_slice_start[0] = 0.0;
        s->in_slice_end[0]   = in_h;

for a single slice since as in this case you have s->out_slice_start[0]=0.0 and s->out_slice_end[0]=(double)oh with oh being equal to out_h: Since the dimensions of an AVFrame are 32-bit signed integers, see https://ffmpeg.org/doxygen/2.7/structAVFrame.html, and 32-bit signed integers can be exactly represented as double precision floating-point values, you can be certain that the expression s->out_slice_end[0]/(double)out_h will be equal to 1.0, causing the endpoint of your slice to be exactly equal to in_h, which is also the default that z.lib/zimg assumes when the active_area member is not specified, see https://github.com/sekrit-twc/zimg/blob/release-3.0.4/src/zimg/api/zimg.cpp#L405, as was the case before the Intel-commit. (With the current code this will only be true as long as the exact result of out_h*in_h can be precisely represented by a double precision floating-point value.)

Note, however, that the default floating-point model will give the compiler some freedom when it comes to interpreting the source code and I am not sure whether the above reasoning will hold in general. If you want to investigate this further, I would recommend that you add some regression tests.

I should also point out that I have not yet had the opportunity to try my reproducer commands with the updated code and even then I will be limited to a simple visual comparison. I strongly expect that I will not be able to tell a difference with my yuv420p video (when comparing the updated git-master to FFmpeg 5.0.1) when I get around to doing that, but will get back to you if this turns out not to be the case.

Last edited 2 years ago by Llyw (previous) (diff)

by Llyw, 2 years ago

Attachment: Output_zscale_5.0.1.png added

Stills image using zscale of FFmpeg 5.0.1

by Llyw, 2 years ago

Stills image using zscale as of FFmpeg@e3580f6077 (git-master)

comment:14 by Llyw, 2 years ago

Final update: I just had the opportunity to retry my reproducer command with an updated FFmpeg build (based on FFmpeg@e3580f6077) and can confirm that the geometrical distortion as observed in this defect has been fixed with the most recent changes, as evident when comparing the newly added attachments https://trac.ffmpeg.org/attachment/ticket/9781/Output_zscale_5.0.1.png and https://trac.ffmpeg.org/attachment/ticket/9781/Output_zscale_git-e3580f6077.png that show the zscale output of FFmpeg 5.0.1 and the result obtained using the aforementioned git revision, respectively.

Note: See TracTickets for help on using tickets.