Opened 18 months ago

Closed 8 months ago

Last modified 8 months ago

#7825 closed defect (fixed)

Malfunctioning `ssim` filter?..

Reported by: gdgsdg123
Owned by:
Priority: normal
Component: avfilter
Version: git-master
Keywords: ssim
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Build from: https://zeranoe.com/builds/win64/static/ffmpeg-20190402-6aeaac3-win64-static.zip

C:\>ffmpeg -i "bt709.avi" -i "bt601.avi" -lavfi ssim -f null -
ffmpeg version N-93515-g6aeaac3e1c Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20190212
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
[avi @ 00000000005218c0] decoding for stream 0 failed
Input #0, avi, from 'bt709.avi':
  Duration: 00:00:00.03, start: 0.000000, bitrate: 206981 kb/s
    Stream #0:0: Video: h264 (High 4:4:4 Predictive) (H264 / 0x34363248), yuv420p10le(tv, bt709, progressive), 1440x836, 30 fps, 30 tbr, 30 tbn, 60 tbc
[avi @ 0000000002e965c0] decoding for stream 0 failed
Input #1, avi, from 'bt601.avi':
  Duration: 00:00:00.03, start: 0.000000, bitrate: 206981 kb/s
    Stream #1:0: Video: h264 (High 4:4:4 Predictive) (H264 / 0x34363248), yuv420p10le(tv, smpte170m, progressive), 1440x836, 30 fps, 30 tbr, 30 tbn, 60 tbc
Stream mapping:
  Stream #0:0 (h264) -> ssim:main
  Stream #1:0 (h264) -> ssim:reference
  ssim -> Stream #0:0 (wrapped_avframe)
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.26.101
    Stream #0:0: Video: wrapped_avframe, yuv420p10le, 1440x836, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc (default)
    Metadata:
      encoder         : Lavc58.48.100 wrapped_avframe
frame=    1 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.03 bitrate=N/A speed=0.189x
video:1kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_ssim_0 @ 0000000002df0f40] SSIM Y:1.000000 (inf) U:1.000000 (inf) V:1.000000 (inf) All:1.000000 (inf)

"bt709.avi" and "bt601.avi" are from the same source and encoded using the same parameters, but tagged differently.

Yet they apparently do not look the same, even though `ssim` reports a perfect score (1.000000, inf) on every plane.

Attachments (3)

sample.7z (839.3 KB) - added by gdgsdg123 18 months ago.
black.png (23.7 KB) - added by gdgsdg123 8 months ago.
Pure black (#000000) for all pixels.
tainted.png (23.7 KB) - added by gdgsdg123 8 months ago.
Pure black for all pixels except the top-left pixel, which is tainted to "#010000".


Change History (17)

Changed 18 months ago by gdgsdg123

Attachment sample.7z added.

comment:1 follow-up: Changed 18 months ago by cehoyos

  • Keywords colorspace removed
  • Resolution set to invalid
  • Status changed from new to closed
  • Type changed from enhancement to defect

Apart from the attachment, I believe the filter acts as specified.

comment:2 in reply to: ↑ 1 Changed 18 months ago by gdgsdg123

Replying to cehoyos:

Apart from the attachment, I believe the filter acts as specified.

I don't think SSIM is supposed to be color-blind... (and I'm not talking about the `colorspace` filter)


Workaround

Because the `ssim` filter is not colorspace-aware, it is recommended to convert both inputs to a common colorspace that avoids these color-management hazards (e.g. RGB) before comparing them.

Alternatively, make sure that both inputs use exactly the same color-management scheme.
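As a sketch of the first option, the command below converts each input to 8-bit planar RGB with its own YUV-to-RGB matrix before `ssim` sees it. It assumes the `scale` filter's `in_color_matrix` option with the values `bt709` and `bt601` (smpte170m, the tag on the second file, uses the BT.601 matrix); check `ffmpeg -h filter=scale` on your build. The helper name `ssim_rgb_argv` is illustrative only, and converting to 8-bit `gbrp` discards the inputs' 10-bit precision.

```python
import shlex

def ssim_rgb_argv(main, ref, main_matrix="bt709", ref_matrix="bt601"):
    """Build an ffmpeg command line that converts each input to 8-bit planar
    RGB (gbrp) with its own YUV->RGB matrix, then compares them with ssim.

    Assumption: the scale filter's in_color_matrix option accepts these names.
    """
    graph = (
        f"[0:v]scale=in_color_matrix={main_matrix},format=gbrp[main];"
        f"[1:v]scale=in_color_matrix={ref_matrix},format=gbrp[ref];"
        "[main][ref]ssim"
    )
    return ["ffmpeg", "-i", main, "-i", ref, "-lavfi", graph, "-f", "null", "-"]

argv = ssim_rgb_argv("bt709.avi", "bt601.avi")
# Print a shell-ready command; pass argv to subprocess.run() to execute it.
print(shlex.join(argv))
```

Building the argument list instead of a single string avoids quoting problems with the semicolons and brackets in the filter graph.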

Last edited 8 months ago by gdgsdg123

Changed 8 months ago by gdgsdg123

Attachment black.png added: pure black (#000000) for all pixels.

Changed 8 months ago by gdgsdg123

Attachment tainted.png added: pure black for all pixels except the top-left pixel, which is tainted to "#010000".

comment:3 Changed 8 months ago by gdgsdg123

  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Summary changed from `-lavfi ssim` is not colorspace aware?.. to Malfunctioning `ssim` filter?..

I fear even the core of the `ssim` filter may not be functioning properly...


I deliberately made two PNG files with pure black content ("black.png" and "tainted.png"), except that the top-left pixel of "tainted.png" is tainted to a different color, and:

ffmpeg -i "black.png" -i "tainted.png" -lavfi "ssim;[0][1]psnr" -f null -
ffmpeg version git-2020-01-26-5e62100 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 67.100 / 58. 67.100
  libavformat    58. 36.100 / 58. 36.100
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 71.100 /  7. 71.100
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
Input #0, png_pipe, from 'black.png':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: png, rgb24(pc), 3840x2160, 25 tbr, 25 tbn, 25 tbc
Input #1, png_pipe, from 'tainted.png':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Video: png, rgb24(pc), 3840x2160, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 (png) -> ssim:main
  Stream #0:0 (png) -> psnr:main
  Stream #1:0 (png) -> ssim:reference
  Stream #1:0 (png) -> psnr:reference
  ssim -> Stream #0:0 (wrapped_avframe)
  psnr -> Stream #0:1 (wrapped_avframe)
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.36.100
    Stream #0:0: Video: wrapped_avframe, gbrp(progressive), 3840x2160, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.67.100 wrapped_avframe
    Stream #0:1: Video: wrapped_avframe, gbrp, 3840x2160, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.67.100 wrapped_avframe
frame=    1 fps=0.0 q=-0.0 Lq=-0.0 size=N/A time=00:00:00.04 bitrate=N/A speed=0.122x
video:1kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_ssim_0 @ 0000000002a16b40] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000000002a1f2c0] PSNR r:117.318653 g:inf b:inf average:122.089866 min:122.089866 max:122.089866

...Are you sure this fits the definition?

Structural similarity - Wikipedia

The resultant SSIM index is a decimal value between -1 and 1, and value 1 is only reachable in the case of two identical sets of data and therefore indicates perfect structural similarity.
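For reference, the per-window formula from the cited Wikipedia article is reproduced below; μ, σ² and σ_xy are the window means, variances and covariance, and L is the dynamic range of the samples (255 for 8-bit data).

```latex
\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}
                           {(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)},
\qquad c_1 = (k_1 L)^2,\quad c_2 = (k_2 L)^2,\quad k_1 = 0.01,\ k_2 = 0.03
```

If x = y, then μ_x = μ_y and σ_xy = σ_x² = σ_y², so numerator and denominator coincide and the window scores exactly 1. In exact arithmetic, any window whose samples differ scores strictly below 1, so a reported 1.000000 (inf) for differing inputs most likely comes from finite precision or rounding rather than from the definition.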




Also check:

AV1 vs VP9 vs AVC (h.264) vs HEVC (h.265) Part I - Lossless

SSIM Y:1.000000 (73.043867) U:1.000000 (70.134668) V:1.000000 (69.880162) All:1.000000 (72.541141)
PSNR y:97.786557 u:99.494677 v:99.335577 average:98.264431 min:79.206295 max:inf


What does the value in parentheses for ffmpeg ssim log denote - Video Production Stack Exchange

It is dB representation of All value, calculated with following formula:

10 * log10(1 / (1 - ssim))
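A quick numeric check of this formula, as a sketch with illustrative helper names: inverting it shows that 72.247199 dB (the R-plane value in comment:10 at 1280x720 and 1280x800) corresponds to 1 - ssim ≈ 5.96e-8, which matches 2^-24 to the printed precision. That is the gap between 1.0 and the next smaller single-precision float. It also shows that printing six decimals turns such a value into 1.000000.

```python
import math

def ssim_to_db(ssim):
    """The parenthesised value: 10 * log10(1 / (1 - ssim)); inf when ssim == 1."""
    if ssim >= 1.0:
        return math.inf
    return 10.0 * math.log10(1.0 / (1.0 - ssim))

def db_to_gap(db):
    """Invert the formula: recover 1 - ssim from the dB value."""
    return 10.0 ** (-db / 10.0)

gap = db_to_gap(72.247199)
print(gap, 2.0 ** -24)        # nearly identical values
print(f"{1.0 - gap:.6f}")     # six decimals cannot show a gap this small
print(ssim_to_db(1.0))        # inf
```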

comment:4 follow-ups: Changed 8 months ago by cehoyos

  • Resolution set to invalid
  • Status changed from reopened to closed

Feel free to open a new ticket instead of changing an existing one, but please understand that there will always be a resolution high enough to compensate for one different pixel.

comment:5 Changed 8 months ago by richardpl

SSIM does not work like that; if you need to detect a one-pixel difference, use PSNR.
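This advice can be checked against the numbers in the ticket using the textbook definition PSNR = 10·log10(peak² / MSE). The sketch assumes 8-bit samples (peak 255) and exactly one sample that differs by 1, as in tainted.png; the function name is illustrative.

```python
import math

def psnr_single_sample(width, height, diff=1, peak=255):
    """PSNR of a width x height plane in which exactly one sample differs by `diff`."""
    mse = diff ** 2 / (width * height)   # one squared error averaged over all samples
    return 10.0 * math.log10(peak ** 2 / mse)

for w, h in [(1280, 720), (1920, 1080), (3840, 2160)]:
    print(f"{w}x{h}: r = {psnr_single_sample(w, h):.6f} dB")

# comment:3 reports r:117.318653 and average:122.089866 at 3840x2160. With the
# g and b planes identical, the average MSE is a third of r's, i.e. +10*log10(3) dB.
r = psnr_single_sample(3840, 2160)
print(f"average = {r + 10 * math.log10(3):.6f} dB")
```

The results agree with the r values reported in comment:3 and comment:10 (for example 107.776228 at 1280x720). Because the MSE is 1/(width·height), PSNR only grows by 10·log10 of the pixel-count ratio and stays finite as long as any sample differs.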

comment:6 in reply to: ↑ 4 Changed 8 months ago by gdgsdg123

Replying to cehoyos:

Feel free to open a new ticket instead of changing an existing one.

The two topics are closely related, so I believe it is better to keep them together in one ticket.


Replying to richardpl:

SSIM does not work like that; if you need to detect a one-pixel difference, use PSNR.

So you mean what on the Wikipedia is plainly wrong?..

If so, it would be very kind of you to edit that page and provide sufficient proof. The rest of the world would appreciate it.

comment:7 follow-up: Changed 8 months ago by richardpl

Look, if you can prove your statement, feel free to reopen the bug; otherwise keep calm so as not to reveal big ignorance. The page you linked nowhere mentions that changing a single pixel by one gives a different result.

comment:8 in reply to: ↑ 7 Changed 8 months ago by gdgsdg123

Replying to richardpl:

Look, if you can prove your statement, feel free to reopen the bug; otherwise keep calm so as not to reveal big ignorance.

Huh?.. Why would you take a polite request as the sign of big ignorance?


Replying to richardpl:

The page you linked nowhere mentions that changing a single pixel by one gives a different result.

Check comment:3; I believe it is clear enough.

comment:9 Changed 8 months ago by richardpl

Let's try it in other words: find an SSIM implementation that gives the results you expect.

comment:10 Changed 8 months ago by pdr0

Could it be a rounding issue (decimal places / precision)? If the value in parentheses is the dB representation of the All value, and the values are rounded to 6 decimal places, could that be the issue?

I repeated the tests at different resolutions (e.g. 1280x720, 1920x1080, etc.), each with the single pixel at (0,0) set to RGB [1,0,0].

Somewhere between 1280x800 and 1200x1000, ffmpeg's ssim stops detecting the difference (it reports inf). It also gives identical values for the 1280x720 and 1280x800 versions, whereas PSNR tells them apart.


ffmpeg -i "1280x720_1.png" -i "1280x720_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 0000006362f0f900] SSIM R:1.000000 (72.247199) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000006362f16ac0] PSNR r:107.776228 g:inf b:inf average:112.547441 min:112.547441 max:112.547441


ffmpeg -i "1280x800_1.png" -i "1280x800_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 0000004e6a813700] SSIM R:1.000000 (72.247199) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000004e6a83e780] PSNR r:108.233803 g:inf b:inf average:113.005016 min:113.005016 max:113.005016


ffmpeg -i "1200x1000_1.png" -i "1200x1000_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 00000076083dc340] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 00000076083dcfc0] PSNR r:108.922616 g:inf b:inf average:113.693829 min:113.693829 max:113.693829


ffmpeg -i "1440x1080_1.png" -i "1440x1080_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 000000930607e6c0] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000009306071340] PSNR r:110.048666 g:inf b:inf average:114.819879 min:114.819879 max:114.819879


ffmpeg -i "1920x1080_1.png" -i "1920x1080_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 0000003f50d34e80] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000003f50d3cd80] PSNR r:111.298053 g:inf b:inf average:116.069266 min:116.069266 max:116.069266


VapourSynth's SSIM does detect the differences, and it reports many more decimal places.

VapourSynth SSIM results without downsampling (with downsampling enabled, the difference is also detected):
3840x2160: 0.9999999975397562135270845828927122056484222412109375
1920x1080: 0.99999999015902474308603586905519478023052215576171875
1280x720: 0.99999997785780581072145878351875580847263336181640625

Self-test at 1920x1080 (to check validity): 1

There are different variants of the SSIM calculation: some use different window sizes, some downsample (as suggested in the original SSIM paper), some apply a Gaussian filter (slower), and some a box blur (faster).

https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_ssim.c

 * To improve speed, this implementation uses the standard approximation of
 * overlapped 8x8 block sums, rather than the original gaussian weights.

Could this "speed" implementation be contributing to the issue?
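A simplified, pure-Python sketch of windowed SSIM helps to see why resolution matters. This sketch uses overlapping 8x8 windows placed every 4 samples, population variances, the standard constants for 8-bit data, and double precision throughout. The function names are illustrative, and FFmpeg's vf_ssim.c differs in details, so its numbers will not match.

```python
def ssim_window(a, b, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """SSIM of one window (two equal-length lists of 8-bit samples), in double precision."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def ssim_plane(img_a, img_b, win=8, step=4):
    """Mean SSIM over overlapping win x win windows placed every `step` samples."""
    h, w = len(img_a), len(img_a[0])
    scores = []
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            a = [img_a[y + j][x + i] for j in range(win) for i in range(win)]
            b = [img_b[y + j][x + i] for j in range(win) for i in range(win)]
            scores.append(ssim_window(a, b))
    return sum(scores) / len(scores)

def black_and_tainted(size):
    """A size x size black plane and a copy whose top-left sample is 1."""
    black = [[0] * size for _ in range(size)]
    tainted = [row[:] for row in black]
    tainted[0][0] = 1
    return black, tainted

for size in (32, 64, 128):
    black, tainted = black_and_tainted(size)
    print(size, ssim_plane(black, black), 1.0 - ssim_plane(black, tainted))
```

In this model only the windows containing the changed sample drop below 1, so the shortfall of the mean shrinks roughly in proportion to the number of windows, i.e. with resolution. At some size it falls below what a single-precision accumulator can represent, which is consistent with the sweep above.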

comment:11 follow-up: Changed 8 months ago by richardpl

No, it's the use of float in places where double should be used instead.
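This diagnosis can be illustrated in a few lines. A single-precision float has a 24-bit significand, so the closest value below 1.0 it can hold is 1 - 2^-24 ≈ 0.99999994; an SSIM closer to 1 than about 3e-8 is stored as exactly 1.0, which then prints as 1.000000 (inf). The sketch uses Python's `struct` module to round doubles to single precision and feeds it the VapourSynth values from comment:10. Those come from a different SSIM variant, so they only show the magnitudes involved, not FFmpeg's own per-plane values.

```python
import struct

def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Largest single-precision value below 1.0; values closer to 1.0 than
# 2^-25 (about 3e-8) round to exactly 1.0.
largest_below_one = 1.0 - 2.0 ** -24
print(to_float32(largest_below_one) == largest_below_one)

# VapourSynth results from comment:10, truncated to what a double holds.
for size, ssim in [("1280x720", 0.99999997785780581),
                   ("1920x1080", 0.99999999015902474),
                   ("3840x2160", 0.99999999753975621)]:
    print(size, "double < 1:", ssim < 1.0, "float < 1:", to_float32(ssim) < 1.0)
```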

comment:12 in reply to: ↑ 11 Changed 8 months ago by Balling

  • Resolution changed from invalid to fixed

Replying to richardpl:

No, it's the use of float in places where double should be used instead.

Does commit fcc0424c933742c8fc852371e985d16b6eb4bfe9 fix this problem? Yes, together with the follow-up fix commit 0815a22dccbb67970ea84559f22afacee4219192.

Last edited 8 months ago by Balling

comment:13 Changed 8 months ago by pdr0

With a binary that includes those commits, repeating the tests above now detects the differences (no longer "(inf)") where they were not detected before.

comment:14 in reply to: ↑ 4 Changed 8 months ago by gdgsdg123

Replying to cehoyos:

...please understand that there will always be a resolution high enough to compensate for one different pixel.

I do understand that, but please also realize that the error is certainly avoidable.

It could be avoided by using a flag to distinguish the case of zero absolute difference from all other cases and, when the inputs differ, conditionally adjusting the original algorithm's result to a slightly less accurate value that is still accurate enough in practice. That keeps the distinction and fits the definition better.
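The idea can be sketched as a post-processing step. Everything in it is hypothetical: the function name, the flat-sample-list interface, and the choice of the largest double below 1.0 as the replacement value. It only illustrates the proposal and is not taken from FFmpeg.

```python
import math

def ssim_with_identity_flag(score, main, ref):
    """Hypothetical fix-up for the proposal above.

    `main` and `ref` are flat sequences of samples and `score` is whatever
    the SSIM computation returned. Exactly 1.0 is kept only when the inputs
    are identical; otherwise the score is nudged to the largest double below
    1.0, so the dB value stays finite.
    """
    identical = len(main) == len(ref) and all(x == y for x, y in zip(main, ref))
    if score >= 1.0 and not identical:
        return math.nextafter(1.0, 0.0)   # requires Python 3.9+
    return score

print(ssim_with_identity_flag(1.0, [0, 0, 0], [0, 0, 0]))   # 1.0
print(ssim_with_identity_flag(1.0, [0, 0, 0], [1, 0, 0]))   # 0.9999999999999999
```

With the dB formula quoted in comment:3, the substituted value would print as roughly 159.5 dB instead of inf.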



I am not sure whether SSIM = 0 has a special meaning as well.

If it does, that problem can be avoided with a similar approach.

Last edited 8 months ago by gdgsdg123