#7825 closed defect (fixed)

Malfunctioning `ssim` filter?..

Reported by:	gdgsdg123	Owned by:
Priority:	normal	Component:	avfilter
Version:	git-master	Keywords:	ssim
Cc:		Blocked By:
Blocking:		Reproduced by developer:	no
Analyzed by developer:	no

Description

Build from: https://zeranoe.com/builds/win64/static/ffmpeg-20190402-6aeaac3-win64-static.zip

C:\>ffmpeg -i "bt709.avi" -i "bt601.avi" -lavfi ssim -f null -
ffmpeg version N-93515-g6aeaac3e1c Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.2.1 (GCC) 20190212
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 48.100 / 58. 48.100
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  7.100 / 58.  7.100
  libavfilter     7. 48.100 /  7. 48.100
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
[avi @ 00000000005218c0] decoding for stream 0 failed
Input #0, avi, from 'bt709.avi':
  Duration: 00:00:00.03, start: 0.000000, bitrate: 206981 kb/s
    Stream #0:0: Video: h264 (High 4:4:4 Predictive) (H264 / 0x34363248), yuv420p10le(tv, bt709, progressive), 1440x836, 30 fps, 30 tbr, 30 tbn, 60 tbc
[avi @ 0000000002e965c0] decoding for stream 0 failed
Input #1, avi, from 'bt601.avi':
  Duration: 00:00:00.03, start: 0.000000, bitrate: 206981 kb/s
    Stream #1:0: Video: h264 (High 4:4:4 Predictive) (H264 / 0x34363248), yuv420p10le(tv, smpte170m, progressive), 1440x836, 30 fps, 30 tbr, 30 tbn, 60 tbc
Stream mapping:
  Stream #0:0 (h264) -> ssim:main
  Stream #1:0 (h264) -> ssim:reference
  ssim -> Stream #0:0 (wrapped_avframe)
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.26.101
    Stream #0:0: Video: wrapped_avframe, yuv420p10le, 1440x836, q=2-31, 200 kb/s, 30 fps, 30 tbn, 30 tbc (default)
    Metadata:
      encoder         : Lavc58.48.100 wrapped_avframe
frame=    1 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.03 bitrate=N/A speed=0.189x
video:1kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_ssim_0 @ 0000000002df0f40] SSIM Y:1.000000 (inf) U:1.000000 (inf) V:1.000000 (inf) All:1.000000 (inf)

"bt709.avi" and "bt601.avi" are from the same source and encoded using the same parameters, but tagged differently.

And apparently they don't look the same...
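
To illustrate why two files with identical samples can look different when only the matrix tag differs, here is a minimal sketch (plain Python, not FFmpeg code) that decodes one normalized Y'CbCr triple with the BT.601 and BT.709 luma coefficients. The function name and the sample triple are assumptions chosen for illustration.

```python
# Luma coefficients (Kr, Kb) of the two matrices.
BT601 = (0.299, 0.114)
BT709 = (0.2126, 0.0722)

def ycbcr_to_rgb(y, cb, cr, coeffs):
    """Decode normalized Y'CbCr (y in [0,1], cb/cr in [-0.5,0.5]) to R'G'B'."""
    kr, kb = coeffs
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return (r, g, b)

sample = (0.5, 0.1, 0.2)  # arbitrary illustrative triple, not taken from the attachment
rgb_601 = ycbcr_to_rgb(*sample, BT601)
rgb_709 = ycbcr_to_rgb(*sample, BT709)
print(rgb_601)
print(rgb_709)
```

The stored sample values are the same in both cases, so a metric computed directly on the stored planes would see no difference even though the decoded colors differ. Whether that is what the filter does here is the question this ticket raises.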

Attachments (3)

sample.7z (839.3 KB ) - added by gdgsdg123 6 years ago.
black.png (23.7 KB ) - added by gdgsdg123 5 years ago.: Sheer black (#000000) for all pixels.
tainted.png (23.7 KB ) - added by gdgsdg123 5 years ago.: Sheer black for all pixels except the top-left pixel, which is tainted to "#010000".

Change History (17)

by gdgsdg123, 6 years ago

Attachment:	sample.7z added

follow-up: 2 comment:1 by Carl Eugen Hoyos, 6 years ago

Keywords:	colorspace removed
Resolution:	→ invalid
Status:	new → closed
Type:	enhancement → defect

Apart from the attachment, I believe the filter acts as specified.

in reply to: 1 comment:2 by gdgsdg123, 6 years ago

Replying to cehoyos:

Apart from the attachment, I believe the filter acts as specified.

I don't think SSIM is supposed to be color-blind... (and I'm not talking about the `colorspace` filter)

Workaround

Because the ssim filter is not colorspace-aware, it's recommended to convert both inputs to a colorspace that avoids these color management hazards (e.g. RGB).

Or make sure that both inputs use exactly the same color management scheme.

Last edited 5 years ago by gdgsdg123

by gdgsdg123, 5 years ago

Attachment:	black.png added

Sheer black (#000000) for all pixels.

by gdgsdg123, 5 years ago

Attachment:	tainted.png added

Sheer black for all pixels except the top-left pixel, which is tainted to "#010000".

comment:3 by gdgsdg123, 5 years ago

Resolution:	invalid
Status:	closed → reopened
Summary:	`-lavfi ssim` is not colorspace aware?.. → Malfunctioning `ssim` filter?..

I fear even the core of the `ssim` filter may not be functioning properly...

I deliberately made 2 PNG files of "sheer black" content ("black.png", "tainted.png"), where "tainted.png" has its top-left pixel tainted to a different color, and:

ffmpeg -i "black.png" -i "tainted.png" -lavfi "ssim;[0][1]psnr" -f null -

ffmpeg version git-2020-01-26-5e62100 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9.2.1 (GCC) 20200122
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
  libavutil      56. 38.100 / 56. 38.100
  libavcodec     58. 67.100 / 58. 67.100
  libavformat    58. 36.100 / 58. 36.100
  libavdevice    58.  9.103 / 58.  9.103
  libavfilter     7. 71.100 /  7. 71.100
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
Input #0, png_pipe, from 'black.png':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: png, rgb24(pc), 3840x2160, 25 tbr, 25 tbn, 25 tbc
Input #1, png_pipe, from 'tainted.png':
  Duration: N/A, bitrate: N/A
    Stream #1:0: Video: png, rgb24(pc), 3840x2160, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 (png) -> ssim:main
  Stream #0:0 (png) -> psnr:main
  Stream #1:0 (png) -> ssim:reference
  Stream #1:0 (png) -> psnr:reference
  ssim -> Stream #0:0 (wrapped_avframe)
  psnr -> Stream #0:1 (wrapped_avframe)
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf58.36.100
    Stream #0:0: Video: wrapped_avframe, gbrp(progressive), 3840x2160, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.67.100 wrapped_avframe
    Stream #0:1: Video: wrapped_avframe, gbrp, 3840x2160, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.67.100 wrapped_avframe
frame=    1 fps=0.0 q=-0.0 Lq=-0.0 size=N/A time=00:00:00.04 bitrate=N/A speed=0.122x
video:1kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_ssim_0 @ 0000000002a16b40] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000000002a1f2c0] PSNR r:117.318653 g:inf b:inf average:122.089866 min:122.089866 max:122.089866

...You sure this fits the definition?

Structural similarity - Wikipedia

The resultant SSIM index is a decimal value between -1 and 1, and value 1 is only reachable in the case of two identical sets of data and therefore indicates perfect structural similarity.

Also check:

AV1 vs VP9 vs AVC (h.264) vs HEVC (h.265) Part I - Lossless

SSIM Y:1.000000 (73.043867) U:1.000000 (70.134668) V:1.000000 (69.880162) All:1.000000 (72.541141)
PSNR y:97.786557 u:99.494677 v:99.335577 average:98.264431 min:79.206295 max:inf

What does the value in parentheses for ffmpeg ssim log denote - Video Production Stack Exchange

It is dB representation of All value, calculated with following formula:
10 * log10(1 / (1 - ssim))
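
A minimal sketch of that conversion in Python (the function name `ssim_db` is an assumption, not an FFmpeg identifier). It shows that a value which prints as 1.000000 with six decimals can still have a finite dB figure, and that "inf" appears only when the computed value is exactly 1.

```python
import math

def ssim_db(ssim):
    """dB form of an SSIM value: 10 * log10(1 / (1 - ssim)); inf when ssim is exactly 1."""
    if ssim >= 1.0:
        return math.inf
    return 10.0 * math.log10(1.0 / (1.0 - ssim))

almost_one = 0.9999999            # prints as 1.000000 with six decimals
print(f"{almost_one:.6f}", ssim_db(almost_one))
print(f"{1.0:.6f}", ssim_db(1.0))
```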

follow-ups: 6 14 comment:4 by Carl Eugen Hoyos, 5 years ago

Resolution:	→ invalid
Status:	reopened → closed

Feel free to open a new ticket instead of changing an existing one but please understand that there will always be a resolution high enough to compensate for one different pixel.

comment:5 by Elon Musk, 5 years ago

SSIM does not work like that; if you need to detect a one-pixel difference, use PSNR.

in reply to: 4 comment:6 by gdgsdg123, 5 years ago

Replying to cehoyos:

Feel free to open a new ticket instead of changing an existing one.

The 2 topics are closely related so I believe it would be better to have them merged.

Replying to richardpl:

SSIM does not work like that; if you need to detect a one-pixel difference, use PSNR.

So you mean that what's on Wikipedia is plainly wrong?..

If so, it would be very kind of you to edit that page and provide sufficient proof. The rest of the world would appreciate it.

follow-up: 8 comment:7 by Elon Musk, 5 years ago

Look, if you can prove your statement, feel free to reopen the bug; otherwise keep calm so as not to reveal big ignorance. The page you linked nowhere mentions that changing a single pixel by one gives a different result.

in reply to: 7 comment:8 by gdgsdg123, 5 years ago

Replying to richardpl:

Look, if you can prove your statement, feel free to reopen the bug; otherwise keep calm so as not to reveal big ignorance.

Huh?.. Why would you take a polite request as the sign of big ignorance?

Replying to richardpl:

The page you linked nowhere mentions that changing a single pixel by one gives a different result.

Check the comment:3, I believe it's clear enough.

comment:9 by Elon Musk, 5 years ago

Let's try it in other words: find an SSIM implementation that gives the results you expect.

comment:10 by pdr0, 5 years ago

Could it be a rounding or precision issue? If the value in parentheses is the dB representation of the All value, and the values are rounded to 6 decimal places, could that be the cause?

I repeated the tests at different resolutions (1280x720, 1920x1080, etc.), all with 1 pixel at (0,0) set to RGB [1,0,0].

At some point between 1280x800 and 1200x1000, ffmpeg ssim stops detecting the difference (inf). It also reports the same value for the 1280x720 and 1280x800 versions, whereas PSNR reports different values for them.

ffmpeg -i "1280x720_1.png" -i "1280x720_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 0000006362f0f900] SSIM R:1.000000 (72.247199) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000006362f16ac0] PSNR r:107.776228 g:inf b:inf average:112.547441 min:112.547441 max:112.547441

ffmpeg -i "1280x800_1.png" -i "1280x800_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 0000004e6a813700] SSIM R:1.000000 (72.247199) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000004e6a83e780] PSNR r:108.233803 g:inf b:inf average:113.005016 min:113.005016 max:113.005016

ffmpeg -i "1200x1000_1.png" -i "1200x1000_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 00000076083dc340] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 00000076083dcfc0] PSNR r:108.922616 g:inf b:inf average:113.693829 min:113.693829 max:113.693829

ffmpeg -i "1440x1080_1.png" -i "1440x1080_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 000000930607e6c0] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000009306071340] PSNR r:110.048666 g:inf b:inf average:114.819879 min:114.819879 max:114.819879

ffmpeg -i "1920x1080_1.png" -i "1920x1080_0.png" -lavfi "ssim;[0][1]psnr" -f null -

[Parsed_ssim_0 @ 0000003f50d34e80] SSIM R:1.000000 (inf) G:1.000000 (inf) B:1.000000 (inf) All:1.000000 (inf)
[Parsed_psnr_1 @ 0000003f50d3cd80] PSNR r:111.298053 g:inf b:inf average:116.069266 min:116.069266 max:116.069266
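
As a cross-check of the PSNR figures in these logs, here is a small Python sketch. It assumes 8-bit samples (peak 255), a single sample differing by 1, and an "average" computed from the MSE averaged over three planes. The function name is hypothetical and the formula is the textbook one, not code taken from the filter.

```python
import math

def psnr_one_pixel(width, height, diff=1, peak=255, channels=1):
    """PSNR when a single sample differs by `diff`; the MSE is averaged over `channels` planes."""
    mse = diff ** 2 / (width * height * channels)
    return 10.0 * math.log10(peak ** 2 / mse)

for w, h in [(1280, 720), (1920, 1080), (3840, 2160)]:
    print(f"{w}x{h} r:{psnr_one_pixel(w, h):.6f} "
          f"average:{psnr_one_pixel(w, h, channels=3):.6f}")
```

Under those assumptions the sketch gives values that agree with the logged r: and average: figures, so PSNR keeps growing with resolution but stays finite for a single differing pixel.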

vapoursynth ssim can detect the differences, but it carries more decimal places.

vapoursynth ssim (no downsampling; with downsampling enabled it also detects the difference)
3840x2160
0.9999999975397562135270845828927122056484222412109375

1920x1080
0.99999999015902474308603586905519478023052215576171875

1280x720
0.99999997785780581072145878351875580847263336181640625

self test @ 1920x1080 (to check validity)
1

There are different variations of SSIM calculation: some use different window sizes, some downsample (as suggested in the original SSIM paper), some apply a Gaussian filter (slower), and some use a box blur (faster).
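
For reference, a minimal Python sketch of the textbook SSIM formula evaluated on a single window with uniform weights. It is not the vf_ssim.c implementation; the 8x8 window, the population variance, and the constants C1 = (0.01*255)^2 and C2 = (0.03*255)^2 are assumptions for illustration.

```python
def ssim_window(x, y, peak=255.0):
    """SSIM of two equally sized sample lists (uniform weights, population statistics)."""
    n = len(x)
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

black = [0] * 64            # one 8x8 window, all samples 0
tainted = [1] + [0] * 63    # same window with a single sample set to 1
print(ssim_window(black, black))
print(ssim_window(black, tainted))
```

In double precision the tainted window scores slightly below 1. The whole-image figure is a mean over many windows, most of which score exactly 1, so the deviation from 1 shrinks as the resolution grows.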

https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_ssim.c

 * To improve speed, this implementation uses the standard approximation of
 * overlapped 8x8 block sums, rather than the original gaussian weights.

Could this "speed" implementation be contributing to the issue?

follow-up: 12 comment:11 by Elon Musk, 5 years ago

No, it's float usage in places where double should be used instead.
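
A hedged illustration of that point in Python (it does not reproduce vf_ssim.c): the double-precision values quoted from the vapoursynth runs in comment:10 all round to exactly 1.0 once stored as single-precision floats, which turns the dB figure into inf. The helper names are assumptions.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ssim_db(ssim):
    """10 * log10(1 / (1 - ssim)); inf when ssim is exactly 1."""
    return math.inf if ssim >= 1.0 else 10.0 * math.log10(1.0 / (1.0 - ssim))

# Values quoted from the vapoursynth runs in comment:10 (truncated to binary64 precision).
for label, s in [("3840x2160", 0.9999999975397562),
                 ("1920x1080", 0.9999999901590247),
                 ("1280x720", 0.9999999778578058)]:
    print(label, ssim_db(s), ssim_db(to_float32(s)))

# The largest binary32 value below 1.0 is 1 - 2**-24.
print(f"{ssim_db(1.0 - 2.0 ** -24):.6f}")
```

The last line prints 72.247199, the same figure ffmpeg logged for the R plane at 1280x720 and 1280x800. That would be consistent with a single-precision result sitting one step below 1.0, though this reading is an inference and is not confirmed anywhere in the ticket.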

in reply to: 11 comment:12 by Balling, 5 years ago

Resolution:	invalid → fixed

Replying to richardpl:

No, it's float usage in places where double should be used instead.

Does commit fcc0424c933742c8fc852371e985d16b6eb4bfe9 fix this problem? Yes, together with the follow-up fix commit 0815a22dccbb67970ea84559f22afacee4219192.

Last edited 5 years ago by Balling

comment:13 by pdr0, 5 years ago

I repeated the tests above with a binary that includes those commits: the differences are now detected (no longer "(inf)") in the cases where they were not before.

in reply to: 4 comment:14 by gdgsdg123, 5 years ago

Replying to cehoyos:

...please understand that there will always be a resolution high enough to compensate for one different pixel.

I do understand that, but note that the error is certainly avoidable.

Distinguish the case of zero absolute difference from all other cases (using a flag), and when the inputs are not identical, conditionally adjust the algorithm's result to a value that is slightly less accurate but still accurate enough in practice. (Hence the distinction, and a better fit to the definition.)
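
A minimal sketch of this proposal in Python. It is not what the commits referenced in comment:12 do; the function name, the `identical` flag, and the choice of the largest double below 1.0 as the cap are all assumptions made for illustration.

```python
# Largest binary64 value strictly below 1.0.
BELOW_ONE = 1.0 - 2.0 ** -53

def reported_ssim(ssim, identical):
    """Return 1.0 only when the inputs are known to be identical.

    `identical` is a hypothetical flag set by an exact sample-by-sample comparison.
    """
    if identical:
        return 1.0
    # Differing inputs whose score rounded up to 1.0 are capped just below it.
    return min(ssim, BELOW_ONE)

print(reported_ssim(1.0, identical=True))
print(reported_ssim(1.0, identical=False))
print(reported_ssim(0.5, identical=False))
```

With such a cap, the dB figure derived from the reported value stays finite whenever the inputs differ, at the cost of a tiny, deliberate inaccuracy.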

Version 0, edited 5 years ago by gdgsdg123 (next)
