Opened 13 months ago

Closed 13 months ago

Last modified 12 months ago

#8590 closed defect (invalid)

'telecine=pattern' error for p24, soft telecined sources

Reported by: markfilipak Owned by:
Priority: normal Component: undetermined
Version: unspecified Keywords:
Cc: Blocked By:
Blocking: Reproduced by developer: no
Analyzed by developer: no

Description

Scenario:
I transcode p24 sources to p60 targets via a 5-5-5-5 pull-down in order to virtually eliminate the telecine judder associated with 2-3-2-3 pull-down.
This telecine strategy works wonderfully for progressive p24 sources (Blu-ray) but fails for soft-telecined p24 sources (DVD).

Bug Summary:
ffmpeg -i IN.VOB -vf "telecine=pattern=5555,bwdif=mode=send_frame" OUT.MKV
where IN.VOB is any soft telecined, p24 MPEG-2 DVD source,
results in an OUT.MKV @ 75/1.001 FPS instead of @ 60/1.001 FPS, and also a plethora of PTS errors.
It appears that the 'telecine' filter is disregarding the metadata [1] that clearly shows soft-telecining and instead is honoring 'frame_rate_code' (which is bogus for soft-telecined streams) found in the MPEG sequence header.
I suspect 'frame_rate_code' because this line:
ffmpeg -i IN.M2TS -vf "telecine=pattern=5555,bwdif=mode=send_frame" OUT.MKV
works as expected and, to my knowledge, 'frame_rate_code' is the only difference between a progressive p24 source (frame_rate_code=0001, i.e., 24/1.001 FPS) and a soft-telecined p24 source (frame_rate_code=0100, i.e., 30/1.001 FPS).

[1]
frames.frame.0.interlaced_frame=0
frames.frame.0.top_field_first=1
frames.frame.0.repeat_pict=0
frames.frame.1.interlaced_frame=0
frames.frame.1.top_field_first=1
frames.frame.1.repeat_pict=1
frames.frame.2.interlaced_frame=0
frames.frame.2.top_field_first=0
frames.frame.2.repeat_pict=0
frames.frame.3.interlaced_frame=0
frames.frame.3.top_field_first=0
frames.frame.3.repeat_pict=1
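For reference, this repeat_pict cycle is exactly what 2:3 soft telecine looks like. A minimal sketch (rate arithmetic only, not ffmpeg code; the only assumption is the standard 60000/1001 NTSC field rate) showing that the four-frame cycle above works out to 24/1.001 FPS film content:

```python
from fractions import Fraction

# repeat_pict values from the ffprobe cycle above: each coded frame
# contributes 2 fields, plus 1 extra field when repeat_pict=1.
repeat_pict = [0, 1, 0, 1]

fields = sum(2 + r for r in repeat_pict)          # 10 fields per 4 coded frames
field_rate = Fraction(60000, 1001)                # NTSC field rate
cycle_duration = fields / field_rate              # display time of the cycle
content_rate = len(repeat_pict) / cycle_duration  # true coded-frame rate

print(content_rate)  # 24000/1001, i.e. 23.976 fps film under 29.97 flags
```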

Problem Statement:
Except for the audio, the resulting OUT.MKV plays fast.

Repeatability:
Always for soft telecined sources.

Files Affected:
Any unencrypted VOB for a soft telecined movie.

ffmpeg -report -i IN.VOB -vf "telecine=pattern=5555,bwdif=mode=send_frame" OUT.MKV
ffmpeg started on 2020-03-31 at 02:56:03
Report written to "ffmpeg-20200331-025603.log"
ffmpeg version N-94664-g0821bc4eee Copyright (c) 2000-2019 the FFmpeg developers

built with gcc 9.1.1 (GCC) 20190807
configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil 56. 33.100 / 56. 33.100
libavcodec 58. 55.101 / 58. 55.101
libavformat 58. 31.104 / 58. 31.104
libavdevice 58. 9.100 / 58. 9.100
libavfilter 7. 58.101 / 7. 58.101
libswscale 5. 6.100 / 5. 6.100
libswresample 3. 6.100 / 3. 6.100
libpostproc 55. 6.100 / 55. 6.100

Input #0, mpeg, from 'IN.VOB':

Duration: 00:22:07.55, start: 0.066633, bitrate: 6470 kb/s

Stream #0:0[0x1bf]: Data: dvd_nav_packet
Stream #0:1[0x1e0]: Video: mpeg2video (Main), yuv420p(tv, progressive), 720x480 [SAR 32:27 DAR 16:9], 29.97 fps, 59.94 tbr, 90k tbn, 59.94 tbc
Stream #0:2[0x83]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s
Stream #0:3[0x82]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s
Stream #0:4[0x81]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s
Stream #0:5[0x80]: Audio: ac3, 48000 Hz, 5.1(side), fltp, 384 kb/s

Stream mapping:

Stream #0:1 -> #0:0 (mpeg2video (native) -> h264 (libx264))
Stream #0:2 -> #0:1 (ac3 (native) -> vorbis (libvorbis))

Press [q] to stop, ? for help
[Parsed_telecine_0 @ 0000020fdbc1a800] Telecine pattern 5555 yields up to 3 frames per frame, pts advance factor: 8/20
[libx264 @ 0000020fdbdabc80] using SAR=32/27
[libx264 @ 0000020fdbdabc80] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0000020fdbdabc80] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0000020fdbdabc80] 264 - core 158 r2984 3759fcb - H.264/MPEG-4 AVC codec - Copyleft 2003-2019 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, matroska, to 'OUT.MKV':

Metadata:

encoder : Lavf58.31.104
Stream #0:0: Video: h264 (libx264) (H264 / 0x34363248), yuv420p(progressive), 720x480 [SAR 32:27 DAR 16:9], q=-1--1, 74.93 fps, 1k tbn, 74.93 tbc
Metadata:

encoder : Lavc58.55.101 libx264

Side data:

cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: 18446744073709551615

Stream #0:1: Audio: vorbis (libvorbis) (oV[0][0] / 0x566F), 48000 Hz, 5.1(side), fltp
Metadata:

encoder : Lavc58.55.101 libvorbis

[mpeg @ 0000020fdb73a000] New subtitle stream 0:6 at pos:4487182 and DTS:6.106s/s speed=2.35x
[mpeg @ 0000020fdb73a000] New subtitle stream 0:7 at pos:4489230 and DTS:6.106s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:8 at pos:55965710 and DTS:71.4713sspeed=1.25x
[mpeg @ 0000020fdb73a000] New subtitle stream 0:9 at pos:55967758 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:10 at pos:55969806 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:11 at pos:55971854 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:12 at pos:55982094 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:13 at pos:55984142 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:14 at pos:55986190 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:15 at pos:55988238 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:16 at pos:55990286 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:17 at pos:55992334 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:18 at pos:55994382 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:19 at pos:55996430 and DTS:71.4713s
[mpeg @ 0000020fdb73a000] New subtitle stream 0:20 at pos:60086286 and DTS:77.0102speed=1.23x
[mpeg @ 0000020fdb73a000] New subtitle stream 0:21 at pos:60088334 and DTS:77.0102s
[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp= 661.1kbits/s speed=1.12x

Last message repeated 10 times

[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp= 669.3kbits/s speed=1.12x

Last message repeated 13 times

[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp= 668.6kbits/s speed=1.13x

Last message repeated 9 times

:::::::::::::::::::::::::::::::::::::::::
:: ::
:: Repeated several thousand times ::
:: ::
:::::::::::::::::::::::::::::::::::::::::

[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp= 785.3kbits/s speed=1.17x

Last message repeated 5 times

[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp= 785.3kbits/s speed=1.17x

Last message repeated 7 times

[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp= 785.3kbits/s speed=1.17x

Last message repeated 10 times

[mpeg2video @ 0000020fdbdad340] ac-tex damaged at 16 17
[mpeg2video @ 0000020fdbdad340] Warning MVs not available
[mpeg2video @ 0000020fdbdad340] concealing 585 DC, 585 AC, 585 MV errors in B frame
IN.VOB: corrupt decoded frame in stream 1
[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp
[ac3 @ 0000020fdbdac580] incomplete frame
IN.VOB: corrupt decoded frame in stream 2
[matroska @ 0000020fdbc22940] Starting new cluster due to timestamp
frame=79575 fps= 70 q=-1.0 Lsize= 127629kB time=00:22:07.54 bitrate= 787.6kbits/s speed=1.17x
video:88135kB audio:37691kB subtitle:0kB other streams:0kB global headers:7kB muxing overhead: 1.433324%
[libx264 @ 0000020fdbdabc80] frame I:567 Avg QP:20.88 size: 18473
[libx264 @ 0000020fdbdabc80] frame P:24105 Avg QP:24.91 size: 2497
[libx264 @ 0000020fdbdabc80] frame B:54903 Avg QP:24.65 size: 357
[libx264 @ 0000020fdbdabc80] consecutive B-frames: 1.2% 19.5% 3.1% 76.2%
[libx264 @ 0000020fdbdabc80] mb I I16..4: 23.3% 62.6% 14.1%
[libx264 @ 0000020fdbdabc80] mb P I16..4: 0.6% 1.9% 0.3% P16..4: 19.9% 5.7% 3.4% 0.0% 0.0% skip:68.1%
[libx264 @ 0000020fdbdabc80] mb B I16..4: 0.0% 0.1% 0.0% B16..8: 14.8% 0.6% 0.1% direct: 0.2% skip:84.2% L0:40.3% L1:56.9% BI: 2.8%
[libx264 @ 0000020fdbdabc80] 8x8 transform intra:65.2% inter:76.0%
[libx264 @ 0000020fdbdabc80] coded y,uvDC,uvAC intra: 53.9% 51.2% 17.6% inter: 3.3% 2.5% 0.1%
[libx264 @ 0000020fdbdabc80] i16 v,h,dc,p: 36% 36% 7% 20%
[libx264 @ 0000020fdbdabc80] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 17% 20% 5% 7% 9% 6% 8% 7%
[libx264 @ 0000020fdbdabc80] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 23% 16% 5% 7% 9% 6% 6% 5%
[libx264 @ 0000020fdbdabc80] i8c dc,h,v,p: 58% 21% 17% 4%
[libx264 @ 0000020fdbdabc80] Weighted P-Frames: Y:1.7% UV:0.2%
[libx264 @ 0000020fdbdabc80] ref P L0: 63.4% 17.8% 11.3% 7.4% 0.1%
[libx264 @ 0000020fdbdabc80] ref B L0: 90.7% 6.4% 2.9%
[libx264 @ 0000020fdbdabc80] ref B L1: 97.9% 2.1%
[libx264 @ 0000020fdbdabc80] kb/s:679.80

Attachments (7)

compare.jpg (117.3 KB) - added by markfilipak 12 months ago.
23.976p.mp4 (662.3 KB) - added by pdr0 12 months ago.
OUT.MKV (836.3 KB) - added by pdr0 12 months ago.
blenddeint_combedonly.mp4 (632.0 KB) - added by pdr0 12 months ago.
Various telecines.png (2.1 MB) - added by markfilipak 12 months ago.
blend deinterlace.png (142.0 KB) - added by pdr0 12 months ago.
'telecine=pattern=5' 59.940i.mkv (1.0 MB) - added by markfilipak 12 months ago.

Change History (51)

comment:1 Changed 13 months ago by markfilipak

A workaround would be nice. For example, if I could trick ffmpeg by patching 'frame_rate_code' (from 0100 to 0001) on the fly, but I don't know how to do that.

comment:2 Changed 13 months ago by richardpl

  • Resolution set to wontfix
  • Status changed from new to closed

comment:3 Changed 13 months ago by Balling

Or you could use True Cinema or Match framerate on LG TVs. Just saying. In that case both 23.976 and 24 will look good.

Last edited 13 months ago by Balling (previous) (diff)

comment:4 Changed 13 months ago by markfilipak

No discussion at all, eh?
Have you closed this because you don't understand what I'm doing, or because you don't think it's important?

comment:5 Changed 13 months ago by cehoyos

  • Resolution changed from wontfix to invalid

More likely because there is no bug.

comment:6 Changed 13 months ago by cehoyos

  • Component changed from avfilter to undetermined

comment:7 Changed 13 months ago by richardpl

Probably you want the repeatfields filter for soft-telecined files.
But I still think what you're doing is pointless.
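For context, a sketch of what repeatfields would do to the soft-telecine cycle from the description, assuming it simply honors each frame's repeat_pict flag (this is rate arithmetic only, not ffmpeg code):

```python
from fractions import Fraction

# Honoring each repeat_pict flag turns the 4-frame soft-telecine cycle
# into 10 fields, i.e. 5 output frames: a hard 2:3 telecine.
repeat_pict = [0, 1, 0, 1]
fields = sum(2 + r for r in repeat_pict)
out_frames = fields // 2                    # 5 frames per 4 input frames

out_rate = Fraction(24000, 1001) * Fraction(out_frames, len(repeat_pict))
print(out_rate)  # 30000/1001, the usual NTSC hard-telecine rate
```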

comment:8 follow-up: Changed 13 months ago by markfilipak

Thanks for having some discussion.

Well, Richard, you can see it for yourself.
ffmpeg -i IN.M2TS -vf "telecine=pattern=5555,bwdif=mode=send_frame" -avoid_negative_ts 1 -c:v libx265 -crf 20 -preset medium -c:a copy -c:s copy OUT.MKV

It works wonderfully for BD content, but not soft-telecined DVD content due to the design of 'telecine'.

With 2-3-2-3 pull-down, 2 of 5 frames are combed and the combed frames adjoin.
With 5-5-5-5 pull-down, only 2 of 10 frames are combed and the combed frames are separated by 4 progressive frames.

In other words, there's no noticeable judder and decombing is not needed due to the 1/60th-second duration of the combed frame.

In other words, 'telecine=pattern=5555' works as advertised for BDs, but not for DVDs. Now, why would that be, eh?

Last edited 13 months ago by markfilipak (previous) (diff)

comment:9 Changed 13 months ago by markfilipak

I found a trick workaround,

ffmpeg -r:v 24000/1001 -i IN.VOB -vf "telecine=pattern=5555,bwdif=mode=send_frame" -avoid_negative_ts 1 -c:v libx265 -crf 20 -preset medium -c:a copy -c:s copy OUT.MKV

but I still think 'telecine=pattern' should be fixed.

...Over and out. Have a good day; stay safe.

comment:10 Changed 13 months ago by cehoyos

Since other people may read this: Apart from the visual damage the above filter chain does, the input option -r should be used with great caution; in this case the filter combination fps=24000/1001,telecine=pattern=5 is saner.

comment:11 in reply to: ↑ 8 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

Well, Richard, you can see it for yourself.
ffmpeg -i IN.M2TS -vf "telecine=pattern=5555,bwdif=mode=send_frame" -avoid_negative_ts 1 -c:v libx265 -crf 20 -preset medium -c:a copy -c:s copy OUT.MKV

It works wonderfully for BD content, but not soft-telecined DVD content due to the design of 'telecine'.

With 2-3-2-3 pull-down, 2 of 5 frames are combed and the combed frames adjoin.
With 5-5-5-5 pull-down, only 2 of 10 frames are combed and the combed frames are separated by 4 progressive frames.

In other words, there's no noticeable judder and decombing is not needed due to the 1/60th-second duration of the combed frame.

In other words, 'telecine=pattern=5555' works as advertised for BDs, but not for DVDs. Now, why would that be, eh?

I tested this on a native 24.0p blu ray

The end result is hardcoding duplicate frames in a progressive 3:2 pattern. The output file fps is 60.0 CFR. There is no combing when examining the elementary output stream.

Examine the actual file (actual encoded frames) and their timecodes (timestamps). The average frame display time is 16-17 ms (it fluctuates between 16 and 17 because of container timebase rounding), because it's 60.0 CFR. (1/60 = ~16.6667 ms per frame)

There is no visual cadence difference between this and doing nothing on a 60Hz display. You are just encoding 2.5x more frames for nothing (slower, lower quality at a given bitrate)

I repeated it on a native 23.976p blu ray. Same thing, no combing, 3:2 progressive repeats. Just 59.94 instead of 60.0

comment:12 in reply to: ↑ 11 ; follow-up: Changed 12 months ago by cehoyos

Replying to pdr0:

There is no visual cadence difference between this and doing nothing on a 60Hz display.

There is: Deinterlacing - sadly - comes at a cost.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 12 months ago by pdr0

Replying to cehoyos:

Replying to pdr0:

There is no visual cadence difference between this and doing nothing on a 60Hz display.

There is: Deinterlacing - sadly - comes at a cost.

You are mistaken.

There is no difference cadence wise when checking a 24.0p and 23.976p native BD.

There are no fields produced from that command. It's progressive content with progressive duplicates, encoded progressively

The actual results, ES and timestamps disagree with you

comment:14 in reply to: ↑ 13 ; follow-up: Changed 12 months ago by cehoyos

Replying to pdr0:

Replying to cehoyos:

Replying to pdr0:

There is no visual cadence difference between this and doing nothing on a 60Hz display.

There is: Deinterlacing - sadly - comes at a cost.

You are mistaken.

There is no difference cadence wise when checking a 24.0p and 23.976p native BD.

There are no fields produced from that command. It's progressive content with progressive duplicates, encoded progressively

The actual results, ES and timestamps disagree with you

I meant the visual damage that the deinterlace does to progressive content (and I agree that my answer is misunderstandable, I am not a native speaker).

comment:15 in reply to: ↑ 14 Changed 12 months ago by markfilipak

Replying to cehoyos:

Replying to pdr0:

Replying to cehoyos:

Replying to pdr0:

There is no visual cadence difference between this and doing nothing on a 60Hz display.

There is: Deinterlacing - sadly - comes at a cost.

You are mistaken.

There is no difference cadence wise when checking a 24.0p and 23.976p native BD.

There are no fields produced from that command. It's progressive content with progressive duplicates, encoded progressively

The actual results, ES and timestamps disagree with you

I meant the visual damage that the deinterlace does to progressive content (and I agree that my answer is misunderstandable, I am not a native speaker).

What visual damage Carl Eugen? I see no visual damage. The 60fps is not interlaced.

I would put in some timing diagrams, but this rotten-text, i.e., !(rich-text), doesn't use fixed fonts, not even for code blocks.

comment:16 Changed 12 months ago by markfilipak

|<--------------------------1/6s-------------------------->|
[A/a][B/b][C/c][D/d]
[A/a_][B/b_][B/c_][C/d_][D/d_] ...2-3-2-3 30fps
[A/a_][A/a_][B/b_][B/b_][B/c_][B/c_][C/d_][C/d_][D/d_][D/d_] ...2x(2-3-2-3 30fps)
[A/a_][A/a_][B/b_][B/b_][B/b_][C/c_][C/c_][D/d_][D/d_][D/d_] ...4-6-4-6 60fps
[A/a_][A/a_][A/b_][B/b_][B/b_][C/c_][C/c_][C/d_][D/d_][D/d_] ...5-5-5-5 60fps

Cadence
|<--------------------------1/6s-------------------------->|
[A/a_][A/a_][B/b_][B/b_][B/c_][B/c_][C/d_][C/d_][D/d_][D/d_] ...2x(2-3-2-3 30fps)

AAAA AAAA BBBB BBBB BCBC BCBC CDCD CDCD DDDD DDDD

<---20%----><------30%-------><---20%----><------30%------->

[A/a_][A/a_][B/b_][B/b_][B/b_][C/c_][C/c_][D/d_][D/d_][D/d_] ...4-6-4-6 60fps

AAAA AAAA BBBB BBBB BBBB CCCC CCCC DDDD DDDD DDDD

<---20%----><------30%-------><---20%----><------30%------->

[A/a_][A/a_][A/b_][B/b_][B/b_][C/c_][C/c_][C/d_][D/d_][D/d_] ...5-5-5-5 60fps

AAAA AAAA ABAB BBBB BBBB CCCC CCCC CDCD DDDD DDDD

<-----25%-----><-----25%-----><-----25%-----><-----25%----->

Combing
|<--------------------------1/6s-------------------------->|
[A/a_][A/a_][B/b_][B/b_][B/c_][B/c_][C/d_][C/d_][D/d_][D/d_] ...4-6-4-6 60fps


combed

[A/a_][A/a_][A/b_][B/b_][B/b_][C/c_][C/c_][C/d_][D/d_][D/d_] ...5-5-5-5 60fps


combed combed

Edit: Gawd! This facility is awful.

Last edited 12 months ago by markfilipak (previous) (diff)

Changed 12 months ago by markfilipak

comment:17 follow-up: Changed 12 months ago by markfilipak

Please see the attachment I just attached.

This should be a simple call: 5-5-5-5 pull-down to 60fps gives pictures that are superior in every way, it works for p24 progressive (BD movies), but it fails for p24 soft-telecine (DVD movies) because the 'telecine' filter looks at the metadata (which is bogus). The 'telecine' filter should be fixed.

comment:18 in reply to: ↑ 17 Changed 12 months ago by Balling

Replying to markfilipak:

Please see the attachment I just attached.

This should be a simple call: 5-5-5-5 pull-down to 60fps gives pictures that are superior in every way, it works for p24 progressive (BD movies), but it fails for p24 soft-telecine (DVD movies) because the 'telecine' filter looks at the metadata (which is bogus). The 'telecine' filter should be fixed.

Mmm. There is 24.000 and 23.976; how are you accounting for this? Give the mediainfo of that Blu-ray.

comment:19 follow-up: Changed 12 months ago by markfilipak

I'm heartened that this is now getting the attention it deserves... Thank you.

May I reiterate that 'telecine=pattern=5555' works correctly for p24 progressive (BD) video (i.e., 24/1.001 FPS video).

The problem is with p24 soft telecined (DVD) video (i.e., 24/1.001 FPS video that has metadata that specifies 30/1.001 FPS playback). The transcoded DVD video (but not the audio) runs 20% fast because the 'telecine' filter calculates PTS advance as 20/8*30/1.001 (incorrect) instead of 20/8*24/1.001 (correct). This mistake is probably due to a developer misunderstanding regarding what the 'frame_rate_code' MPEG flag means. The flag specifies playback (30/1.001), not what the stream actually is (24/1.001).

"There is 24.000 and 23.976, how are you fixing for this?"
BDs don't require any correction because 'telecine' correctly calculates PTS advance.
DVDs require a '-r:v 24000/1001' directive (as a workaround) to force ffmpeg to 24/1.001 (i.e., to ignore the 30/1.001 MPEG flag). With the workaround, ffmpeg passes 24/1.001 to 'telecine', and 'telecine' then calculates the correct PTS advance. In my opinion, the workaround should not be more than a workaround. In my opinion, the best solution is for 'telecine' to look at the interlace metadata, to recognize that the source is soft telecined, to recognize that 30/1.001 is bogus, and to apply 24/1.001 on its own without being forced.

Edit: To respond to your specific question: If the DVD is actually 24 FPS instead of 24/1.001, the MPEG flag will be 30 instead of 30/1.001. Here are the values of the 'frame_rate_code' flag:
'1' = 24/1.001 FPS
'2' = 24 FPS
'3' = 25 FPS
'4' = 30/1.001 FPS
'5' = 30 FPS
'6' = 50 FPS
'7' = 60/1.001 FPS
'8' = 60 FPS

More edit: I've never seen a DVD video that was 24 FPS. Have you?

Last edited 12 months ago by markfilipak (previous) (diff)
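The arithmetic in this comment can be checked with a short sketch (the 20/8 factor matches the log line "Telecine pattern 5555 yields up to 3 frames per frame, pts advance factor: 8/20"; the frame_rate_code table is MPEG-2's, per ISO/IEC 13818-2):

```python
from fractions import Fraction

# MPEG-2 frame_rate_code values (ISO/IEC 13818-2)
FRAME_RATE_CODE = {
    1: Fraction(24000, 1001), 2: Fraction(24), 3: Fraction(25),
    4: Fraction(30000, 1001), 5: Fraction(30), 6: Fraction(50),
    7: Fraction(60000, 1001), 8: Fraction(60),
}

def telecine_out_rate(in_rate, pattern="5555"):
    # Each pattern digit is the number of fields emitted per input frame;
    # two fields make one output frame, hence the 20/8 factor for "5555".
    fields = sum(int(d) for d in pattern)
    return in_rate * Fraction(fields, 2 * len(pattern))

# Reported behavior: telecine trusts frame_rate_code=4 ('0100', 30/1.001)
print(float(telecine_out_rate(FRAME_RATE_CODE[4])))  # ~74.93 fps, the bug
# Desired behavior for soft-telecined film: use the true 24/1.001 rate
print(float(telecine_out_rate(FRAME_RATE_CODE[1])))  # ~59.94 fps
```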

comment:20 Changed 12 months ago by markfilipak

May I point out some features of 5-5-5-5 pull-down v. 2-3-2-3 pull-down?

Even at 60 FPS, 2-3-2-3 pull-down (actually, it becomes 4-6-4-6 pull-down) has +/- 5% cadence @ 12Hz, and 40% combing @ 6Hz.

5-5-5-5 pull-down has no cadence and combing is 20% @ 12Hz. 12Hz is generally considered to be below the temporal limit of perception. Therefore, the combing can be uncompensated.

My actual testing confirms this.

comment:21 in reply to: ↑ 19 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

May I reiterate that 'telecine=pattern=5555' works correctly for p24 progressive (BD) video (i.e., 24/1.001 FPS video).

Not with that command line. You are mistaken. It's not working like you think it is.

May I reiterate that I tested this on 2 BDs, 24.0p and 23.976p variants (yes, they come in both; NTSC DVD is 23.976 only).

The actual output from that commandline is 3:2 frame repeats. There is no combing

AaAaAa,BbBb,CcCcCc,DdDd

You can reproduce it on a simple animation too if you're worried about copyright. I uploaded a 23.976p animation "23.976p.mp4" and the 23.976_OUT.mkv

ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5555,bwdif=mode=send_frame" -avoid_negative_ts 1 -c:v libx265 -crf 20 -preset medium -an 23.976_OUT.MKV

If you look closely, you can see the additional deinterlacing artifacts from bwdif that cehoyos was referring to. There is a slight shimmer. If you check it frame by frame you can't miss it. The text has aliased edges from bwdif. This isn't part of the "cadence" discussion, but you're degrading the footage with no benefit.

Cadence wise, it's exactly as I described earlier.

Changed 12 months ago by pdr0

Changed 12 months ago by pdr0

comment:22 follow-up: Changed 12 months ago by markfilipak

Thank you for making those videos. As you claimed, 'telecine=pattern=5' does indeed use 6-4-6-4 pull-down @ 60/1.001 FPS. I don't know why it looks better than 24/1.001 FPS, except that my TV must be screwing up the telecining of 24/1.001 FPS.

So, even for BD movies, 'telecine=pattern=5' doesn't work as advertised. That's just great.

Unless you folks have any suggestions, it looks like the only way I can save future-proof movies in an archive is to simply remux to MKV and save those. Oh, dear.

comment:23 in reply to: ↑ 22 ; follow-up: Changed 12 months ago by Balling

Replying to markfilipak:

Unless you folks have any suggestions, it looks like the only way I can save future-proof movies in an archive is to simply remux to MKV and save those. Oh, dear.

I do not really understand WHY you are doing that when you know that the right way to do it is to buy a good TV and a source that can both switch to 23.976*4 and 120 Hz right away (the source sends it unmultiplied, yes). (Maybe even with 120 Hz BFI for OLED; yes, I am talking about the new LG CX OLED. Or use LG True Cinema, which takes 24/23.976 in 60p, extracts 23.976 or 24, and then just multiplies every frame to 120 Hz; with OLED Motion it looks just like the best 30,000$ reference displays, maybe with some little details off.) Also you will need a nice player, not an Apple TV 4K, which does not properly support 24.000. Shameful, Apple. Something like an Nvidia Shield, maybe.

Last edited 12 months ago by Balling (previous) (diff)

comment:24 in reply to: ↑ 23 Changed 12 months ago by markfilipak

Replying to Balling:

Replying to markfilipak:

Unless you folks have any suggestions, it looks like the only way I can save future-proof movies in an archive is to simply remux to MKV and save those. Oh, dear.

I do not really understand WHY you are doing that...

Your responses are off-topic.

I have a nearly-full media server and seek to delete the cruft.

I don't have a fancy TV and the folks I visit don't have fancy TVs.

I'm retired on a fixed income and can't simply eat cake.

comment:25 in reply to: ↑ 21 ; follow-up: Changed 12 months ago by markfilipak

I did this:
ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5" telecine=pattern=5.mkv

When I play telecine=pattern=5.mkv via MPV, and I single-step, it is doing exactly what I expected and wanted.

I did this:
ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5,bwdif=mode=send_frame" telecine=pattern=5,bwdif=mode=send_frame.mkv

The problem with telecine=pattern=5,bwdif=mode=send_frame.mkv is that in each 1/6sec, all 10 frames are being decombed instead of just the 2 frames that actually are combed.
P P C P P P P C P P
Decombing a progressive picture shouldn't do anything, but apparently it is (very slight). But so what?

So, this 5-5-5-5 DOES appear to be doing what I want.
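The 'P P C P P P P C P P' pattern above can be reproduced with a small field-level simulation of the pattern=5 telecine (a sketch, assuming the filter's default first field is top, as the documentation and log suggest):

```python
# Simulate a telecine at the field level for progressive 24p input, to
# see which output frames are combed.  Each input frame X is a
# (top, bottom) field pair 'Xt'/'Xb'; the pattern digit says how many
# fields to emit from that frame, alternating parity as they are emitted.
def telecine_fields(frames, pattern):
    out, top = [], True  # start with the top field (filter default)
    for frame, n in zip(frames, pattern):
        for _ in range(int(n)):
            out.append(frame + ("t" if top else "b"))
            top = not top
    # pair consecutive fields into output frames
    return [(out[i], out[i + 1]) for i in range(0, len(out) - 1, 2)]

frames_out = telecine_fields("ABCD", "5555")
# a frame is combed when its two fields come from different input frames
flags = ["C" if t[0] != b[0] else "P" for t, b in frames_out]
print(" ".join(flags))  # P P C P P P P C P P
```

So 2 of every 10 output frames mix fields from adjacent input frames, separated by 4 progressive frames, as described earlier in the thread.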

comment:26 in reply to: ↑ 25 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

I did this:
ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5" telecine=pattern=5.mkv

When I play telecine=pattern=5.mkv via MPV, and I single-step, it is doing exactly what I expected and wanted.

I did this:
ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5,bwdif=mode=send_frame" telecine=pattern=5,bwdif=mode=send_frame.mkv

The problem with telecine=pattern=5,bwdif=mode=send_frame.mkv is that in each 1/6sec, all 10 frames are being decombed instead of just the 2 frames that actually are combed.
P P C P P P P C P P
Decombing a progressive picture shouldn't do anything, but apparently it is (very slight). But so what?

So, this 5-5-5-5 DOES appear to be doing what I want.

You would need a deinterlacer that either deinterlaces based on specific frames (e.g. every nth frame), or based on a combing threshold. The former can be problematic if there is a cadence break. The latter is adaptive, but some types of content might be incorrectly deinterlaced, or some missed.

But the judder would be no different when 60fps is returned. If you single-rate deinterlace with top field first, in frames notation you would get AAABBCCCDD. BFF would return AABBBCCDDD. This is 3:2 frame repeats or 2:3 frame repeats. In terms of cadence, that's the same as 24pN displayed on a 60Hz display.

comment:27 in reply to: ↑ 26 ; follow-up: Changed 12 months ago by markfilipak

Replying to pdr0:

Replying to markfilipak:

I did this:
ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5" telecine=pattern=5.mkv

When I play telecine=pattern=5.mkv via MPV, and I single-step, it is doing exactly what I expected and wanted.

I did this:
ffmpeg -i 23.976p.mp4 -vf "telecine=pattern=5,bwdif=mode=send_frame" telecine=pattern=5,bwdif=mode=send_frame.mkv

The problem with telecine=pattern=5,bwdif=mode=send_frame.mkv is that in each 1/6sec, all 10 frames are being decombed instead of just the 2 frames that actually are combed.
P P C P P P P C P P
Decombing a progressive picture shouldn't do anything, but apparently it is (very slight). But so what?

So, this 5-5-5-5 DOES appear to be doing what I want.

You would need a deinterlacer that either deinterlaces based on specific frames (e.g. every nth frame), or based on a combing threshold. The former can be problematic if there is a cadence break ...

telecine=pattern=5 takes care of cadence. There is zero cadence. So (assuming that the 1st frame is frame 1), what I want is deinterlace of every 5th frame, starting with frame 3 (i.e., the 'C's in the 'P P C P P P P C P P' sequence).

... The latter is adaptive, but some types of content might be incorrectly deinterlaced, or some missed.

But the judder would be no different when 60fps is returned. If you single-rate deinterlace with top field first, in frames notation you would get AAABBCCCDD. BFF would return AABBBCCDDD. This is 3:2 frame repeats or 2:3 frame repeats. In terms of cadence, that's the same as 24pN displayed on a 60Hz display.

You certainly know what you're talking about. What you describe is what I'm seeing. Frankly, it surprises me that deinterlace is favoring the 1st field. I thought that the resulting deinterlaced frame would be a line-doubled version of the pixel-by-pixel mean of the 2 fields -- now that I write that, is that 'blended'? -- because that's what I visualized when I dreamed up this whole scheme. It's my rather unsophisticated opinion that whatever is done to remove combing will be okay because the result is visible for only 1/60s. Even no deinterlace looks pretty damn good in my test videos (based on the transcodes of the p24 video that you so graciously provided), and way better than what my 60Hz TV displays when it's fed p24. (Did I mention that I hate judder more than any other video flaw?) :-)

The whole point of this ticket is that 'telecine=pattern=5' works for BD movies but not for DVD movies because -- my best guess -- 'telecine' is looking at the MPEG header's 'frame_rate_code' nibble, seeing '0100' (i.e., 30/1.001), and then miscalculating PTS advance so that the resulting video (but not the audio) runs 20% fast. I appreciate that 'telecine' has to look at the specified frame rate, but, in my humble opinion, it should also look for soft telecine metadata and automatically adjust (to 24/1.001) if it finds soft telecine. Since I've found that there's a '-r:v 24000/1001' (or '-vf fps=24000/1001') workaround, this is not an urgent issue, but it is, in my humble opinion, an issue that shouldn't be dismissed as "wontfix".

If there's a filter that can blend frames 3, 8, 13, 18, etc. I expect that using it (instead of 'bwdif=mode=send_frame') would yield the best results possible (which is what I originally visualized). I'm sorry I'm not an ffmpeg guru; at least, not yet. Is there such a programmable blender? For example, maybe splitting in a '-filter_complex' and selectively merging via some sort of blender that operates only on frames 3, 8, 13, 18, etc.?

May I also mention that I sincerely appreciate the time spent here replying to me. I'm not crazy -- at least, I don't think I'm crazy. :-)

comment:28 in reply to: ↑ 27 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

what I want is deinterlace of every 5th frame, starting with frame 3 (i.e., the 'C's in the 'P P C P P P P C P P' sequence).

I thought that the resulting deinterlaced frame would be a line-doubled version of the pixel-by-pixel mean of the 2 fields -- now that I write that, is that 'blended'? -- because that's what I visualized when I dreamed up this whole scheme. It's my rather unsophisticated opinion that whatever is done to remove combing will be okay because the result is visible for only 1/60s. Even no deinterlace looks pretty damn good in my test videos (based on the transcodes of the p24 video that you so graciously provided), and way better than what my 60Hz TV displays when it's fed p24. (Did I mention that I hate judder more than any other video flaw?) :-)

I understand what you're saying - that you want to deinterlace only combed frames.

You can blend deinterlace as a general approach, but I don't think ffmpeg has one specifically for blend deinterlacing.

Blend deinterlacing is generally frowned upon, because it produces blurry, ghosted results and does not really produce smoother motion.

This can be visually more disturbing than the original judder. You can argue that it might be slightly more smooth, but not much. Some people would say the visual disturbance makes it worse. "strobes" would be a common description.

In frames notation, where each letter is a progressive frame, and where (AB) or (CD) is a blended frame (a 50/50 mix), you would get

AA(AB)BBCC(CD)DD

You decide for yourself. I uploaded the test video with blend deinterlacing (of combed frames only) "blenddeint_combedonly.mp4"

And yes, there is still judder, because you don't have a blend inserted between BB and CC (BC), so there is a perceptual jump in motion compared to AA(AB)BB or CC(CD)DD.

You certainly know what you're talking about. What you describe is what I'm seeing. Frankly, it surprises me that deinterlace is favoring the 1st field.

This is expected behaviour. When single-rate deinterlacing, you can decide which field to keep, top or bottom. Top field means you would get the 1st of the field pair (the A field), so you end up with 3:2. Bottom means you select the bottom (the B field), so you'd get the 2:3 cadence. You can choose by selecting the field order. Top is the default (and should be for HD formats by convention).

If there's a filter that can blend frames 3, 8, 13, 18, etc. I expect that using it (instead of 'bwdif=mode=send_frame') would yield the best results possible (which is what I originally visualized). I'm sorry I'm not an ffmpeg guru; at least, not yet. Is there such a programmable blender? For example, maybe splitting in a '-filter_complex' and selectively merging via some sort of blender that operates only on frames 3, 8, 13, 18, etc.?

I don't think ffmpeg has blend deinterlacing.

You can't just blend the telecine output and replace specific frames, because you'd get combed blends, not frame blends (as if frame blends weren't bad enough...)

Changed 12 months ago by pdr0

comment:29 in reply to: ↑ 28 ; follow-up: Changed 12 months ago by markfilipak

Thank you so much. I'm afraid I taxed the patience of some on ffmpeg-user.

Replying to pdr0:

Replying to markfilipak:

what I want is deinterlace of every 5th frame, starting with frame 3 (i.e., the 'C's in the 'P P C P P P P C P P' sequence).

I thought that the resulting deinterlaced frame would be a line-doubled version of the pixel-by-pixel mean of the 2 fields -- now that I write that, is that 'blended'? -- because that's what I visualized when I dreamed up this whole scheme. It's my rather unsophisticated opinion that whatever is done to remove combing will be okay because the result is visible for only 1/60s. Even no deinterlace looks pretty damn good in my test videos (based on the transcodes of the p24 video that you so graciously provided), and way better than what my 60Hz TV displays when it's fed p24. (Did I mention that I hate judder more than any other video flaw?) :-)

I understand what you're saying - that you want to deinterlace only combed frames.

You can blend deinterlace as a general approach, but I don't think ffmpeg has one specifically for blend deinterlacing.

Blend deinterlacing is generally frowned upon, because it produces blurry, ghosted results and does not really produce smoother motion.

Ordinarily, at 30 FPS, 2-3-2-3 pull-down, of course I agree. But consider this: At 30 FPS, the 2 combed frames abut, so blending is 5/12s long and repeats @ 6Hz. But at 60 FPS, 5-5-5-5 pull-down, the 2 combed frames are separated by 4 progressive frames, so blending is 1/60s plus 1/60s separated by 1/15s and blending repeats @ 12Hz. From my experience based on experimentation, each combed frame is so brief that, even not deinterlaced, they are unnoticeable. You can reproduce it using the p24 that you gave me and 'telecine=pattern=5' (but without the deinterlace part). I'm going to try to attach a 2MB comparison.

This can be visually more disturbing than the original judder. You can argue that it might be slightly more smooth, but not much. Some people would say the visual disturbance makes it worse. "strobes" would be a common description.

In frames notation, where each letter is a progressive frame, and where (AB) or (CD) is a blended frame (a 50/50 mix), you would get

AA(AB)BBCC(CD)DD

You decide for yourself. I uploaded the test video with blend deinterlacing (of combed frames only) "blenddeint_combedonly.mp4"

I just viewed it. I see no blending. I see frames 3, 8, 13, 18, etc. combed but not blended. That's exactly what I got and what you will see in my attachment (if I can attach a 2MB file).

And yes, there is still judder, because you don't have a blend inserted between BB and CC (BC), so there is a perceptual jump in motion compared to AA(AB)BB or CC(CD)DD.

I think that most of what you're seeing as judder is actually twitter from the combing. I think it would disappear if the combed fields could actually be blended. Certainly, if it is judder, there's less of it than for any other telecine scheme.

Mentally visualizing this stuff is mind bending, isn't it? :-)

The reason I say it's twitter, not judder, is that if you do the calculations of average temporal location of the temporal center of the pictures and their durations, the cadence really is 1/15s-1/15s-1/15s-1/15s.

You certainly know what you're talking about. What you describe is what I'm seeing. Frankly, it surprises me that deinterlace is favoring the 1st field.

This is expected behaviour. When single-rate deinterlacing, you can decide which field to keep, top or bottom. Top field means you would get the 1st of the field pair (the A field), so you end up with 3:2. Bottom means you select the bottom (the B field), so you'd get the 2:3 cadence. You can choose by selecting the field order. Top is the default (and should be for HD formats by convention).

I understand (though I'm sure you mean the 60 FPS analog of 3:2). However, with 5-5-5-5 pull-down, and if only the combed frames were blended, then those 2 frames would not favor the 1st field or the 2nd field but something in between.

If there's a filter that can blend frames 3, 8, 13, 18, etc. I expect that using it (instead of 'bwdif=mode=send_frame') would yield the best results possible (which is what I originally visualized). I'm sorry I'm not an ffmpeg guru; at least, not yet. Is there such a programmable blender? For example, maybe splitting in a '-filter_complex' and selectively merging via some sort of blender that operates only on frames 3, 8, 13, 18, etc.?

I don't think ffmpeg has blend deinterlacing.

Damn. What about while the 'telecine' output is still in fields? Can 2 fields be blended within a filter_complex?

You can't just blend the telecine output and replace specific frames, because you'd get combed blends, not frame blends (as if frame blends weren't bad enough...)

"Combed blends"? Sorry, I don't know what you mean.

Changed 12 months ago by markfilipak

comment:30 in reply to: ↑ 29 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:


Ordinarily, at 30 FPS, 2-3-2-3 pull-down, of course I agree. But consider this: At 30 FPS, the 2 combed frames abut, so blending is 5/12s long and repeats @ 6Hz. But at 60 FPS, 5-5-5-5 pull-down, the 2 combed frames are separated by 4 progressive frames, so blending is 1/60s plus 1/60s separated by 1/15s and blending repeats @ 12Hz. From my experience based on experimentation, each combed frame is so brief that, even not deinterlaced, they are unnoticeable. You can reproduce it using the p24 that you gave me and 'telecine=pattern=5' (but without the deinterlace part). I'm going to try to attach a 2MB comparison.

I just viewed it. I see no blending. I see frames 3, 8, 13, 18, etc. combed but not blended. That's exactly what I got and what you will see in my attachment (if I can attach a 2MB file).

And yes, there is still judder, because you don't have a blend inserted between BB and CC (BC), so there is a perceptual jump in motion compared to AA(AB)BB or CC(CD)DD.

I think that most of what you're seeing as judder is actually twitter from the combing. I think it would disappear if the combed fields could actually be blended. Certainly, if it is judder, there's less of it than for any other telecine scheme.

Are you looking at the correct file?

blenddeint_combedonly.mp4

I took -vf "telecine=pattern=5", and that produces combing, we both agree.

I took that encode out of ffmpeg, and applied a blend deinterlace to the affected combed frames. This is the end result you wanted (albeit partially done elsewhere).

Attached screenshot comparison

The reason I say it's twitter, not judder, is that if you do the calculations of average temporal location of the temporal center of the pictures and their durations, the cadence really is 1/15s-1/15s-1/15s-1/15s.

"twitter" is a reserved term for something else

The 3rd frame (starting from frame 1) is what time? It's a blend of 2 times, whether you leave it combed or blend deinterlace it.

There is still judder, because either you lack the BC blend (or comb, if you didn't blend deinterlace), or you have AB or CD inserts. Either way you look at it, there is some judder. It's as I described above

AA(AB)BBCC(CD)DD

Damn. What about while the 'telecine' output is still in fields? Can 2 fields be blended within a filter_complex?

You can't just blend the telecine output and replace specific frames, because you'd get combed blends, not frame blends (as if frame blends weren't bad enough...)

"Combed blends"? Sorry, I don't know what you mean.

Actually, never mind, it won't happen here because you have single combed frames only, never 2 or more in a row.

But to blend deinterlace, you need to blend the fields. Not sure how to do this in ffmpeg. Maybe separate the fields, use tblend and select in a filter_complex.
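The separatefields + tblend + select idea could look something like the untested sketch below (filenames are the ticket's examples; frame indices are zero-based, so the combed frames 3, 8, 13, ... counted from 1 satisfy n%5==2). The assumption that tblend's odd-indexed outputs are the within-frame field pairs, and that interleave re-merges the branches cleanly by PTS, is exactly the untested part.

```shell
# Untested sketch of the separatefields + tblend idea; filenames are examples.
# After telecine=pattern=5555, every 5th frame starting at zero-based index 2
# is combed. Those frames are split off, separated into fields, the two fields
# of each frame are averaged with tblend (keeping the odd outputs, which pair
# fields from the same frame), scaled back to full height, and interleaved
# with the untouched progressive frames in PTS order.
ffmpeg -i IN.M2TS -filter_complex "\
telecine=pattern=5555,split[prog][comb];\
[prog]select='not(eq(mod(n,5),2))'[keep];\
[comb]select='eq(mod(n,5),2)',separatefields,\
tblend=all_mode=average,select='eq(mod(n,2),1)',\
scale=iw:2*ih,setsar=1[blended];\
[keep][blended]interleave" OUT.MKV
```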

Changed 12 months ago by pdr0

comment:31 follow-up: Changed 12 months ago by pdr0

I see what you're saying: if you count the middle of the time of the blend, you have four evenly spaced (in time) frames. And you're right, it is smoother in that sense.

But there is a warping / strobing effect introduced from the blends. It has to do with perception and object permanence. It varies between people. The shape changes, and some people find the blends disconcerting or nauseating

comment:32 in reply to: ↑ 30 ; follow-up: Changed 12 months ago by markfilipak

Replying to pdr0:

Replying to markfilipak:

-snip-

I think that most of what you're seeing as judder is actually twitter from the combing. I think it would disappear if the combed fields could actually be blended. Certainly, if it is judder, there's less of it than for any other telecine scheme.

Are you looking at the correct file?

blenddeint_combedonly.mp4

Yes, but -- embarrassed -- I didn't see any difference. In single-step, I do now.

You know what? After looking at full-speed, 4-6-4-6 v. 5-5-5-5, both combed and deinterlaced, I actually think 5-5-5-5 combed looks best. I'll attach it.

-snip-

The reason I say it's twitter, not judder, is that if you do the calculations of average temporal location of the temporal center of the pictures and their durations, the cadence really is 1/15s-1/15s-1/15s-1/15s.

"twitter" is a reserved term for something else

It is. It is? I've not found a definition for "twitter", though it's cited a lot. Got a link?

The 3rd frame (starting from frame 1) is what time? It's a blend of 2 times, whether you leave it combed or blend deinterlace it.

Agreed. But, you see, I think in terms of visual density, not edges. So, to me, the moving visual density of the combed version of 5-5-5-5 looks best... maybe it's just me.

AA(AB)BBCC(CD)DD

Oh, I get it. You're showing progressive frames as A, for example, and interlaced/combed as (AB). I portray the same thing like this:

[A/a][A/a][A/b][B/b][B/b][C/c][C/c][C/d][D/d][D/d]

Uppercase: Odd lines
Lowercase: even

You might find this handy: I portray 30fps-telecine thusly

[A/a][B/b][B/c][C/d][D/d]

and 30fps-telecast fields thusly

[1/-][-/2][3/-][-/4][5/-][-/6]

and with underline chars when needed to keep timing relationships. For example:

[A/a__________][B/b__________][C/c__________][D/d__________]
[A/a_______][B/b_______][B/c_______][C/d_______][D/d_______]
[1/-_][-/2_][3/-_][-/4_][5/-_][-/6_][7/-_][-/8_][9/-_][-/0_]

(Of course, this only works for monospaced fonts.)

Changed 12 months ago by markfilipak

comment:33 in reply to: ↑ 31 Changed 12 months ago by markfilipak

Replying to pdr0:

I see what you're saying: if you count the middle of the time of the blend, you have four evenly spaced (in time) frames. And you're right, it is smoother in that sense.

But there is a warping / strobing effect introduced from the blends. It has to do with perception and object permanence. It varies between people. The shape changes, and some people find the blends disconcerting or nauseating

I agree. That's why I think the combed version is better. It is what I just attached (but I got the name wrong ...it's not i60 ...f'ed up!)

comment:34 in reply to: ↑ 32 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

-snip-

The reason I say it's twitter, not judder, is that if you do the calculations of average temporal location of the temporal center of the pictures and their durations, the cadence really is 1/15s-1/15s-1/15s-1/15s.

"twitter" is a reserved term for something else

It is. It is? I've not found a definition for "twitter", though it's cited a lot. Got a link?

"Twitter" is what kids use these days... joking :). It's a term that describes moire and aliasing artifacts in motion. The most common cause is interlacing and deinterlacing artifacts from low-quality deinterlacing, but there are other causes. If you click on the "show" on the Wikipedia link, you can see simulated buzzing-line "twitter" artifacts. I assumed you were looking at the correct file with blend deinterlacing. I wouldn't describe that as "twitter". The combed version would be closer, but still not quite what you would call twitter.
https://en.wikipedia.org/wiki/Interlaced_video#Interline_twitter

I guess it comes down to personal preference. I'm not a fan of the combed version either; to me it's the worst overall because the artifacts are too overpowering to ignore. Out of the 3, I prefer the original, maybe because I grew up with it in North America. There is judder, but you can live with it. (It was mentioned earlier that, these days, cheap judder-free TVs are quite common, but I'll leave it at that.)

And I did find an ffmpeg blend deinterlacer: -vf pp=lb blend deinterlaces everything, non-selectively. Not sure how to apply it selectively in ffmpeg. But if you're happy with the combed version, you're not going to pursue it anyway.
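One untested way to apply pp=lb selectively (a sketch, using the ticket's example filenames): split the telecined stream, blend-deinterlace only the frames at zero-based indices where n%5==2 (frames 3, 8, 13, ... counting from 1), and re-merge the two branches by timestamp with interleave. Whether interleave reassembles the stream gaplessly here is an assumption, not a verified result.

```shell
# Untested sketch; filenames are examples. pp=lb ("linear blend") is applied
# only to the combed frames of the 5-5-5-5 pattern; interleave re-merges the
# two select branches in PTS order.
ffmpeg -i IN.M2TS -filter_complex "\
telecine=pattern=5555,split[a][b];\
[a]select='not(eq(mod(n,5),2))'[clean];\
[b]select='eq(mod(n,5),2)',pp=lb[blend];\
[clean][blend]interleave" OUT.MKV
```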

Last edited 12 months ago by pdr0 (previous) (diff)

comment:35 in reply to: ↑ 34 ; follow-up: Changed 12 months ago by markfilipak

Replying to pdr0:

Replying to markfilipak:

The reason I say it's twitter, not judder, is that if you do the calculations of average temporal location of the temporal center of the pictures and their durations, the cadence really is 1/15s-1/15s-1/15s-1/15s.

"twitter" is a reserved term for something else

It is. It is? I've not found a definition for "twitter", though it's cited a lot. Got a link?

"Twitter" is what kids use these days... joking :). It's a term that describes moire and aliasing artifacts in motion. The most common cause is interlacing and deinterlacing artifacts from low-quality deinterlacing, but there are other causes. If you click on the "show" on the Wikipedia link, you can see simulated buzzing-line "twitter" artifacts. I assumed you were looking at the correct file with blend deinterlacing. I wouldn't describe that as "twitter". The combed version would be closer, but still not quite what you would call twitter.
https://en.wikipedia.org/wiki/Interlaced_video#Interline_twitter

I do some technical writing. I'd appreciate your opinion of the following

"Twitter is when finely textured areas such as ventilation grates, brick walls, and textured paper appear to rapidly flash on-off-on-off or to flash changing colors. Twitter can make an actor's face appear to pulsate during close-ups. Twitter is actually a milder case of combing, but over an area. For uniformly patterned surfaces, twitter can produce moire."

I'm not sure that twitter applies to faces appearing to pulsate.

I guess it comes down to personal preference. I'm not a fan of the combed version either; to me it's the worst overall because the artifacts are too overpowering to ignore. Out of the 3, I prefer the original...

Meaning: The 24fps video you made, right? That's a good source for testing aspects of transcoding animations, but I wonder how applicable it is to analog movies. 5-5-5-5 seems to work wonderfully for movies. This brings up a bushel basket of issues.

You know, I pissed off some folks at ffmpeg-user because I don't accept expedient answers like, "It's a matter of personal taste." Does such an answer really inform?

When digital TV was introduced, I was surprised that it was architected solely as the digital analog of analog TV, that digital TV didn't (and still doesn't) support random access. I naively expected that when a picture was sent to a digital TV it would remain on the screen awaiting changes -- with the addition of just 1 transistor, the pixels of flat panel TV screens can be memory cells. I'm confident that you're savvy enough to see how efficient that would be regarding streams & compression, and how beneficial it would be to free TV from frames and refresh rates. I mention this as out-of-the-box (or maybe off-planet) thinking. I understand how I can piss people off and I try to not overwhelm, but I often fail.

You folks at ffmpeg are leaders in video. I'd like to contribute but have been frustrated.

comment:36 in reply to: ↑ 35 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

I do some technical writing. I'd appreciate your opinion of the following

"Twitter is when finely textured areas such as ventilation grates, brick walls, and textured paper appear to rapidly flash on-off-on-off or to flash changing colors. Twitter can make an actor's face appear to pulsate during close-ups. Twitter is actually a milder case of combing, but over an area. For uniformly patterned surfaces, twitter can produce moire."

I'm not sure that twitter applies to faces appearing to pulsate.

The face part isn't a good example, because faces are more organic and have curved lines and tend to hide those types of artifacts with the exception of the eyelids (straight line).

Twitter can be described in terms of sampling theory and the Nyquist theorem. It's aliasing artifacts, when viewed in motion. It boils down to undersampling. Essentially gaps in information, which are more easily identified in things such as straight lines, and that's what twitter usually refers to (lines, edges). In layman's terms it looks like jaggy buzzing lines. (Of course you can get "jaggy buzzing lines" from other things too, such as compression artifacts, but twitter has a characteristic look.)

Meaning: The 24fps video you made, right? That's a good source for testing aspects of transcoding animations, but I wonder how applicable it is to analog movies. 5-5-5-5 seems to work wonderfully for movies. This brings up a bushel basket of issues.

Yes, it's a synthetic high contrast test, but it's still predictive of the issues you see on real content. Real content will typically also have motion blur, so the effect can be reduced somewhat

Go test it out. I can see the combing when trying this on a BD. Yes, certain scenes and types of content hide it well. On others it sticks out like a sore thumb. You have a nice high-quality progressive BD, and now there are combing artifacts, sometimes everywhere, across the whole screen, not just limited to a tiny "text" area. But it's there and you can see it. It's terrible in my opinion, even worse than blends. It's so distracting that it ruins the viewing experience - not hyperbole.

Some types of displays might have additional processing and decomb it, so you might not see it. My living room TV is 120Hz, but the computer monitor I'm testing this on is 60Hz and I have everything set to play progressive, no processing. It looks bad. You said you hated the judder, so maybe it's an acceptable compromise for you.

I'd like to contribute but have been frustrated.

Same. Frustrated at times too...

1) Often programmers seem like a different breed... But there are many different types of people in the world. Learn to live with it.

2) There is a certain decorum or way of expected behaviour on certain boards and forums. Observe and learn what is expected before you contribute

3) Don't stop trying to contribute. That's how you improve open source projects.

4) If you keep getting stonewalled, try a different approach, frame it slightly differently, or provide more facts and evidence, try to build a more convincing case.

comment:37 in reply to: ↑ 36 ; follow-up: Changed 12 months ago by markfilipak

Replying to pdr0:

Replying to markfilipak:

I do some technical writing. I'd appreciate your opinion of the following

"Twitter is when finely textured areas such as ventilation grates, brick walls, and textured paper appear to rapidly flash on-off-on-off or to flash changing colors. Twitter can make an actor's face appear to pulsate during close-ups. Twitter is actually a milder case of combing, but over an area. For uniformly patterned surfaces, twitter can produce moire."

I'm not sure that twitter applies to faces appearing to pulsate.

The face part isn't a good example, because faces are more organic and have curved lines and tend to hide those types of artifacts with the exception of the eyelids (straight line).

Twitter can be described in terms of sampling theory and the Nyquist theorem.

I'm familiar with Dr. Nyquist: Gaussian transfer of energy (analog impulse response) requiring 2x sampling in the digital (z-plane) domain, etc. It's sort of the frequency analog of complex (aka imaginary) power when sinusoidal voltage and current are out of phase due to reactive (inductive and/or capacitive) components (which is real, so I don't use the word "imaginary").

It's aliasing artifacts, when viewed in motion. It boils down to undersampling.

Well, undersampling should produce image fade (amplitude loss due to insufficient energy transfer), not aliasing.

Essentially gaps in information, which are more easily identified in things such as straight lines, and that's what twitter usually refers to (lines, edges). In layman's terms it looks like jaggy buzzing lines. (Of course you can get "jaggy buzzing lines" from other things too, such as compression artifacts, but twitter has a characteristic look.)

I believe that, except for compression artifacts, what you describe is combing, which is a temporal artifact of the clash of interlaced field lines from differing sample frames/times.

Meaning: The 24fps video you made, right? That's a good source for testing aspects of transcoding animations, but I wonder how applicable it is to analog movies. 5-5-5-5 seems to work wonderfully for movies. This brings up a bushel basket of issues.

Yes, it's a synthetic high contrast test, but it's still predictive of the issues you see on real content. Real content will typically also have motion blur, so the effect can be reduced somewhat

Well, motion blur is the human response to combing of motion objects, especially if smoothing is applied as most TVs apply smoothing. That's the first thing I turn off in TV setups. The second is "image enhancement" (i.e., sharpening and contrast enhancement or active contrast or motion enhancement, etc., whatever they call it). Basically, I flatten everything and desaturate the picture until it portrays scenes naturally, like in a movie theater. Then I put up a test picture and compare it with the photograph used to make the test picture and adjust gamma & RGB gain.

Go test it out. I can see the combing when trying this on a BD. Yes, certain scenes and types of content hide it well. On others it sticks out like a sore thumb. You have a nice high-quality progressive BD, and now there are combing artifacts, sometimes everywhere, ...

Indeed, it would be everywhere in panning shots.

... across the whole screen, not just limited to a tiny "text" area. But it's there and you can see it. It's terrible in my opinion, even worse than blends. It's so distracting that it ruins the viewing experience - not a hyperbole.

Some types of displays might have additional processing and decomb it, so you might not see it. My living room TV is 120Hz, but the computer monitor I'm testing this on is 60Hz and I have everything set to play progressive, no processing. It looks bad. You said you hated the judder, so maybe it's an acceptable compromise for you.


I'd like to contribute but have been frustrated.


Same. Frustrated at times too...


1) Often programmers seem like a different breed...

Before I retired, I lived in/around San Jose, California for 25 years. All my friends were programmers. To a man, they thought that ordinary people (i.e., non-programmers) were uniformly unimaginative/stupid. To a man, they thought that the world would be a better place if only they ruled. Combine ego and naivety and that's what you get. Bad as the current U.S. president is, they would be worse. I got along fine with them of course, and we had fun, but I smiled a lot.

comment:38 in reply to: ↑ 37 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

It's aliasing artifacts, when viewed in motion. It boils down to undersampling.

Well, undersampling should produce image fade (amplitude loss due to insufficient energy transfer), not aliasing.

It's undersampling in the simplest sense. That is what interlace is: spatial undersampling, but full temporal sampling. Each field has half the spatial information of a full progressive frame. A straight line becomes a jagged, dotted line when deinterlaced, because that field is resized to a full-sized frame and only 50% of the line samples are present. That line information is undersampled. In motion, those lines appear to "twitter". Higher-quality deinterlacing algorithms attempt to adaptively fill in the gaps and smooth everything over, so it appears as if it were a true progressive frame.

As mentioned earlier, there are other causes, but low-quality deinterlacing is the most common. Other common ones are pixel binning or sampling every nth pixel, e.g. large-sensor DSLRs when shooting in video mode. Instead of a proper resize (with interpolation kernels), dropping pixels is performed. The image has gaps and is undersampled. These manifest as moire patterns and buzzing "twittering" lines.

Essentially gaps in information, which are more easily identified in things such as straight lines, and that's what twitter usually refers to (lines, edges). In layman's terms it looks like jaggy buzzing lines. (Of course you can get "jaggy buzzing lines" from other things too, such as compression artifacts, but twitter has a characteristic look.)

I believe that, except for compression artifacts, what you describe is combing, which is a temporal artifact of the clash of interlaced field lines from differing sample frames/times.

Yes, combing usually means the 2 fields were sampled from 2 different points in time. Interlace looks like this. Progressive field shifting as well. These are temporal issues.

There are other causes, but it comes down to this: anything that causes misalignment of fields will manifest as combing, e.g. film warping when it was run through the scanner. This is actually a spatial problem. The field pairs actually come from the same point in time.

Yes, it's a synthetic high contrast test, but it's still predictive of the issues you see on real content. Real content will typically also have motion blur, so the effect can be reduced somewhat

Well, motion blur is the human response to combing of motion objects, especially if smoothing is applied as most TVs apply smoothing. That's the first thing I turn off in TV setups. The second is "image enhancement" (i.e., sharpening and contrast enhancement or active contrast or motion enhancement, etc., whatever they call it). Basically, I flatten everything and desaturate the picture until it portrays scenes naturally, like in a movie theater. Then I put up a test picture and compare it with the photograph used to make the test picture and adjust gamma & RGB gain.

The motion blur being referred to here is the one applied by the camera shutter. Images caught on film or digital equivalent will have motion blur. The faster the shutter speed, the less the motion blur.

This animation was made without motion blur on purpose (typically you add motion blur during or in post to animations) . The high contrast edges without blur serve to highlight various issues.

comment:39 in reply to: ↑ 38 ; follow-up: Changed 12 months ago by markfilipak

Replying to pdr0:

Replying to markfilipak:

It's aliasing artifacts, when viewed in motion. It boils down to undersampling.

Well, undersampling should produce image fade (amplitude loss due to insufficient energy transfer), not aliasing.

It's undersampling in the simplest sense. That is what interlace is: spatial undersampling, ...

I hesitate to take more of your time, but you seem to want to continue and that's fine with me. Exploration is fun, and we seem to have a private channel in this closed ticket.

Undersampling is really a vague term, isn't it? Is sampling film at a lower resolution than half the size of a silver halide grain undersampling? Or is undersampling resampling a digital image (or audio stream) at less than twice the frequency (or half the area) of the original samples? Dr. Nyquist would say that they are both "undersampled".

Answer for yourself: Is dividing a picture into 2 half-pictures really undersampling? Is it really sampling at all?

... but full temporal sampling. ...

Full temporal sampling? Life doesn't pass in 1/24th second increments. Film and video both undersample life. But of course that's not what you mean. :)

What you mean is that transcoding a 24fps stream to anything less than 48fps is temporal subsampling, and in that, Dr. Nyquist and I would both agree with you.

When you say that transcoding 24fps to 24fps is not subsampling, but that merely dividing a picture into half-pictures is subsampling, are you being consistent?

... Each field has half the spatial information of a full progressive frame. ...

Does that make it subsampling?

Before CCDs, when a 35mm Academy format film frame was sampled, it was 'snapped' through a flying-spot aperture

29x27 µm for 576-line SD
29x33 µm for 480-line SD
14.5x14.5 µm for HD (2K)

Merely dividing those samples into odd- and even-numbered lines of pixels without changing the area of the samples (µm)... Is that really subsampling?

... A straight line becomes jagged dotted line when deinterlaced, because that field is resized to a full sized frame and only 50% the line samples are present. ...

Now, I'm sure you know that fields -- I prefer to call them "half-pictures" -- aren't converted to frames without first reinterlacing -- I prefer to call it "reweaving" -- the half-picture-pairs to reconstruct the original pictures. And I'm sure you know that the only way a reconstructed picture has a jagged line is if the original film frame had a jagged line.

So I assume that what you are describing is bobbing. But bobbing isn't undersampling either. If anything, bobbing is underdisplaying, don't you agree?

... That line information is undersampled. In motion, those lines appear to "twitter". Higher quality deinterlacing algorithms attempt to adaptively fill in the gaps and smooth everything over, so it appears as if it was a true progressive frame.

My opinion is that deinterlacing algorithms should reweave the half-picture lines and nothing more. A user can insert more filters if more processing is desired, but trying to work around 'mystery' behavior of filters that do more than you think they do is crazy making.

... As mentioned earlier, there are other causes, but low quality deinterlacing is the most common. Other common ones are pixel binning ...

You know, I've run across that term maybe once or twice... I don't know what it means.

... or sampling every nth pixel. Eg. large sensor DSLR's when shooting video mode.

Is that undersampling or simply insufficient resolution?

You see, a word like "undersampling" can become so broad that its utility as a word is lost.

I will continue my responses, but I have to reboot now because the javascript at the Johns Hopkins COVID-19 site is causing problems with my browser. I'm surprised this window survived.

By the way, if you exclude COVID-19 cases that are on-going and consider only cases that have been resolved (recovered plus fatal), over 20% of the people who get COVID-19 die.

Last edited 12 months ago by markfilipak (previous) (diff)

comment:40 in reply to: ↑ 39 ; follow-up: Changed 12 months ago by pdr0

Replying to markfilipak:

Undersampling is really a vague term, isn't it? Is sampling film at a lower resolution than half the size of a silver halide grain undersampling? Or is undersampling resampling a digital image (or audio stream) at less than twice the frequency (or half the area) of the original samples? Dr. Nyquist would say that they are both "undersampled".

Undersampling here is being used in the most simple sense. If you have 100 samples of something and you throw away half, now you have 50. Which has more samples? It's not a trick question.

Answer for yourself: Is dividing a picture into 2 half-pictures really undersampling? Is it really sampling at all?

That's not undersampling, because you can reassemble the 2 half-pictures to get the full original picture. This is not the case with interlaced content (in motion scenes, that is).

Full temporal sampling? Life doesn't pass in 1/24th second increments. Film and video both undersample life. But of course that's not what you mean. :)

What you mean is that transcoding a 24fps stream to anything less than 48fps is temporal subsampling, and in that, Dr. Nyquist and I would both agree with you.

Sampling in this context is in terms of a reference point. It's relative. If you have 24 fps, that's your original reference point. There are 24 pictures taken per second. Or, if you like, 1 second is represented by 24 samples. If you discard half the motion samples, you now have 12 pictures/s. That is temporal undersampling with respect to the original 24 fps.

When you say that that transcoding 24fps to 24fps is not subsampling but merely dividing a picture into half-pictures is subsampling, are you being consistent?

Yes, it's consistent. Nothing is being discarded when you go 24fps to 24fps (ignoring lossy compression for this discussion). Dividing into half pictures: nothing is being discarded.

BUT interlace content - now something is being discarded. Do you see the difference?

(In motion, that is. When there is no motion, a "smart" motion-adaptive deinterlacer is supposed to weave the picture.)

... Each field has half the spatial information of a full progressive frame. ...

Does that make it subsampling?

For interlace content - yes it does. Half the spatial information is missing in scenes with motion.

For progressive content, arranged in fields (2:2 pulldown, or PsF) , no, because you can re-arrange it back to the original (ignoring lossy compression again)

... A straight line becomes jagged dotted line when deinterlaced, because that field is resized to a full sized frame and only 50% the line samples are present. ...

Now, I'm sure you know that fields -- I prefer to call them "half-pictures" -- aren't converted to frames without first reinterlacing -- I prefer to call it "reweaving" -- the half-picture-pairs to reconstruct the original pictures. And I'm sure you know that the only way a reconstructed picture has a jagged line is if the original film frame had a jagged line.

You need to differentiate between interlaced content and progressive content.

With progressive content, 2 field pairs come from the same frame and can be reassembled and weaved back to a full progressive frame. No problem. This happens every day, e.g. progressive PAL content is 2:2 (broadcast, DVDs), PsF. It can occur out of order too; the fields can be found later or earlier, then matched with their proper field pairs in the process of "field matching".

But with interlaced content, the 2 field pairs are different in motion scenes. You're missing 1/2 the spatial information. There is no matching field pair. That is where you get the twitter artifacts (you have spatial undersampling). This is also the case with your combed frame with your 5555 pulldown. You have 2 fields that come from different times. Each is undersampled in terms of its own time. They are missing their "partner" field pair. Of course you can find it earlier or later in the stream, from field matching, but then you have to decide on which, top or bottom again, and you get your 3:2 or 2:3 frame repeats. If you instead weave, or try to treat it as progressive, you get back the combing.
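The distinction being drawn here can be sketched in Python. This is a toy model, not ffmpeg's implementation: a "frame" is just a list of scanline labels. Separating a progressive frame into fields and weaving them back is lossless, while weaving fields sampled at two different times produces a combed frame.

```python
def separate(frame):
    """Split a frame into (top_field, bottom_field) by line parity."""
    return frame[0::2], frame[1::2]

def weave(top, bottom):
    """Interleave a top and bottom field back into a full frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame

frame_a = ["A0", "A1", "A2", "A3"]   # progressive frame, 4 scanlines
frame_b = ["B0", "B1", "B2", "B3"]   # the next frame in time

top_a, bot_a = separate(frame_a)
assert weave(top_a, bot_a) == frame_a   # progressive: perfectly reversible

top_b, _ = separate(frame_b)
combed = weave(top_b, bot_a)            # fields from two different times
print(combed)                           # ['B0', 'A1', 'B2', 'A3'] -> combing
```

The assertion holds only because both fields came from the same picture; with `top_b` and `bot_a` there is no "partner" field in the stream that reconstructs a single original picture.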

So I assume that what you are describing is bobbing. But bobbing isn't undersampling either. If anything, bobbing is underdisplaying, don't you agree?

Historically, "bobbing" is just separating fields. When you view separate fields, each even, odd field has an offset. It looks like up/down/up/down or "bobbing" up and down. But the term has come to include both separating fields and resizing the fields to frames.

So if you bob deinterlace progressive content - then yes you are causing undersampling (and resulting artifacts, like twitter) . Each field pair of the progressive frame can no longer be field matched properly. 24fps becomes 48fps and each frame will have aliasing artifacts

But if you bob deinterlace interlaced content, the undersampling was there to begin with (with the exception of no motion content) . The process of making the interlace, dropping the spatial samples is what caused the undersampling
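The bob-deinterlace process described above can be sketched as follows. Again a toy model with frames as lists of scanline labels: each field is resized to full frame height by naive line doubling. Real deinterlacers such as yadif or bwdif interpolate adaptively rather than doubling, but the halved vertical detail per output frame is the same.

```python
def bob(frame):
    """Yield one full-height frame per field (top field first),
    resizing each field by simple line doubling."""
    top, bottom = frame[0::2], frame[1::2]
    for field in (top, bottom):
        # each field line fills two frame lines
        yield [line for line in field for _ in (0, 1)]

frame = ["L0", "L1", "L2", "L3"]
for out in bob(frame):
    print(out)
# ['L0', 'L0', 'L2', 'L2']  <- top field: half the vertical samples
# ['L1', 'L1', 'L3', 'L3']  <- bottom field, offset by one line ("bobbing")
```

Note the two outputs alternate between even and odd source lines, which is the up/down offset that gave bobbing its name.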

... That line information is undersampled. In motion, those lines appear to "twitter". Higher quality deinterlacing algorithms attempt to adaptively fill in the gaps and smooth everything over, so it appears as if it was a true progressive frame.

My opinion is that deinterlacing algorithms should reweave the half-picture lines and nothing more. A user can insert more filters if more processing is desired, but trying to work around 'mystery' behavior of filters that do more than you think they do is crazy making.

For progressive content, you do not deinterlace, because it damages the picture. You saw that in the very first example. You should field match. Deinterlacing is not the same thing as field matching

For interlace content, you can't "reweave the half-picture lines", or you get back the combing.

As mentioned earlier, there are other causes, but low quality deinterlacing is the most common. Other common ones are pixel binning ...

You know, I've run across that term maybe once or twice... I don't know what it means.

... or sampling every nth pixel. Eg. large sensor DSLR's when shooting video mode.

Is that undersampling or simply insufficient resolution?

It's undersampling. The resolution of DSLR sensors is very high: 30-50 megapixels on average, with 120-150 megapixel cameras available. 1920x1080 HD video is only ~2K, i.e. 2 megapixels. It should be massively oversampled for video. But because of overheating and processing requirements (it takes lots of CPU or hardware to do proper downscaling in realtime), they (especially the 1st 5-6 generations of DSLRs) drop every nth pixel when taking video. Some newer ones are better now, but this is a very common source of aliasing and line twitter.

You see, a word like "undersampling" can become so broad that its utility as a word is lost.

It can be if you want it to be, but it's just the most simple sense of the word here

I will continue my responses, but I have to reboot now because the javascript at the Johns Hopkins COVID-19 site is causing problems with my browser. I'm surprised this window survived.

By the way, if you exclude COVID-19 cases that are on-going and consider only cases that have been resolved (recovered plus fatal), over 20% of the people who get COVID-19 die.

Stay safe

Last edited 12 months ago by pdr0 (previous) (diff)

comment:41 in reply to: ↑ 40 ; follow-up: Changed 12 months ago by markfilipak

Replying to pdr0:

Replying to markfilipak:

Undersampling is really a vague term, isn't it? Is sampling film at a lower resolution than half the size of a silver halide grain undersampling? Or is undersampling resampling a digital image (or audio stream) at less than twice the frequency (or half the area) of the original samples? Dr. Nyquist would say that they are both "undersampled".

Undersampling here is being used in the most simple sense. If you have 100 samples of something and you throw away half, now you have 50. Which has more samples? It's not a trick question.

Sorry, but I don't know what you had in mind when you wrote "Undersampling here". To what does "here" refer? Are you referring to sampling film? Are you referring to resampling an image? Or are you referring to something else?

I thought we were discussing interlace and aliasing. Instead of 100 samples, throw away 50, how about a real case? Each picture on a 720x480 DVD has 345600 samples. Are you saying that hard telecine with interlaced fields, for example, which divides the pictures into half-pictures with 172800 samples each, is throwing half the samples away? What video product throws half the pixels away?
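The arithmetic in the question above can be checked directly. A trivial sketch; the numbers are the NTSC DVD raster cited in the post:

```python
# One 720x480 DVD picture, split into two fields by line parity
width, height = 720, 480
frame_samples = width * height          # samples in the full picture
field_samples = width * (height // 2)   # samples in one field (half the lines)

print(frame_samples)   # 345600
print(field_samples)   # 172800

# dividing into fields redistributes the samples; it does not discard any
assert frame_samples == 2 * field_samples
```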

Answer for yourself: Is dividing a picture into 2 half-pictures really undersampling? Is it really sampling at all?

That's not undersampling, because you can reassemble the 2 half-pictures to get the full original picture. This is not the case with interlaced content

Okay, I see this is a case of miscommunication. According to the MPEG spec, "interlaced" means i30-telecast. What I diagram as

[1/-][-/2][3/-][-/4] etc.   ...TFF
[-/1][2/-][-/3][4/-]        ...BFF

But most people erroneously also call progressive video that has been unweaved into fields, "interlaced". What I diagram as

[A/a_____][B/b_____][C/c_____][D/d_____] etc.   ...source frames
[A/-][-/a][B/-][-/b][C/-][-/c][D/-][-/d] etc.   ...field-pairs

Full temporal sampling? Life doesn't pass in 1/24th second increments. Film and video both undersample life. But of course that's not what you mean. :)

What you mean is that transcoding a 24fps stream to anything less than 48fps is temporal subsampling, and in that, Dr. Nyquist and I would both agree with you.

Sampling in this context is in terms of a reference point. It's relative. If you have 24 fps, that's your original reference point . There are 24 pictures taken per second. If you discard half the motion samples, you now have 12 pictures/s. That is temporal undersampling with respect to the original 24 fps.

Agreed.

When you say that that transcoding 24fps to 24fps is not subsampling but merely dividing a picture into half-pictures is subsampling, are you being consistent?

Yes it's consistent. Nothing is being discarded when you go 24fps to 24fps (ignoring lossy compression for this discussion) . Dividing into half pictures - nothing is being discarded

But that's not subsampling, is it?

BUT interlace content - now something is being discarded . Do you see the difference ?

No. I don't. Each DVD picture has 345600 samples regardless of whether those 345600 samples are in one frame or two interlace fields.

... Each field has half the spatial information of a full progressive frame. ...

Does that make it subsampling?

For interlace content - yes it does. Half the spatial information is missing

No, it's not. It's just formatted into fields instead of frames. There's nothing missing.

For progressive content, arranged in fields (2:2 pulldown, or PsF) , no, because you can re-arrange it back to the original (ignoring lossy compression again)

Now you are making sense to me, so I must be misinterpreting everything preceding that paragraph.

... A straight line becomes jagged dotted line when deinterlaced, because that field is resized to a full sized frame and only 50% the line samples are present. ...

Now, I'm sure you know that fields -- I prefer to call them "half-pictures" -- aren't converted to frames without first reinterlacing -- I prefer to call it "reweaving" -- the half-picture-pairs to reconstruct the original pictures. And I'm sure you know that the only way a reconstructed picture has a jagged line is if the original film frame had a jagged line.

You need to differentiate between interlaced content and progressive content.

Now you seem to be going back to the definition of "interlace" used in the MPEG spec. If, by "interlaced", you mean the fields in an i30-telecast (or i25-telecast) -- "NTSC" & "PAL" do not actually exist in digital media -- then each temporal field is discrete and has 172800 samples. Again, nothing is thrown away.

The only process that I know of that takes one field and makes a frame of it is bobbing. Nothing is being thrown away, even then, but the results indeed are half resolution. Bobbing certainly is the only sane way to turn telecast fields into frames. If that's the source of the line aliasing to which you refer, then I agree of course (and the discussion is so basic I can't see why we're even engaging in it), and I'm surprised because that is such an obscure application that has nothing to do with progressive sources like movies and, again, nothing is thrown away. So when you refer to something thrown away... throwing away half the pixels, I have no idea to what you refer. Sorry.

With progressive content, 2 field pairs come from the same frame and can be reassembled and weaved back to a full progressive frame. No problem. This happens everyday . eg. Progressive, PAL content is 2:2 ( broadcast, DVD's) , PSF . It can occur out of order too, the fields can be found later or earlier, then matched with it's proper field pair in the process of "field matching"

I know that instead of "2 field pairs come" you meant to write "a field pair comes". I didn't know about field matching, but then, I know little of the internals of streams aside from layers and chroma and macroblocks. I do know MPEG PES quite well though, including the offsets to all the headers and their metadata and the header codes that are used to locate them in the stream.

But with interlaced content, the 2 field pairs are different. You're missing 1/2 the spatial information. There is no matching field pair.

Thank you for trying to put this discussion into the terms I favor. Let me help because I can tell you're confused.

When I refer to field pair, I mean the 2 fields that contain the 2 half-pictures that have been unweaved from a single origin picture. By "origin" I mean the film frame (or digital cinema frame) that serves as the input to the mastering process. By "source" I mean the samples that are output by the mastering process. By "target" I mean the transformed images that result from transcoding.

By "picture" I mean what is defined in the MPEG spec as "picture". I reserve "field" solely for telecast video, whereas I use "field-pair" for progressive content that's been unweaved and interlaced.

You see, having precise terms is important, and I'm pretty confident that you agree. I've seen so many discussions that degraded into abusive rhetoric due to misunderstandings caused by vague and/or ambiguous terms. But whenever I've advocated for better terms and have proposed more precise terms, I've been pretty viciously attacked. So now, I just use my terms and wait for people to ask me what I mean. I suppose that, if the baked-in video terminology (which is pretty awful) changes, then, like politics, the change will take a full generation.

That is where you get the twitter artifacts (you have spatial undersampling). This is also the case with your combed frame with your 5555 pulldown. You have 2 fields that come from different times. Each is undersampled in terms of its own time. They are missing their "partner" field pair. Of course you can find it earlier or later in the stream, from field matching, but then you have to decide on which, top or bottom again, and you get your 3:2 or 2:3 frame repeats. If you instead weave, or try to treat it as progressive, you get back the combing.

Pictures are better than words.

|<--------------------------1/6s-------------------------->|
[A/a__________][B/b__________][C/c__________][D/d__________]   ...p24 origin
[A/a_______][B/b_______][B/c_______][C/d_______][D/d_______]   ...2-3-2-3 pull-down
                        <--------combed-------->
[A/a_______][A/b_______][B/c_______][C/c_______][D/d_______]   ...3-2-3-2 pull-down
            <--------combed-------->
[A/a_][A/a_][A/b_][B/b_][B/b_][C/c_][C/c_][C/d_][D/d_][D/d_]   ...5-5-5-5 pull-down
            <comb>                        <comb>
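The diagram above can be generated mechanically. A sketch of the idea behind a telecine pulldown (a simplified model of what ffmpeg's telecine filter does, ignoring top/bottom field parity): emit the given number of fields per source frame, pair consecutive fields into output frames, and flag any output frame whose two fields come from different source frames as combed.

```python
def pulldown(frames, pattern):
    """frames: source-frame labels; pattern: fields emitted per source frame.
    Returns (top, bottom, combed) per output frame."""
    fields = []
    for frame, n in zip(frames, pattern):
        fields.extend([frame] * n)          # each entry is one field of `frame`
    out = []
    for i in range(0, len(fields) - 1, 2):  # pair consecutive fields
        a, b = fields[i], fields[i + 1]
        out.append((a, b, a != b))          # True -> fields from different frames
    return out

frames = ["A", "B", "C", "D"]
for a, b, combed in pulldown(frames, [5, 5, 5, 5]):
    mark = "  <- combed" if combed else ""
    print(f"[{a}/{b.lower()}]{mark}")
```

Running this reproduces the 5-5-5-5 row of the diagram, with exactly the two combed frames shown ([A/b] and [C/d]); `pulldown(frames, [2, 3, 2, 3])` likewise reproduces the 2-3-2-3 row with its two adjacent combed frames.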

So I assume that what you are describing is bobbing. But bobbing isn't undersampling either. If anything, bobbing is underdisplaying, don't you agree?

Historically, "bobbing" is just separating fields. When you view separate fields, each even, odd field has an offset. It looks like up/down/up/down or "bobbing" up and down. But the term has come to include both separating fields and resizing the fields to frames .

So if you bob deinterlace progressive content - then yes you are causing undersampling (and resulting artifacts, like twitter) . Each field pair of the progressive frame can no longer be field matched properly. 24fps becomes 48fps and each frame will have aliasing artifacts

Nobody bobs progressive content. Bringing up such a silly case is pointless.

But if you bob deinterlace interlaced content, the undersampling was there to begin with . The process of making the interlace, dropping the spatial samples is what caused the undersampling

You're talking about dropping samples again. For telecasts, the non-existent field never existed. The 'current' field can never be part of a field-pair. Nothing is dropped; it never existed.

... That line information is undersampled. In motion, those lines appear to "twitter". Higher quality deinterlacing algorithms attempt to adaptively fill in the gaps and smooth everything over, so it appears as if it was a true progressive frame.

My opinion is that deinterlacing algorithms should reweave the half-picture lines and nothing more. A user can insert more filters if more processing is desired, but trying to work around 'mystery' behavior of filters that do more than you think they do is crazy making.

For progressive content, you do not deinterlace, because it damages the picture. You saw that in the very first example. You should field match. Deinterlacing is not the same thing as field matching

For interlace content, you can't "reweave the half-picture lines" , or you get back the combing

Those are not half-picture lines. They cannot be reweaved because they were never weaved. Those are 2 fields from 2 differing times. They can be weaved, but combing results. I know you know this. Do you see how better terms work better?

As mentioned earlier, there are other causes, but low quality deinterlacing is the most common. Other common ones are pixel binning ...

You know, I've run across that term maybe once or twice... I don't know what it means.

... or sampling every nth pixel. Eg. large sensor DSLR's when shooting video mode.

Is that undersampling or simply insufficient resolution?

It's undersampling. The resolution of DSLR sensors is very high. 30-50 megapixels on average. 120-150 Megapixel cameras available. 1920x1080 HD video is only ~2K . 2Megapixels. It should be massively oversampled for video. But because of overheating and processing requirements (it takes lots of CPU or hardware to do proper downscaling in realtime), they (especially 1st 5-6 generations of DSLRs) , drop every nth pixel when taking video. Some newer ones are better now, but this is a very common source of aliasing and line twitter

Okay, I don't consider that undersampling, but you're right, regarding the resolution of the CCD, it is undersampling. You see, I don't consider the taking camera as part of the mastering process. To me, the origin is what comes out of the camera, not what could come out of the camera if the sensor was 100% utilized.

You see, a word like "undersampling" can become so broad that its utility as a word is lost.

It can be if you want it to be, but it's just the most simple sense of the word here

There is no simple, mutually-agreed sense/understanding/meaning of the word "undersample".

Stay safe

Oh, yeah. You too. I'm 73 years old, I understand that almost all people my age and older who contract COVID-19 die. I have 3 months of supplies.

Last edited 12 months ago by markfilipak (previous) (diff)

comment:42 in reply to: ↑ 41 Changed 12 months ago by pdr0

Replying to markfilipak:

Sorry, but I don't know what you had in mind when you wrote "Undersampling here". To what does "here" refer? Are you referring to sampling film? Are you referring to resampling an image? Or are you referring to something else?

We were discussing undersampling as it relates to aliasing and twitter, and that it is the most common underlying cause. Spatial undersampling: discarding pixels, line skipping, or grouping pixels in the case of pixel binning.

I thought we were discussing interlace and aliasing. Instead of 100 samples, throw away 50, how about a real case? Each picture on a 720x480 DVD has 345600 samples. Are you saying that hard telecine with interlaced fields, for example, that divides the pictures into half-pictures with 172800 samples each is throwing half the samples away? What video product throws half the pixels away?

Yes, interlace, aliasing, twitter artifacts

Interlace content throws half the spatial samples away; progressive 29.97p content throws away half the temporal samples (both compared to 59.94p, which isn't "legal" for DVD-Video, but that's how it's acquired these days). Both are old compromises made only because of historical bandwidth issues. That's the main reason we even have interlace.

Hard telecine is 23.976p encoded as 29.97i fields (or, if you like, 59.94i fields: same thing, different naming convention). If you started with 23.976, then obviously nothing is thrown away. In fact you're adding duplicate fields (extra repeat fields actually encoded). Soft telecine does not encode the extra repeat fields; it just uses repeat-field flags to signal them. I think you know this because you mentioned soft telecine in the 1st post.

When you say that that transcoding 24fps to 24fps is not subsampling but merely dividing a picture into half-pictures is subsampling, are you being consistent?

Yes it's consistent. Nothing is being discarded when you go 24fps to 24fps (ignoring lossy compression for this discussion) . Dividing into half pictures - nothing is being discarded

But that's not subsampling, is it?

Correct, it's not subsampling

(And "subsampling" in terms of video is a term usually reserved for "chroma subsampling")

BUT interlace content - now something is being discarded . Do you see the difference ?

No. I don't. Each DVD picture has 345600 samples regardless of whether those 345600 samples are in one frame or two interlace fields.

Ok you're assuming DVD.

Yes, this is true: 720x480 = 345600. But interlace content starts at 59.94p before it got interlaced (OK, there are some old CCD sensors that begin life interlaced, but anything in the last 10-15 years uses progressive CMOS sensors).

... Each field has half the spatial information of a full progressive frame. ...

Does that make it subsampling?

It does when you start with 59.94p

For interlace content - yes it does. Half the spatial information is missing

No, it's not. It's just formatted into fields instead of frames. There's nothing missing.

And this is where the misunderstanding is -

When you assume the starting point is a CCD interlaced sensor acquisition, sure, nothing is missing compared to that (again, ignoring lossy compression)

But anything in the last 10-15 years started life as progressive CMOS 59.94p (or higher) acquisition

Either way, the bottom line is that interlace is a spatially undersampled picture, a bandwidth-saving measure. You can't represent a moving diagonal line cleanly with interlace content, because half the information is missing compared to a full 59.94p progressive version. And that's how and why interlace relates to aliasing and twitter.

The only process that I know of that takes one field and makes a frame of it is bobbing.

Nothing is being thrown away, even then, but the results indeed are half resolution.

Yes, it's called "separating fields" these days. Nothing is thrown away if you separate fields. You can weave them back.

Bobbing certainly is the only sane way to turn telecast fields into frames. If that's the source of the line aliasing to which you refer, then I agree of course (and the discussion is so basic I can't see why we're even engaging in it), and I'm surprised because that is such an obscure application that has nothing to do with progressive sources like movies and, again, nothing is thrown away. So when you refer to something thrown away... throwing away half the pixels, I have no idea to what you refer. Sorry.

The "source" of aliasing is that a single field does not make a full progressive frame. When you have interlaced content, you don't have complete matching field pairs to make up a full progressive frame. Interlace is spatially undersampled compared to the 59.94p version. We were talking about interlace, aliasing, and twitter, remember?

Yes, nothing is thrown away with progressive sources like movies. They were acquired at 23.976 (or 24.0p in many cases). (An exception might be some HFR acquisition movies)

But if you bob deinterlace interlaced content, the undersampling was there to begin with . The process of making the interlace, dropping the spatial samples is what caused the undersampling

You're talking about dropping samples again. For telecasts, the non-existent field never existed. The 'current' field can never be part of a field-pair. Nothing is dropped. it never existed.

The information is there in acquisition. It certainly is these days. (Ok there might be some exceptions, for quick turn around scenarios, like ENG that still use interlaced acquisition)

It's undersampling. The resolution of DSLR sensors is very high. 30-50 megapixels on average. 120-150 Megapixel cameras available. 1920x1080 HD video is only ~2K . 2Megapixels. It should be massively oversampled for video. But because of overheating and processing requirements (it takes lots of CPU or hardware to do proper downscaling in realtime), they (especially 1st 5-6 generations of DSLRs) , drop every nth pixel when taking video. Some newer ones are better now, but this is a very common source of aliasing and line twitter

Okay, I don't consider that undersampling, but you're right, regarding the resolution of the CCD, it is undersampling. You see, I don't consider the taking camera as part of the mastering process. To me, the origin is what comes out of the camera, not what could come out of the camera if the sensor was 100% utilized.

Not CCD, almost nothing uses CCD anymore :) CMOS. Progressive.

The still images that come out of the camera from the sensor are almost 100% utilized, in burst mode you can get 12-16fps. Massive images. Video is just still images strung together.

But the point about the DSLR example is how it relates to aliasing and twittering. It's the sampling that is done downstream that causes it, much like the interlace example. It's line skipping and dropping pixels compared to full images.

There is no simple, mutually-agreed sense/understanding/meaning of the word "undersample".

Sure, call it something else if you like

The point about this part of the discussion was how interlace, being spatial undersampling compared to a full progressive frame, is a common mechanism for aliasing and twitter.

comment:43 Changed 12 months ago by markfilipak

Well, pdr0, the thing that was missing was that to you the origin is a 60fps camera output whereas to me the origin is a BD or DVD. So when you write about losses, you mean losses relative to 60fps samples. I read what you write and I try to relate it to BD & DVD video and I can't figure out why you seem so stupid. :) Hahaha...

It's funny how a difference of perspective can cramp a conversation.

Before I depart, I want to thank you -- Thank You! -- and I want to leave you with something of value to repay your kindness.

I write this as a 30-year documentarian: You are exceptionally patient and caring, but you could be more effective. I'll tell you how.

Avoid object pronouns that (1) refer forwards, even when the object (object noun or noun phrase) is in the predicate of the same sentence, or (2) refer backwards to a previous sentence. In other words, use object pronouns that refer solely to a previous subject that's in the same sentence.

For example, above I wrote: "I read what you write and I try to relate it to BD & DVD video..." I could have written: "I read it and I try to relate what you write to BD & DVD video..." Both are grammatically correct, but in the second sentence, "it" refers forward. Don't do that. A deferred forward reference is confusing. Likewise, don't refer, even backwards, to a subject that's in a previous sentence. Instead, simply repeat the subject of the previous sentence in the current sentence. Doing so will cause you to write a little more in the short run, but it will save you a lot of explanation in the long run.

Likewise, completely avoid other object pronouns such as "this" & "that". The same is true of personal pronouns (example: "Roger and Bob ran because he was afraid" -- is "he" Roger or Bob or someone else? -- instead of "Roger and Bob ran because Bob was afraid"), but personal pronouns are not often found in technical writing.

When I proofread what I write prior to submittal, I look especially for bad object pronoun references and fix them or repeat the original subject. I do that as a conscious step much as a coder scans for obvious syntax errors prior to compiling.

Very Best Regards,
Mark Filipak.

Last edited 12 months ago by markfilipak (previous) (diff)

comment:44 Changed 12 months ago by pdr0

Replying to markfilipak:

Thanks for the tips. I'll try to keep them in mind.(My English teacher is yelling at me from the cobwebs in my brain.) I know I should proofread more, and to check for speling :) and grammatical errors before hitting send

Cheers

Note: See TracTickets for help on using tickets.