Opened 8 years ago

Closed 5 years ago

#2686 closed defect (fixed)

Native AAC encoder collapses at high bitrates on some samples

Reported by: Kamedo2
Owned by: klaussfreire
Priority: normal
Component: avcodec
Version: git-master
Keywords: aac regression
Cc: klaussfreire@gmail.com, timothygu99@gmail.com, atomnuker@gmail.com, rodger.combs@gmail.com
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: yes

Description

Summary of the bug:
The FFmpeg native aac encoder produces horrible sound at around 256kbps or more on particular samples. The quality degrades as I increase the bitrate and is worst at 320-400kbps.

How to reproduce:

ffmpeg -i ffmpeg_aac320k_collapse.flac -vn -c:a aac -strict experimental -b:a 320k ffmpeg_aac320k_collapse.mp4

I couldn't reproduce the problem when I trimmed the most problematic sample down to 8 seconds, but after adding 10 seconds of silence before the sample, the bug could be reproduced again. So I'm going to upload the sample with 10 seconds of silence prepended. The native aac encoder was fine on many music clips at 320kbps; only some clips produce noticeably bad aac files, to an extent that I'd call a bug.

Console Output:

ffmpeg version N-54096-ge41bf19 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 19 2013 00:20:06 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-libmp3lame --enable-lib
vorbis --enable-nonfree --enable-libfdk-aac --enable-libvo_aacenc --enable-libfa
ac --extra-ldflags=-static --extra-cflags='-march=nocona -mfpmath=sse' --optflag
s=-O2
  libavutil      52. 37.101 / 52. 37.101
  libavcodec     55. 16.100 / 55. 16.100
  libavformat    55.  9.100 / 55.  9.100
  libavdevice    55.  2.100 / 55.  2.100
  libavfilter     3. 77.101 /  3. 77.101
  libswscale      2.  3.100 /  2.  3.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  3.100 / 52.  3.100
[flac @ 0003f160] max_analyze_duration 5000000 reached at 5015510 microseconds
Input #0, flac, from '05-true_my_heart_2m50s.flac':
  Duration: 00:00:18.01, bitrate: 573 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Output #0, mp4, to '05-true_my_heart_2m50s_320k.mp4':
  Metadata:
    encoder         : Lavf55.9.100
    Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 32
0 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (flac -> aac)
Press [q] to stop, [?] for help
size=     331kB time=00:00:18.01 bitrate= 150.4kbits/s
video:0kB audio:327kB subtitle:0 global headers:0kB muxing overhead 1.151111%

Attachments (68)

ffmpeg_aac320k_collapse.flac (1.2 MB ) - added by Kamedo2 8 years ago.
The sound file that cripples a native AAC encoder. True My Heart [DVTS-2121][07.09.03] Track05 2m50s~58s
ffmpeg_aac320k_collapse2.flac (1.5 MB ) - added by Kamedo2 8 years ago.
A sound that degrades on FFmpeg native aac encoder. Sounds like a spray can. Billie Holiday : I'm A Fool To Want You (trimmed to 20sec, first and last)
aac-improvements-wip.patch (25.1 KB ) - added by klaussfreire 8 years ago.
AAC native encoder improvements, work in progress
ffmpeg_aac320k_collapse3.flac (1.0 MB ) - added by Kamedo2 8 years ago.
A sound that degrades on FFmpeg native aac encoder. Euphoria - Yui Makino [VTCL-35073][06.4.26] Track04 Amefuribana(inst.) 2:45~2:55
whitenoise_256k.mp4 (141.4 KB ) - added by Kamedo2 8 years ago.
White noise, encoded by native aac encoder at 256kbps. The sound is obviously collapsed.
Whitenoise.flac (946.2 KB ) - added by Kamedo2 8 years ago.
White noise, created by SoundEngine Free ver.4.59. Using aevalsrc as in comment:11 does the same job.
aac-improvements-wip-v2-rclookahead.patch (30.5 KB ) - added by klaussfreire 8 years ago.
Second version of AAC improvements, with improvements on rate control, hopefully gets rid of all remaining "collapsations on high bit rates". Tested various music tracks on 64k, 128k, 256k and 384k.
aac-improvements-wip-v2-rclookahead.2.patch (30.0 KB ) - added by klaussfreire 8 years ago.
Second version of AAC improvements, with improvements on rate control, hopefully gets rid of all remaining "collapsations on high bit rates". Tested various music tracks on 64k, 128k, 256k and 384k.
ffmpeg_aac320k_collapse4.flac (1.4 MB ) - added by Kamedo2 8 years ago.
A sound that degrades on FFmpeg native aac encoder.
18.6_22kHz_noise.flac (2.2 MB ) - added by Kamedo2 8 years ago.
Partial white noise, clipped by 256th-order lanczos function, to include only signals between 18.6 and 22kHz. the signal wanders around the freq.
ffmpeg_aac320k_collapse5.flac (901.5 KB ) - added by Kamedo2 8 years ago.
A sound that degrades on FFmpeg native aac encoder.
ffmpeg_aacvbr_pulse1.flac (1.6 MB ) - added by Kamedo2 8 years ago.
Sound disappears for about 20ms in VBR mode -q:a 5, -q:a 10. Sounds like an annoying pulse.
aac-improvements-wip-v3-vbr.patch (35.9 KB ) - added by klaussfreire 8 years ago.
VBR improvements over wip-v2-rclookahead
ffmpeg_aacvbr_pulse2.flac (2.2 MB ) - added by Kamedo2 8 years ago.
Partial white noise, split by a 256th-order lanczos filter. An HF pulse noise that sounds like a stopwatch is added in VBR around -q:a 0.3
aac-improvements-wip-v4-vbr.patch (40.4 KB ) - added by klaussfreire 8 years ago.
Improved VBR, fixed psy threshold reduction bug
fdkaac_10_12.zip (2.1 MB ) - added by Kamedo2 8 years ago.
samples #10-#12 encoded by fdkaac. *2.mp4 are the 128kbps samples, the others are the 96kbps samples.
fdkaac_13_16.zip (2.2 MB ) - added by Kamedo2 8 years ago.
samples # 10 - # 12 encoded by fdkaac. *2.mp4 are the 128kbps samples, the others are the 96kbps samples.
aac-improvements-wip-v5.patch (40.5 KB ) - added by klaussfreire 8 years ago.
V5 patch, twoloop RD fixed (I think)
aac-improvements-wip-v6.patch (42.3 KB ) - added by klaussfreire 8 years ago.
Improved (mostly constrained) VBR, fixed RC bug from v5. There's some dead code that begs to be removed, but it's better to start testing before cleaning.
ffmpeg_aacvbr_degrade1.flac (1.4 MB ) - added by Kamedo2 8 years ago.
A sound that degrades on VBR. from GIZA studio Masterpiece BLEND 2001 Disc2 Track3 Stand Up (Mai Kuraki)
ffmpeg_aac_lead_voice.flac (1.4 MB ) - added by Kamedo2 8 years ago.
Degrades on the FFmpeg aac encoder, in both VBR and ABR. The original sound is very odd and may not be worth a lot of effort to improve.
aac-improvements-wip-v7.patch (48.2 KB ) - added by klaussfreire 8 years ago.
v7 patch - mostly bugfixing on v6, but quite significant bugs - still incomplete (needs sample rate fixes and Mahler still sounds weird)
sine_tester.flac (1006.7 KB ) - added by Kamedo2 8 years ago.
Sine waves for a warbling test. 50 440 1000 3000 7000 10000 20000Hz. 24bit 48kHz PCM.
aac-improvements-wip-v8.patch (71.8 KB ) - added by klaussfreire 8 years ago.
v8 patch - tweaked tonal band prioritization, especially in transients, fixed M/S encoding and made it the default, and fixed other assorted bugs. Added missing include changes.
Whitenoise_left.flac (479.1 KB ) - added by Kamedo2 8 years ago.
Whitenoise.flac without the sound of right channel. A strange noise appears in the center in v8.
ffmpeg_aac256k_degrade.flac (1.9 MB ) - added by Kamedo2 8 years ago.
The sound degrades on v8 around 256kbps. Mainly right channel suffers. from Kohmi Hirose GIFT/Ai wa tokkoyaku Track3
ItCouldBeSweet.ffv8_128k.diff.flac (1.7 MB ) - added by Kamedo2 8 years ago.
The diff of the ItCouldBeSweet, before and after the v8 AAC encode, 128kbps.
ItCouldBeSweet.ffv8_192k.diff.flac (1.9 MB ) - added by Kamedo2 8 years ago.
The diff of the ItCouldBeSweet, before and after the v8 AAC encode, 192kbps.
ItCouldBeSweet.ffv8_320k.diff.flac (1.3 MB ) - added by Kamedo2 8 years ago.
The diff of the ItCouldBeSweet, before and after the v8 AAC encode, 320kbps.
ItCouldBeSweet.ffv8_q1.5.diff.flac (1.5 MB ) - added by Kamedo2 8 years ago.
The diff of the ItCouldBeSweet, before and after the v8 AAC encode, quality option -q:a 1.5
aac-improvements-wip-v8-fix.patch (1.8 KB ) - added by klaussfreire 7 years ago.
Cumulative patch over v8 to fix M/S coding
ItCouldBeSweet.qaac_cvbr128k.diff.flac (1.7 MB ) - added by Kamedo2 7 years ago.
Just for comparison. The diff of the ItCouldBeSweet, between the original and qaac encode, 128kbps.
ItCouldBeSweet.fdk_128k.diff.flac (1.7 MB ) - added by Kamedo2 7 years ago.
Just for comparison. The diff of the ItCouldBeSweet, between the original and FDK-AAC encode, 128kbps.
aac-improvements-wip-v8f.patch (73.6 KB ) - added by klaussfreire 7 years ago.
Combined v8 + fix
ItCouldBeSweet.ffv8f_128k.diff.flac (1.7 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v8f AAC encode, 128kbps.
ItCouldBeSweet.ffv8f_192k.diff.flac (1.6 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v8f AAC encode, 192kbps.
ItCouldBeSweet.ffv8f_320k.diff.flac (1.4 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v8f AAC encode, 320kbps.
aac-improvements-wip-v8g.patch (76.6 KB ) - added by klaussfreire 7 years ago.
Fix M/S encoding in ABR
ItCouldBeSweet.ffv8g_128k.diff.flac (1.8 MB ) - added by Kamedo2 7 years ago.
ItCouldBeSweet.ffv8g_192k.diff.flac (1.7 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v8g AAC encode, 192kbps.
ItCouldBeSweet.ffv8g_320k.diff.flac (1.4 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v8g AAC encode, 320kbps.
aac-improvements-wip-v7-new.patch (48.2 KB ) - added by Kamedo2 7 years ago.
v7 patch altered to reflect the latest change by Michael Niedermayer at 20140525. This should work for the git head.
aac-improvements-wip-v8g-new.patch (76.6 KB ) - added by Kamedo2 7 years ago.
v8g patch altered to reflect the latest change by Michael Niedermayer at 20140525. This should work for the git head.
ffmpeg_anmr_error.flac (157.7 KB ) - added by Kamedo2 7 years ago.
It causes the assertion error at aacenc.c line 399 by -aac_coder anmr on all -b:a and -q:a 0.1695 or bigger.
ffmpeg_anmr_error2.flac (1.1 MB ) - added by Kamedo2 7 years ago.
EBU–TECH 3253 Sound Quality Assessment Material recordings for subjective tests, 50 Male speech, English.
aac-improvements-wip-v9.patch (92.9 KB ) - added by klaussfreire 7 years ago.
Hopefully final version of the AAC patch
ItCouldBeSweet.ffv9_128k.diff.flac (1.7 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v9 AAC encode, 128kbps.
ItCouldBeSweet.ffv9_192k.diff.flac (1.7 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v9 AAC encode, 192kbps.
ItCouldBeSweet.ffv9_320k.diff.flac (1.4 MB ) - added by Kamedo2 7 years ago.
The diff of the ItCouldBeSweet, between the original and the patch v9 AAC encode, 320kbps.
ffmpeg_anmr_error3.flac (221.4 KB ) - added by Kamedo2 7 years ago.
EBU–TECH 3253 Sound Quality Assessment Material recordings for subjective tests, 3 Electronic gong 100 Hz.(sine wave)
FFmpeg_anmr_error4.flac (250.4 KB ) - added by Kamedo2 7 years ago.
This causes the assertion error on both -b:a 128k and -q:a 1. 4000Hz sine wave, stereo.
FFmpeg_anmr_error5.flac (140.0 KB ) - added by Kamedo2 7 years ago.
This causes the assertion error on both -b:a 128k and -q:a 1. 11000Hz sine wave, stereo.
aac-improvements-wip-v9b.patch (98.6 KB ) - added by klaussfreire 7 years ago.
v9b version, based on v9, matched behavior against v7
FFmpeg_anmr_error6.flac (267.1 KB ) - added by Kamedo2 7 years ago.
This causes the assertion error on -b:a 96k, 128k, 160k on v9b. -q:a is OK. 9000Hz sine wave, stereo.
FFmpeg_anmr_error7.flac (2.4 MB ) - added by Kamedo2 7 years ago.
This causes the assertion error on -b:a 192k on v9b. Dave Matthews Band - Crush, http://www.hydrogenaud.io/forums/index.php?showtopic=102079&hl=
aac-improvements-wip-v7-new.2.patch (48.2 KB ) - added by Kamedo2 7 years ago.
v7 patch altered to reflect the latest changes. This should work for the git head.
aac-improvements-wip-v9b-new.patch (98.6 KB ) - added by Kamedo2 7 years ago.
v9b patch altered to reflect the latest changes. This should work for the git head.
FFmpeg_anmr_error8.flac (1.8 MB ) - added by Kamedo2 7 years ago.
This causes the assertion error on -q:a 1 on v9b. -q:a 0.99 or 1.01 is safe. Suzanne Vega, Tom's Diner http://www.rarewares.org/test_samples/
ffmpeg_aac_error1.flac (1.7 MB ) - added by Kamedo2 6 years ago.
FFmpeg doesn't stop when the sample rate is 8kHz and the bitrate is high. -ar 8000 -b:a 96k, -q:a 0.958 or higher. Fear Factory, Digimortal, Linchpin.
SinceAlways.flac (2.2 MB ) - added by Kamedo2 6 years ago.
This is one exceptional case that degrades on v9b.
mybloodrusts.flac (2.4 MB ) - added by Kamedo2 6 years ago.
This is one exceptional case that degrades on v9b.
castanets.flac (588.9 KB ) - added by Kamedo2 6 years ago.
This is one exceptional case that degrades on v9b.
ffmpeg_96k_error.flac (58.5 KB ) - added by Kamedo2 6 years ago.
Low Freq. Sine Sweep Stereo with right channel inverted; inaudible on mono.
mybloodrusts.ff74961_128k.mp4 (323.3 KB ) - added by Kamedo2 6 years ago.
mybloodrusts.flac encoded at -b:a 128k by ffmpeg74961-g61009a7.
mybloodrusts.ff75043_128k.mp4 (323.4 KB ) - added by Kamedo2 6 years ago.
mybloodrusts.flac encoded at -b:a 128k by ffmpeg75043-gb31041a.
assertion_diff_shimoseka.m4a (795.5 KB ) - added by llogan 6 years ago.
Assertion diff >= 0 && diff <= 120 failed at libavcodec/aacenc.c:363
ffmpeg_aac_error2.flac (1.2 MB ) - added by Kamedo2 6 years ago.
This causes an error on -profile:a aac_ltp -b:a 96k. The error messages are "av_interleaved_write_frame(): Not enough space" or "Audio encoding failed (avcodec_encode_audio2)". The sound is 08._Sarah_McLachlan_Ice_ringing.flac
short_block_test_2.flac (167.1 KB ) - added by Kamedo2 6 years ago.

Change History (577)

by Kamedo2, 8 years ago

The sound file that cripples a native AAC encoder. True My Heart [DVTS-2121][07.09.03] Track05 2m50s~58s

by Kamedo2, 8 years ago

A sound that degrades on FFmpeg native aac encoder. Sounds like a spray can. Billie Holiday : I'm A Fool To Want You (trimmed to 20sec, first and last)

comment:1 by Carl Eugen Hoyos, 8 years ago

Component: FFmpeg → avcodec
Keywords: native encoder sound quality 256kbps 320kbps removed
Version: 1.0.7 → git-master

Did the output (aac) files sound better with the (original!) release 1.2?
(Not a later release of the 1.2 series.)

comment:2 by Kamedo2, 8 years ago

Yes, the output aac files sounded better with release 1.2.1 I've downloaded from
http://www.ffmpeg.org/releases/ffmpeg-1.2.1.tar.bz2

Still, the quality of the native aac at 320kbps is poorer than the native aac 256kbps.

ffmpeg version 1.2.1 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 19 2013 12:38:13 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-libmp3lame --enable-lib
vorbis --enable-nonfree --enable-libfdk-aac --enable-libvo_aacenc --enable-libfa
ac --extra-ldflags=-static --extra-cflags='-march=nocona -mfpmath=sse' --optflag
s=-O2
  libavutil      52. 18.100 / 52. 18.100
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.104 / 54. 63.104
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 42.103 /  3. 42.103
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  2.100 / 52.  2.100
[flac @ 01405c20] max_analyze_duration 5000000 reached at 5015510 microseconds
Input #0, flac, from 'ffmpeg_aac320k_collapse.flac
':
  Duration: 00:00:18.01, bitrate: 573 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Output #0, mp4, to 'ffmpeg_aac320k_collapse.mp4':
  Metadata:
    encoder         : Lavf54.63.104
    Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 32
0 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (flac -> aac)
Press [q] to stop, [?] for help
size=     289kB time=00:00:18.01 bitrate= 131.3kbits/s
video:0kB audio:285kB subtitle:0 global headers:0kB muxing overhead 1.321136%

comment:3 by Kamedo2, 8 years ago

Oops, you said original release 1.2.
Release 1.2 and 1.2.1 had the same behavior -- the first sample collapses at 432-464kbps.
As for N-54096-ge41bf19 I've got from git -- the first sample collapses at 256-432kbps.
These two groups have distinct degradation ranges. Releases 1.2 and 1.2.1 have a much narrower degradation range and are less severe within it. N-54096-ge41bf19 is at its worst at 352kbps.

ffmpeg version 1.2 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 20 2013 03:06:34 with gcc 4.8.1 (GCC)
  configuration: --enable-version3 --enable-nonfree --enable-libfdk-aac --extra-
ldflags=-static --extra-cflags='-march=native' --optflags=-O2
  libavutil      52. 18.100 / 52. 18.100
  libavcodec     54. 92.100 / 54. 92.100
  libavformat    54. 63.104 / 54. 63.104
  libavdevice    54.  3.103 / 54.  3.103
  libavfilter     3. 42.103 /  3. 42.103
  libswscale      2.  2.100 /  2.  2.100
  libswresample   0. 17.102 /  0. 17.102
[flac @ 03295c20] max_analyze_duration 5000000 reached at 5015510 microseconds
Input #0, flac, from 'C:\Users\PCC\Documents\ABC-HR\ffmpeg_aac320k_collapse.flac
':
  Duration: 00:00:18.01, bitrate: 573 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Output #0, mp4, to 'C:\Users\PCC\Documents\ABC-HR\05-true_my_heart_2m50s_320k_12
.mp4':
  Metadata:
    encoder         : Lavf54.63.104
    Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 32
0 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (flac -> aac)
Press [q] to stop, [?] for help
size=     289kB time=00:00:18.01 bitrate= 131.3kbits/s
video:0kB audio:285kB subtitle:0 global headers:0kB muxing overhead 1.321136%

comment:4 by klaussfreire, 8 years ago

This patch I'm going to attach fixes both issues. But I must warn that it's a WIP, I still have to split it into individual issues and fix a bug it exhibits in rare circumstances when working in VBR mode.

by klaussfreire, 8 years ago

Attachment: aac-improvements-wip.patch added

AAC native encoder improvements, work in progress

comment:5 by Carl Eugen Hoyos, 8 years ago

Keywords: regression added
Reproduced by developer: set
Status: new → open

comment:6 by Kamedo2, 8 years ago

I appreciate your effort, klaussfreire.
I want to test the aac-improvements-wip.patch, but how can I do that?

/c/mingw/ffmpeg/ffmpeg-1.2
$ patch -u -p1 < aac-improvements-wip.patch
patching file libavcodec/aaccoder.c
Hunk #3 FAILED at 711.
Hunk #4 succeeded at 776 (offset -5 lines).
Hunk #5 succeeded at 818 (offset -5 lines).
Hunk #6 FAILED at 845.
Hunk #7 FAILED at 1055.
Hunk #8 FAILED at 1068.
Hunk #9 FAILED at 1092.
Hunk #10 FAILED at 1110.
6 out of 10 hunks FAILED -- saving rejects to file libavcodec/aaccoder.c.rej
patching file libavcodec/aacenc.c
Hunk #3 FAILED at 622.
1 out of 3 hunks FAILED -- saving rejects to file libavcodec/aacenc.c.rej
patching file libavcodec/aacpsy.c
Hunk #1 succeeded at 293 (offset -4 lines).
Hunk #2 succeeded at 385 (offset -4 lines).
Hunk #3 succeeded at 646 (offset -33 lines).
patching file libavcodec/psymodel.h

comment:7 by Carl Eugen Hoyos, 8 years ago

Without trying myself, I would bet that the patch only applies to current git head.

comment:8 by Kamedo2, 8 years ago

I tried $ git clone git://source.ffmpeg.org/ffmpeg.git, but still, the patch fails.

comment:9 by Carl Eugen Hoyos, 8 years ago

I can confirm that the patch does not apply.

comment:10 by Kamedo2, 8 years ago

I tried the wip patch again. No good. I think the patch is broken.

$ patch -p1 < aac-improvements-wip.patch
patching file libavcodec/aaccoder.c
Hunk #3 FAILED at 711.
Hunk #4 succeeded at 776 (offset -5 lines).
Hunk #5 succeeded at 818 (offset -5 lines).
Hunk #6 FAILED at 845.
Hunk #7 FAILED at 1055.
Hunk #8 FAILED at 1068.
Hunk #9 FAILED at 1092.
Hunk #10 FAILED at 1110.
6 out of 10 hunks FAILED -- saving rejects to file libavcodec/aaccoder.c.rej
patching file libavcodec/aacenc.c
Hunk #1 FAILED at 591.
Hunk #2 FAILED at 609.
Hunk #3 FAILED at 621.
3 out of 3 hunks FAILED -- saving rejects to file libavcodec/aacenc.c.rej
patching file libavcodec/aacpsy.c
Hunk #1 succeeded at 299 (offset 2 lines).
Hunk #2 succeeded at 391 (offset 2 lines).
Hunk #3 succeeded at 681 (offset 2 lines).
patching file libavcodec/psymodel.h

by Kamedo2, 8 years ago

A sound that degrades on FFmpeg native aac encoder. Euphoria - Yui Makino [VTCL-35073][06.4.26] Track04 Amefuribana(inst.) 2:45~2:55

comment:11 by Kamedo2, 8 years ago

I successfully applied the patch. klaussfreire's repository is in here. http://ffmpeg.org/pipermail/ffmpeg-devel/2013-May/143216.html
Or, you can use https://dl.dropboxusercontent.com/u/81238453/aac.patch (Thank you Takuan @K4095) to patch from current git head.

However, it still has a distinctive bug. The sound partially disappears when the input is white-noise-like.
Bug #2706 was that the sound warbled when the input was a sine wave. This patch solves that, but it creates a new problem.

ffmpeg54292 -v 9 -loglevel
99 -filter_complex "aevalsrc=-0.5+random(0)" -c:a aac -strict experimental -ar 4
4100 -ac 2 -b:a 256k -t 4 "C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4"
ffmpeg version N-54292-g97947d9 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 30 2013 20:34:13 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-nonfree --enable-libfdk
-aac --extra-ldflags=-static --extra-cflags='-march=nocona -mfpmath=sse' --optfl
ags=-O2
  libavutil      52. 38.100 / 52. 38.100
  libavcodec     55. 18.100 / 55. 18.100
  libavformat    55. 10.100 / 55. 10.100
  libavdevice    55.  2.100 / 55.  2.100
  libavfilter     3. 77.101 /  3. 77.101
  libswscale      2.  3.100 /  2.  3.100
  libswresample   0. 17.102 /  0. 17.102
  libpostproc    52.  3.100 / 52.  3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument
'9'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level)
with argument '99'.
Reading option '-filter_complex' ... matched as option 'filter_complex' (create
a complex filtergraph) with argument 'aevalsrc=-0.5+random(0)'.
Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'aac'
.
Reading option '-strict' ... matched as AVOption 'strict' with argument 'experim
ental'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)
) with argument '44100'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) w
ith argument '2'.
Reading option '-b:a' ... matched as option 'b' (video bitrate (please use -b:v)
) with argument '256k'.
Reading option '-t' ... matched as option 't' (record or transcode "duration" se
conds of audio/video) with argument '4'.
Reading option 'C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4' ... matched a
s output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 9.
Applying option filter_complex (create a complex filtergraph) with argument aeva
lsrc=-0.5+random(0).
Successfully parsed a group of options.
Parsing a group of options: output file C:\Users\PCC\Documents\ABC-HR\whitenoise
_256k.mp4.
Applying option c:a (codec name) with argument aac.
Applying option ar (set audio sampling rate (in Hz)) with argument 44100.
Applying option ac (set number of audio channels) with argument 2.
Applying option b:a (video bitrate (please use -b:v)) with argument 256k.
Applying option t (record or transcode "duration" seconds of audio/video) with a
rgument 4.
Successfully parsed a group of options.
Opening an output file: C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4.
detected 8 logical cores
[Parsed_aevalsrc_0 @ 0140bea0] compat: called with args=[-0.5+random(0)]
[Parsed_aevalsrc_0 @ 0140bea0] Setting 'exprs' to value '-0.5+random(0)'
[audio format for output stream 0:0 @ 01412880] Setting 'sample_fmts' to value '
fltp'
[audio format for output stream 0:0 @ 01412880] Setting 'sample_rates' to value
'44100'
[audio format for output stream 0:0 @ 01412880] Setting 'channel_layouts' to val
ue '0x3'
Successfully opened the file.
[audio format for output stream 0:0 @ 01412880] auto-inserting filter 'auto-inse
rted resampler 0' between the filter 'Parsed_aevalsrc_0' and the filter 'audio f
ormat for output stream 0:0'
[AVFilterGraph @ 0039f3c0] query_formats: 3 queried, 6 merged, 3 already done, 0
 delayed
[Parsed_aevalsrc_0 @ 0140bea0] sample_rate:44100 chlayout:mono duration:-1.00000
0
[auto-inserted resampler 0 @ 0039f2a0] [SWR @ 00393160] Using double precision m
ode
0.707107
0.707107
[auto-inserted resampler 0 @ 0039f2a0] ch:1 chl:mono fmt:dblp r:44100Hz -> ch:2
chl:stereo fmt:fltp r:44100Hz
Output #0, mp4, to 'C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4':
  Metadata:
    encoder         : Lavf55.10.100
    Stream #0:0, 0, 1/44100: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, ster
eo, fltp, 256 kb/s
Stream mapping:
  aevalsrc -> Stream #0:0 (aac)
Press [q] to stop, [?] for help
No more output streams to write to, finishing.
size=     141kB time=00:00:04.01 bitrate= 288.4kbits/s
video:0kB audio:140kB subtitle:0 global headers:0kB muxing overhead 1.001409%
0 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0141b640] Statistics: 30 seeks, 197 writeouts

The output mp4 I'm going to post sounds nothing like white noise.

by Kamedo2, 8 years ago

Attachment: whitenoise_256k.mp4 added

White noise, encoded by native aac encoder at 256kbps. The sound is obviously collapsed.

comment:12 by Kamedo2, 8 years ago

Another bug, typically happens when hi-hats are present. The sound disappears for about 20ms.
Short, but it's still audible and sounds like an annoying pulse.

http://i40.tinypic.com/v3dt39.png

When these problems are solved, I'm going to conduct an extensive blind listening test, to assess sound quality of AAC encoders available from FFmpeg.

comment:13 by Kamedo2, 8 years ago

Another type of hole. There are no holes like this in the original sound, but they are present in the encoded mp4s.
http://i41.tinypic.com/axisf5.png

comment:14 by klaussfreire, 8 years ago

Sorry, I expected to get email notifications, but got none.

That bug is probably a ratecontrol bug I thought I had eradicated. I'll try to test with white noise, but just in case the exact input matters, can you attach a flac version?

in reply to:  14 comment:15 by Carl Eugen Hoyos, 8 years ago

Replying to klaussfreire:

Sorry, I expected to get email notifications, but got none.

You will get them if you add yourself to CC.

comment:16 by klaussfreire, 8 years ago

Cc: klaussfreire@gmail.com added

comment:17 by klaussfreire, 8 years ago

In aacenc.c, replacing

s->lambda *= ratio;

with

s->lambda *= sqrtf(sqrtf(ratio));

fixes the white noise thing, so it is indeed an RC mess-up.

But that brings some other trouble in more normal signals, so I guess I'll have to play with RC a little bit more.

comment:18 by klaussfreire, 8 years ago

I think AAC's ratecontrol needs a lookahead buffer.

by Kamedo2, 8 years ago

Attachment: Whitenoise.flac added

White noise, created by SoundEngine Free ver.4.59. Using aevalsrc as in comment:11 does the same job.

in reply to:  16 comment:19 by Carl Eugen Hoyos, 8 years ago

Replying to klaussfreire:

You may also want to look at ticket #2706.
(Is it a duplicate of this ticket?)

in reply to:  18 comment:20 by Kamedo2, 8 years ago

Replying to klaussfreire:

I think AAC's ratecontrol needs a lookahead buffer.

Can you implement the feature by July 13th?
I'm going to be free and have time to do some double-blind listening tests of the codec.
Results will be like this: http://www.hydrogenaudio.org/forums/index.php?showtopic=100896

comment:21 by klaussfreire, 8 years ago

Maybe a very simple one-block one. I've been thinking such a simple lookahead might be enough to fix the bugs, with a better one perhaps for a further patch.

I'll give this high priority, but we're only 3 days away from that deadline you know...

in reply to:  21 comment:22 by Kamedo2, 8 years ago

Replying to klaussfreire:

Maybe a very simple one-block one. I've been thinking such a simple lookahead might be enough to fix the bugs, with a better one perhaps for a further patch.

I'll give this high priority, but we're only 3 days away from that deadline you know...

Thank you very much! A delay of some days is acceptable.

comment:23 by klaussfreire, 8 years ago

Alright, attaching another version. This seems to work better, but it's a bit rushed. I'll try to improve on it, but if I delay, feel free to test this version.

by klaussfreire, 8 years ago

Second version of AAC improvements, with improvements on rate control, hopefully gets rid of all remaining "collapsations on high bit rates". Tested various music tracks on 64k, 128k, 256k and 384k.

in reply to:  23 comment:24 by Carl Eugen Hoyos, 8 years ago

Replying to klaussfreire:

Alright, attaching another version.

The patch does not apply here to current git head.

comment:25 by Kamedo2, 8 years ago

The patch does not apply here, either. I read http://ffmpeg.org/pipermail/ffmpeg-devel/2013-May/143216.html and http://ffmpeg.org/pipermail/ffmpeg-devel/2013-May/143222.html and guessed what I should do, but it still fails.

by klaussfreire, 8 years ago

Second version of AAC improvements, with improvements on rate control, hopefully gets rid of all remaining "collapsations on high bit rates". Tested various music tracks on 64k, 128k, 256k and 384k.

comment:26 by klaussfreire, 8 years ago

Yes, sorry, I'm not working on a clean checkout.

I should move to a clean checkout.

There I attached a rebased patch.

comment:27 by Kamedo2, 8 years ago

Very good one! The only serious artifact I've heard so far is whitenoise.flac at 8, 16, 24, 32kbps and 192kbps.

comment:28 by Kamedo2, 8 years ago

Whitenoise.flac at 384kbps and ffmpeg_aac320k_collapse.flac at 320kbps are strange, too.

in reply to:  28 comment:29 by klaussfreire, 8 years ago

Replying to Kamedo2:

Whitenoise.flac at 384kbps and ffmpeg_aac320k_collapse.flac at 320kbps are strange, too.

I didn't try the collapse ones at 320k, though I tried at 384 and sounded nice. I'll try again when I have a chance though.

However, whitenoise 384 gives me an error, seems 384kbps is too much for mono. The whitenoise I mention is generated with the random generator, I'll try with the flac first chance I get.

comment:30 by Kamedo2, 8 years ago

Isn't the lower spreading function applied too much? The quality of the lower frequencies is bad when a higher frequency bin is strong. And what makes 320kbps particularly bad? The quality degrades when we have more than enough ('overkill') bits. I think something fatal is happening, like an integer overflow or something.

by Kamedo2, 8 years ago

A sound that degrades on FFmpeg native aac encoder.

comment:31 by Kamedo2, 8 years ago

Isn't line 334 of libavcodec/aacpsy.c:

        for (g = 0; g < ctx->num_bands[j]-1; g++) {
            AacPsyCoeffs *coeff = &coeffs[g];
            float bark_width = coeffs[g+1].barks - coeffs->barks;
            coeff->spread_low[0] = pow(10.0, -bark_width * PSY_3GPP_THR_SPREAD_LOW);
            coeff->spread_hi [0] = pow(10.0, -bark_width * PSY_3GPP_THR_SPREAD_HI);
            coeff->spread_low[1] = pow(10.0, -bark_width * en_spread_low);
            coeff->spread_hi [1] = pow(10.0, -bark_width * en_spread_hi);
            pe_min = bark_pe * bark_width;
            minsnr = exp2(pe_min / band_sizes[g]) - 1.5f;
            coeff->min_snr = av_clipf(1.0f / minsnr, PSY_SNR_25DB, PSY_SNR_1DB);
        }

strange? I doubt the sanity of the lower spreading function at the highest band, because using the -cutoff 18000 option improves the quality on problematic samples, and these problematic samples always include strong 20-22kHz sounds. (The default cutoff is 18k at 192kbps, 20k at 256kbps, and 22k at 320kbps.)

by Kamedo2, 8 years ago

Attachment: 18.6_22kHz_noise.flac added

Partial white noise, clipped by a 256th-order lanczos function to include only signals between 18.6 and 22kHz. The signal wanders around in frequency.

comment:32 by Kamedo2, 8 years ago

I've got it. When the native aac encoder calculates a masking curve, almost inaudible sounds like 18kHz, 20kHz, 22kHz are taken into account, and audible sound like 14kHz is masked by the inaudible ones. Add the inaudible noise above to the source sound and the encoded sound will be significantly degraded. I recommend that any signals above 16kHz be disregarded in psychoacoustic engines.
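The masking effect described here can be illustrated with a much simplified downward-spreading pass. This is only a sketch, not the code in aacpsy.c: the helper name `spread_thresholds_down` and the single shared `spread_low` factor are assumptions made for illustration (the real encoder derives per-band factors from the bark widths).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: spread masking thresholds toward lower bands.
 * thr[g] is the masking threshold of band g (lowest band first);
 * spread_low is the attenuation applied each time a threshold leaks
 * one band downward (0 < spread_low < 1). */
static void spread_thresholds_down(float *thr, size_t num_bands, float spread_low)
{
    assert(spread_low > 0.0f && spread_low < 1.0f);
    if (num_bands < 2)
        return;
    /* Walk from the top band down; each band inherits an attenuated
     * copy of the threshold above it when that exceeds its own. */
    for (size_t g = num_bands - 1; g-- > 0; ) {
        float leaked = thr[g + 1] * spread_low;
        if (leaked > thr[g])
            thr[g] = leaked;
    }
}
```

With a very strong top band, every band below it ends up with a raised threshold, so audible content there is quantized more coarsely. That is the failure mode suspected above when the 20-22kHz region is loud.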

comment:33 by klaussfreire, 8 years ago

Alright. Good catch.

I'd recommend not ignoring, because masking within that band will still be important for bit allocation purposes. Rather, back-spreading rolloff (towards the lower frequencies) should be tweaked a bit.

comment:34 by Kamedo2, 8 years ago

Things start to make sense.

Could you tweak the back-spreading and provide the patch for me? I'd like to test that.

comment:35 by klaussfreire, 8 years ago

Yes, will do this tonight (at work right now).

by Kamedo2, 8 years ago

A sound that degrades on FFmpeg native aac encoder.

comment:36 by Kamedo2, 8 years ago

-cutoff 18000 seems to work, but the lowpass filter is too dull, compared to many practical encoders. libavcodec/psymodel.c has the constant FILT_ORDER, and changing the order from 4 to 8 sharpens the filter. But 12 and 16 fails somehow.
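To see why a higher order sharpens the filter, the ideal analog Butterworth response can be evaluated directly. This is a textbook approximation, not the digital IIR filter that psymodel.c actually builds; the function name `butterworth_power_gain` is made up for this sketch.

```c
#include <assert.h>

/* Power gain |H(f)|^2 of an ideal analog Butterworth low-pass:
 * 1 / (1 + (f / fc)^(2 * order)). */
static double butterworth_power_gain(double f, double fc, int order)
{
    assert(fc > 0.0 && order > 0);
    double ratio_sq = (f / fc) * (f / fc);
    double term = 1.0;
    for (int i = 0; i < order; i++)
        term *= ratio_sq;          /* accumulates (f/fc)^(2*order) */
    return 1.0 / (1.0 + term);
}
```

One octave above the cutoff, order 4 leaves 1/257 of the power (about 24 dB down), while order 8 leaves 1/65537 (about 48 dB down). At the cutoff itself the gain is 1/2 regardless of order.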

comment:37 by klaussfreire, 8 years ago

I hope you're testing with good headphones. HF quality is hard to gauge with speakers, especially since good speakers cost a fortune.

comment:38 by Kamedo2, 8 years ago

Yes, I'm testing with good headphones.

in reply to:  38 ; comment:39 by klaussfreire, 8 years ago

Replying to Kamedo2:

Yes, I'm testing with good headphones.

The reason I mention this is because, from my experience, FAAC tends to have a low cutoff for some bitrates, that seem optimal with speakers, but sound noticeably dull with headphones.

in reply to:  39 comment:40 by Kamedo2, 8 years ago

Replying to klaussfreire:

The reason I mention this is because, from my experience, FAAC tends to have a low cutoff for some bitrates, that seem optimal with speakers, but sound noticeably dull with headphones.

Exactly. FAAC cutoff is rather annoyingly low in 96kbps, 64kbps, and 32kbps, and the filter is the major reason why FAAC never beats Nero.

BTW, any prospects for fixing samples 1, 4, 5, and white noise? 4 and 5 are bad at 320kbps and whitenoise.flac is bad at 384kbps. Both regain quality with -cutoff 18000.

comment:41 by Kamedo2, 8 years ago

from line 300:

    const int chan_bitrate = ctx->avctx->bit_rate / ((ctx->avctx->flags & CODEC_FLAG_QSCALE) ? 2.0f : ctx->avctx->channels);

to:

    const int chan_bitrate = FFMIN(ctx->avctx->bit_rate, 240000) / ((ctx->avctx->flags & CODEC_FLAG_QSCALE) ? 2.0 : ctx->avctx->channels);

significantly improves the quality. Bitrates remain relatively high in this change.
I have not tested all cases, but it works on 256kbps, 320kbps, and 384kbps on many sounds.
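The effect of the changed line can be sketched in isolation. The function below is a simplified integer-only stand-in (the original divides by a float), and `capped_chan_bitrate`, `qscale_mode` and `cap` are names invented for this example.

```c
#include <assert.h>

/* Per-channel bitrate handed to the psy model, with the total bitrate
 * capped first. qscale_mode mimics the CODEC_FLAG_QSCALE branch, where
 * the divisor is fixed at 2 instead of the channel count. */
static int capped_chan_bitrate(int bit_rate, int channels, int qscale_mode, int cap)
{
    assert(channels > 0);
    int limited = bit_rate < cap ? bit_rate : cap;
    return limited / (qscale_mode ? 2 : channels);
}
```

For stereo, any request above 240 kbps then looks like 120 kbps per channel to psy, while lower bitrates pass through unchanged.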

comment:42 by Kamedo2, 8 years ago

I've listened to over 100 samples of diverse music and speech records. No problem so far. It works on 96, 112, 128,... 256kbps, but hangs on 288kbps.

comment:43 by klaussfreire, 8 years ago

Yeah, but because you're capping psy's bitrate target to non-problematic rates. I don't think that's ideal, though that indeed proves the problem lies in psy.

in reply to:  43 comment:44 by Kamedo2, 8 years ago

Replying to klaussfreire:

Yeah, but because you're capping psy's bitrate target to non-problematic rates. I don't think that's ideal, though that indeed proves the problem lies in psy.

Rates go up even after capping. So it's not merely a cap. I think we're close to the solution.

comment:45 by klaussfreire, 8 years ago

They go up because twoloop will push all scalefactors down uniformly until it achieves the desired bitrate, but:

  • It won't work with VBR, VBR almost wholly depends on psy to dictate scalefactor band noise floors. Twoloop will push scalefactors down a bit more I think but not much at those high bitrates
  • It's still suboptimal, it's better to let psy decide, since psy understands perceptual entropy better

Sadly, I didn't have time today to work on it. Let's hope I can do so tomorrow. With your analysis I'm confident I can patch psy without having to cap anything.

comment:46 by Kamedo2, 8 years ago

How is the development going?

comment:47 by klaussfreire, 8 years ago

Reading the specs right now. I had a hunch that the spec might say something about this.

comment:48 by klaussfreire, 8 years ago

There. Line 308:

pctx->frame_bits   = chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate;

Must be

pctx->frame_bits   = FFMIN(3000, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);

That is indeed said on the spec.

Step 15 of subpart 4: Steps in threshold calculation: then bit allocation is limited to 0 < bit_allocation < 3000. It seems they thought of it all.
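As a worked example of the clamp, the per-frame bit budget can be computed for a few settings. This is a sketch: `psy_frame_bits` and its `max_bits` parameter are names chosen here, and the constant 1024 is the long-block size used by the quoted line.

```c
#include <assert.h>

#define AAC_BLOCK_SIZE_LONG 1024  /* samples per long AAC block */

/* Bits available to one channel for one long block, limited to max_bits. */
static int psy_frame_bits(int chan_bitrate, int sample_rate, int max_bits)
{
    assert(sample_rate > 0);
    long long bits = (long long)chan_bitrate * AAC_BLOCK_SIZE_LONG / sample_rate;
    return bits < max_bits ? (int)bits : max_bits;
}
```

At 320 kbps stereo and 44.1kHz the unclamped value would be 3715 bits per channel, so the 3000 limit applies; at 128 kbps stereo it is 1486 and the limit has no effect. For reference, 120 kbps per channel at 48kHz gives exactly 2560 bits.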

comment:49 by Kamedo2, 8 years ago

Great! I'm going to have time to test that improvement in about 5 hours, and I'll test it extensively. I also think I have to look for ways to sharpen the LPF by using a higher order, at the cost of more computation time. Currently the cutoff is not very sharp.

comment:50 by klaussfreire, 8 years ago

2560 (the number you found) works better for us though. That's certainly in relation to some deficiency in twoloop, but hey. Let's just document that this should be 3000 but can't be, and be done with it.

comment:51 by klaussfreire, 8 years ago

The LPF could be accomplished by zeroing the coefficients in the FFT. To get the lowest possible ripple, the boundary coefficient needs some care, but AFAIR it's the best method, and it's free for something that's already doing FFT.
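A minimal sketch of this idea follows, assuming a block of real-valued spectral coefficients and a simple linear taper at the boundary. The taper shape and the helper name `lowpass_spectrum` are assumptions; the comment above only says that the boundary needs care, not how to treat it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: low-pass a block of spectral coefficients in place.
 * Bins at or above cutoff_bin are zeroed; the `taper` bins just below it
 * are scaled by a linear ramp so the edge is not a hard step. */
static void lowpass_spectrum(float *coef, size_t n, size_t cutoff_bin, size_t taper)
{
    assert(cutoff_bin <= n && taper <= cutoff_bin);
    for (size_t i = cutoff_bin - taper; i < cutoff_bin; i++) {
        /* gain falls from taper/(taper+1) down to 1/(taper+1) */
        coef[i] *= (float)(cutoff_bin - i) / (float)(taper + 1);
    }
    for (size_t i = cutoff_bin; i < n; i++)
        coef[i] = 0.0f;
}
```

Since the transform is already computed for encoding, the extra cost is a single pass over the upper bins.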

comment:52 by Kamedo2, 8 years ago

It's not a regression, but the surround bitrate seems to be capped and does not change with -b:a 256k, 320k, 384k.
The surround sample file is here: http://people.xiph.org/~xiphmont/demo/opus/demo3.shtml
I'm currently using pctx->frame_bits = FFMIN(3000,...
No obvious bugs so far.

comment:53 by Kamedo2, 8 years ago

http://i44.tinypic.com/2s805y0.png

I used pctx->frame_bits = FFMIN(2560, ...) and, in psymodel.h line 32:

#define AAC_CUTOFF(s) (s->bit_rate ? FFMIN3(FFMIN3(s->bit_rate/s->channels/2, 4000 + s->bit_rate/s->channels/4, 12000 + s->bit_rate/s->channels/16), 20000, s->sample_rate / 2): (s->sample_rate / 2))

This is better on mono, surround, and very low bitrates (such as 32kbps stereo).
truncut.wav has little HF content, so the bitrate saturates at 172kbps.

comment:54 by Kamedo2, 8 years ago

In 4 hours of listening to more than 100 musical, vocal, ambient and artificial sounds, at 64-480kbps, 44.1kHz, 48kHz, stereo, surround, I have found no problematic samples. This solution is great. Thank you for fixing it, klaussfreire.

I think I'm going to test mono, collecting more surround samples to test, 32kHz or less, and VBR modes tomorrow.

comment:55 by Kamedo2, 8 years ago

http://i43.tinypic.com/15714c5.png
All are stereo. I listened to some of the encoded AACs, and there was no problem.

comment:56 by Kamedo2, 8 years ago

Should I use ffmpeg_g to spot the bug? Thousands of diverse sound files are now being encoded to check that it doesn't freeze or fail.

comment:57 by Kamedo2, 8 years ago

Recommended cutoff frequency for FFmpeg AAC.
http://i41.tinypic.com/28al1fn.png
psymodel.h line 32:

#define AAC_CUTOFF(s) (s->bit_rate ? FFMIN3(FFMIN3(s->bit_rate/s->channels/2, 3000 + s->bit_rate/s->channels/4, 12000 + s->bit_rate/s->channels/16), 20000, s->sample_rate / 2): (s->sample_rate / 2))

The LPF is not applied in VBR now, resulting in noticeably poor quality.
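For readability, the AAC_CUTOFF expression above can be written as a plain function. This is a sketch with the same integer arithmetic as the macro; `aac_cutoff_hz` and `min_int` are names introduced here, not FFmpeg identifiers.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Function form of the AAC_CUTOFF expression quoted above (CBR/ABR case).
 * Returns the low-pass cutoff in Hz. */
static int aac_cutoff_hz(int bit_rate, int channels, int sample_rate)
{
    assert(channels > 0);
    if (!bit_rate)
        return sample_rate / 2;
    int per_ch = bit_rate / channels;
    int cutoff = min_int(per_ch / 2,
                 min_int(3000 + per_ch / 4, 12000 + per_ch / 16));
    return min_int(min_int(cutoff, 20000), sample_rate / 2);
}
```

For 44.1kHz stereo this gives 7000 Hz at 32 kbps, 11000 Hz at 64 kbps, 16000 Hz at 128 kbps, and reaches the 20000 Hz ceiling at 256 kbps.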

comment:58 by Kamedo2, 8 years ago

http://i40.tinypic.com/14smbo0.png
songs: 5 min snippets of pops and jazz, 44.1kHz, stereo
non-music sounds: 16 min of artificial sounds, difficult samples, speech, etc, 48kHz, stereo

LAME equivalent   Bitrate (kbps)   VBR number
-                 16               0.029
-V9.9             32               0.053
-                 48               0.097
-V9               64               0.23
-V8               80               0.43
-V7               96               0.55
-V6               112              0.66
-V5               128              0.86
-                 144              1.06
-V4               160              1.17
-V3               176              1.29
-V2               192              1.43
-V1               224              2.2
-V0               256              4.3
-                 288              6.2
-                 320              7
-                 352              7.7
-                 384              10

comment:59 by klaussfreire, 8 years ago

How about the subjective quality on the various VBR modes, as compared to CBR (actually ABR, since a CBR setting in AAC produces ABR)?

I worked hard to get good results, but there are still problematic samples that sound better on equivalent ABR than VBR.

in reply to:  57 ; comment:60 by klaussfreire, 8 years ago

Replying to Kamedo2:

psymodel.h line 32:

#define AAC_CUTOFF(s) (s->bit_rate ? FFMIN3(FFMIN3(s->bit_rate/s->channels/2, 3000 + s->bit_rate/s->channels/4, 12000 + s->bit_rate/s->channels/16), 20000, s->sample_rate / 2): (s->sample_rate / 2))

The LPF is not applied in VBR now, resulting in noticeably poor quality.

Try this cutoff:

#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    16000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
    : _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
)

I find it works better, the other one was pretty dull for 64k/ch, which ought to be transparent for AAC. This one also works on VBR.

by Kamedo2, 8 years ago

Attachment: ffmpeg_aacvbr_pulse1.flac added

Sound disappears for about 20ms in VBR mode -q:a 5, -q:a 10. Sounds like an annoying pulse.

in reply to:  60 comment:61 by Kamedo2, 8 years ago

Replying to klaussfreire:

Try this cutoff:

#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    16000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
    : _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
)

I tried, but isn't this cutoff strange? It sounds like the lowpass is always 20kHz.
The problem of ffmpeg_aacvbr_pulse1.flac is solved by this.

I'm using current git head 54813 + aac-improvements-wip-v2-rclookahead.2.patch + aacpsy.c Line 308

pctx->frame_bits   = FFMIN(2560, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);

comment:62 by klaussfreire, 8 years ago

LOL, sorry, the VBR condition is backwards. An old idiocy of mine, I always reverse if conditions. Kinda like coding dyslexia.

It should be

#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    12000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
    : _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
)

Though I'm getting some weird results with very low quality settings.
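The corrected macro pair can be read as the two functions below. This is a sketch: `cutoff_from_rate`, `aac_cutoff_vbr_aware` and the `qscale_mode` parameter (standing in for the CODEC_FLAG_QSCALE test) are names invented for this example, and the scaling is done in double to sidestep integer overflow.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Function form of _AAC_CUTOFF from the corrected version above. */
static int cutoff_from_rate(int bit_rate, int channels, int sample_rate)
{
    if (!bit_rate)
        return sample_rate / 2;
    int per_ch = bit_rate / channels;
    int cutoff = min_int(per_ch, min_int(3000 + per_ch / 2, 12000 + per_ch / 8));
    return min_int(min_int(cutoff, 20000), sample_rate / 2);
}

/* In VBR (qscale) mode the nominal bitrate is scaled by quality/120 and
 * treated as stereo; otherwise the real bitrate and channel count apply. */
static int aac_cutoff_vbr_aware(int qscale_mode, int bit_rate, int channels,
                                int global_quality, int sample_rate)
{
    assert(channels > 0);
    if (qscale_mode) {
        int q = global_quality ? global_quality : 120;
        int scaled = (int)((double)bit_rate * q / 120.0);
        return cutoff_from_rate(scaled, 2, sample_rate);
    }
    return cutoff_from_rate(bit_rate, channels, sample_rate);
}
```

With a nominal 128 kbps and 44.1kHz input, a global_quality of 120 yields a 20000 Hz cutoff, 60 yields 16000 Hz, and 30 yields 11000 Hz.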

comment:63 by Kamedo2, 8 years ago

Aren't you trying to access s->bit_rate when it's VBR? Or am I missing something?

comment:64 by Kamedo2, 8 years ago

Is s->global_quality different from VBR number -q:a x?

LAME equivalent   Stereo bitrate (kbps)   VBR number   Recommended cutoff (Hz)
-                 16                      0.029        4000
-V9.9             32                      0.053        7000
-                 48                      0.097        9000
-V9               64                      0.23         11000
-V8               80                      0.43         13000
-V7               96                      0.55         15000
-V6               112                     0.66         15500
-V5               128                     0.86         16000
-                 144                     1.06         16500
-V4               160                     1.17         17000
-V3               176                     1.29         17500
-V2               192                     1.43         18000
-V1               224                     2.2          19000
-V0               256                     4.3          20000
-                 288                     6.2          20000
-                 320                     7            20000
-                 352                     7.7          20000
-                 384                     10           20000

in reply to:  63 ; comment:65 by klaussfreire, 8 years ago

Replying to Kamedo2:

Aren't you trying to access s->bit_rate when it's VBR? Or am I missing something?

Yes, bit_rate in that case holds the default of 128kbps. Psy does the same, but it works well since that's considered to be AAC's transparent rate. So, for VBR, you make psy work at transparent settings, and compensate bit allocation based on RD scaling.

in reply to:  64 comment:66 by klaussfreire, 8 years ago

Replying to Kamedo2:

Is s->global_quality different from VBR number -q:a x?

It's x * 120 AFAIK

comment:67 by klaussfreire, 8 years ago

I think I finally got VBR to talk to psy.

It's looking good. I'll post an updated patch with all this in a while (still lots of tests to perform)

in reply to:  65 comment:68 by Kamedo2, 8 years ago

Replying to klaussfreire:

Yes, bit_rate in that case holds the default of 128kbps. Psy does the same, but it works well since that's considered to be AAC's transparent rate.

AAC is not transparent in 128kbps stereo, although Apple used to advertise that way. http://d.hatena.ne.jp/kamedo2/20111029/1319840519

in reply to:  60 comment:69 by Kamedo2, 8 years ago

Replying to klaussfreire:

#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    16000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
    : _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
)

I find it works better, the other one was pretty dull for 64k/ch, which ought to be transparent for AAC. This one also works on VBR.

The high cutoff causes trouble for whitenoise.flac below 55kbps.
And I'm almost certain 16kHz is optimal at 128kbps stereo.
http://d.hatena.ne.jp/kamedo2/20120221/1329845124
http://d.hatena.ne.jp/kamedo2/20120729/1343545890
http://i43.tinypic.com/cmhx3.png
http://i39.tinypic.com/2ecdv0o.png

comment:70 by Kamedo2, 8 years ago

I recommend psymodel.h line 24 to be:

#include "libavutil/libm.h"
#include "avcodec.h"

/** maximum possible number of bands */
#define PSY_MAX_BANDS 128
/** maximum number of channels */
#define PSY_MAX_CHANS 24

#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels/2, \
    3000 + bit_rate/channels/4, \
    12000 + bit_rate/channels/16), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(((int)(135000.0f*sqrtf(s->global_quality ? s->global_quality/120.0f : 1.0f))), 2, s->sample_rate) \
    : _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
)

In this way, I can set cutoff to VBR modes as well.
PSY_MAX_CHANS 24 is to accommodate NHK 22.2ch.

I notice that at -q:a 0.2 and -q:a 0.4, the lower frequencies are in trouble. It sounds like distant thunder.

comment:71 by klaussfreire, 8 years ago

Yes, I'm fixing the lower frequency right now. It's an issue with tonal band prioritization, which doesn't really work as intended in VBR. I'm preparing a better patch now. I'll test your cutoffs.

comment:72 by Kamedo2, 8 years ago

After applying the new LPF from comment:70, the resulting bitrate of music changed a bit. I think I have to replot the graph. And one more problem: -q:a 0.029 or -q:a 10 is unfriendly for an average user. I think the value should be roughly equivalent to LAME's. I mean, if one uses -q:a 2, the result for average sound should be roughly 96kbps/channel, which is the same behavior as LAME -V2. Is applying the new LPF method from comment:51 easy?

comment:73 by klaussfreire, 8 years ago

After two days of toying around, the butterworth filter used in psy is actually counterproductive. Keeping all things equal, lowering the cutoff actually increases bitrate, if a fixed RD is forced. So, for VBR, it's a no-no.

I'm trying an FFT-based LP by simply zeroing coeffs, with care at the boundary to minimize ripple, and it seems to work a lot better, at least for VBR.

Right now, the implementation is just a POC. It's very dirty. But I'm getting convinced this is the way for VBR... and maybe for ABR too. I'm not sure.

Edit: And, to boot, an FFT is phase-linear. I can actually hear group delay with the butterworth. Ugly.

Last edited 8 years ago by klaussfreire

comment:74 by Kamedo2, 8 years ago

Is that FFT, not MDCT?
I'm guessing that the bitrate increase from lowering the cutoff is the effect described in comment:32. Very strange, as HF content usually takes up more bits, but it makes sense.

comment:75 by klaussfreire, 8 years ago

You're right, the one I have done right now is MDCT, because it's done within the bit allocator. But I've been meaning to implement an actual FFT filter later on, if not too hard, and if the technique pans out.

comment:76 by klaussfreire, 8 years ago

Thing is, the butterworth doesn't really remove that much content, and it changes the masking thresholds in a way that actually requires more bits to encode. A higher-order butterworth might work, but it would have way too much group delay.

comment:77 by klaussfreire, 8 years ago

BTW, wait before you redo that graph, I have a much better VBR patch almost ready.

comment:78 by klaussfreire, 8 years ago

Alright, I'm attaching a new VBR patch. CBR/ABR shouldn't have changed (shouldn't, but might). I will probably want to apply the same logic to CBR/ABR as well, since it works very well (ie: cutoff not with a filter but with the bit allocator, stop spending bits on HF if we're starving for bits).

A heads-up: VBR's q-to-kbps curve has changed, and there are some artifacts that sound like scratchy noises (especially audible in the sine sample), that are due to clipping. I think it's not specific to this patch, but I just noticed it. I'm not sure how to attack it. Normally, I'd apply compression on the IMDCT stage, but since that's on the decoder side, I'll probably have to find a clever way to predict clipping on the encoder and compensate. Craptastic.

Anyway, I do think VBR has been greatly improved on this patch. Let me know what you think.

by klaussfreire, 8 years ago

VBR improvements over wip-v2-rclookahead

comment:79 by Carl Eugen Hoyos, 8 years ago

I believe your latest patch contains trailing whitespace (that cannot be committed to FFmpeg git), consider running tools/patcheck over the diff.

comment:80 by Kamedo2, 8 years ago

I successfully applied the patch from latest git head N-54889-g47d57f2.

comment:81 by Kamedo2, 8 years ago

http://i41.tinypic.com/28mo7jn.png
Very strange behavior, and whitenoise.flac at -q:a 1 completely lacks LF contents.
Somehow, this encoder tends to omit the lowest tone in white noise which is audible.
-q:a 1.7 and -q:a 2.7 (the bitrate peaks) of whitenoise.flac are strange, too.

comment:82 by klaussfreire, 8 years ago

Yeah it seems to have an anomaly around 1. I had only tested whitenoise up to 0.7. I'll try to patch it up.

comment:83 by klaussfreire, 8 years ago

Ah, yeah, I know. It's probably the scaler offset. It must be unpredictable in whitenoise because of how flat the envelope is.

comment:84 by Kamedo2, 8 years ago

I don't recommend ambitiously trying to preserve HF content above 18kHz when there are enough bits. It sounds unstable. Some early 1990s MP3 encoders used this tactic, but none of them were good. Rather, a clean, fixed LPF should be applied at all times. Avoid the situation where one can hear the 12-20kHz content in some parts of the music and a dull, 12kHz-LPF-like sound in other parts.

As for

pctx->frame_bits   = FFMIN(2560, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);

do we get more stable results when the number 2560 is lowered?
(240kbps is a 'megadose' or 'overkill' bitrate for AAC, so slight degradation is not a major problem.)

comment:85 by Kamedo2, 8 years ago

I notice that the LPF on some short blocks is not working in at least -q:a 0.3 and 0.4.

http://i42.tinypic.com/okor9c.png

Stereo bitrate (kbps)   VBR number
32                      0.14
64                      0.25
96                      0.33
128                     0.39
160                     0.46
192                     0.55

in reply to:  84 ; comment:86 by klaussfreire, 8 years ago

Replying to Kamedo2:

I don't recommend ambitiously trying to preserve HF content above 18kHz when there are enough bits. It sounds unstable. Some early 1990s MP3 encoders used this tactic, but none of them were good. Rather, a clean, fixed LPF should be applied at all times. Avoid the situation where one can hear the 12-20kHz content in some parts of the music and a dull, 12kHz-LPF-like sound in other parts.

I just want to preserve the HF component of transients. There might be better ways of doing that. I guess I'll keep iterating on it. However, I believe the way it's being done now works well. If you check, the LP cutoff is chosen from the allocation given by psy. Psy contains bit reservoir logic, which means it will momentarily increase bits (and cutoff) for some difficult transients. Right now, it works wonders for hi-hats.

I will probably have to be stricter about the cutoff, though. As you say, when the signal by itself (not by psy's indication, but signal strength alone) suddenly jumps in HF content, the result is unpleasant. I think I have cleaned up most of those cases, but who knows. It's hard to discern those from actual transients.

As for

pctx->frame_bits   = FFMIN(2560, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);

do we get more stable results when the number 2560 is lowered?
(240kbps is a 'megadose' or 'overkill' bitrate for AAC, so slight degradation is not a major problem.)

If it doesn't limit the ability to increase allocation for transients, it might. I'll look into it.

in reply to:  86 ; comment:87 by Kamedo2, 8 years ago

Replying to klaussfreire:

I just want to preserve the HF component of transients. There might be better ways of doing that. I guess I'll keep iterating on it. However, I believe the way it's being done now works well. If you check, the LP cutoff is chosen from the allocation given by psy. Psy contains bit reservoir logic, which means it will momentarily increase bits (and cutoff) for some difficult transients. Right now, it works wonders for hi-hats.

So, if there is a group of beat sounds that is on the threshold of tonal/transients, the LPF is sometimes on and sometimes off? Currently, the on/off switch itself is audible and is quite annoying. It sounds like a stopwatch.

I will probably have to be stricter about the cutoff, though. As you say, when the signal by itself (not by psy's indication, but signal strength alone) suddenly jumps in HF content, the result is unpleasant. I think I have cleaned up most of those cases, but who knows. It's hard to discern those from actual transients.

ffmpeg_aacvbr_pulse1.flac at -q:a 0.25 produces strange HF sounds.

by Kamedo2, 8 years ago

Attachment: ffmpeg_aacvbr_pulse2.flac added

Partial white noise, split by a 256th-order lanczos filter. HF pulse noise that sounds like a stopwatch is added in VBR around -q:a 0.3

in reply to:  87 comment:88 by klaussfreire, 8 years ago

Replying to Kamedo2:

Replying to klaussfreire:

I just want to preserve the HF component of transients. There might be better ways of doing that. I guess I'll keep iterating on it. However, I believe the way it's being done now works well. If you check, the LP cutoff is chosen from the allocation given by psy. Psy contains bit reservoir logic, which means it will momentarily increase bits (and cutoff) for some difficult transients. Right now, it works wonders for hi-hats.

So, if there is a group of beat sounds that is on the threshold of tonal/transients, the LPF is sometimes on and sometimes off? Currently, the on/off switch itself is audible and is quite annoying. It sounds like a stopwatch.

No, the cutoff moves up and down, but the LP remains on.

I'll have to check the sample

comment:89 by Kamedo2, 8 years ago

You seem to be using the heuristic that transient HF components are loud and tonal HF components are quiet.

in reply to:  89 comment:90 by klaussfreire, 8 years ago

Replying to Kamedo2:

You seem to be using the heuristic that transient HF components are loud and tonal HF components are quiet.

No, I let psy detect the transients. The only heuristic is that I attempt to encode a little bit more of the HF with decreased quality.

I.e., from 0 to cutoff, normal quantization. From cutoff to cutoff * 1.2, coarse (in fact progressively coarser) quantization. For now, I let bit allocation zero out everything beyond 1.2. I may have to force it to avoid the artifacts you mention.
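The three zones can be pictured as a weight applied to quantization precision. This is purely illustrative: the linear ramp and the name `hf_precision_weight` are assumptions, and the patch under discussion reaches a similar effect through RD optimization rather than an explicit weight.

```c
#include <assert.h>

/* Hypothetical weight in [0, 1]: 1 means normal quantization precision,
 * smaller values mean coarser quantization, 0 means the band is dropped. */
static float hf_precision_weight(float freq_hz, float cutoff_hz)
{
    assert(cutoff_hz > 0.0f);
    float upper = cutoff_hz * 1.2f;       /* end of the coarse zone */
    if (freq_hz <= cutoff_hz)
        return 1.0f;                      /* 0..cutoff: normal */
    if (freq_hz >= upper)
        return 0.0f;                      /* beyond 1.2 * cutoff: zeroed */
    return (upper - freq_hz) / (upper - cutoff_hz);  /* progressively coarser */
}
```

With a 16kHz cutoff, content at 17.6kHz would get roughly half the normal precision and anything from 19.2kHz upward would be discarded.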

comment:91 by Kamedo2, 8 years ago

Seeing the spectrogram, sometimes, up to 22kHz is encoded. No way we can hear that high. However, because of your algorithm, the cutoff seems to be much higher than it actually is, and the sound is much clearer in typical cases. But we have to be careful of exceptions. I think I feel strange when the encoded_highest_sound - normal_cutoff is more than 3kHz. Sounds something like plip, plip. Is coarse quantization at cutoff~cutoff*1.2 applied only to transients?

comment:92 by klaussfreire, 8 years ago

No, that's applied to tonal signals as well. A way to squeeze a little extra bandwidth. It proved to be a winning move for music, though I didn't test that much with noise.

Last edited 8 years ago by klaussfreire

comment:93 by Kamedo2, 8 years ago

Is that included in a wip-v3-vbr.patch, or a new feature? It sounds like the extra HF content encode is only on transients. And some transients are indeed encoded up to 22kHz.
Are HF contents over cutoff*1.2 totally discarded? (I believe this is the best move.)

comment:94 by Kamedo2, 8 years ago

LAME sometimes acts like your algorithm, but within 2kHz or so. It's related to the -Y switch, and LAME sometimes encodes 16~18kHz content.

in reply to:  93 comment:95 by klaussfreire, 8 years ago

Replying to Kamedo2:

Is that included in a wip-v3-vbr.patch

Yes

Are HF contents over cutoff*1.2 totally discarded? (I believe this is the best move.)

No, and maybe that's the problem. 1.2 just happens to be the point at which the increased quantization floor starts zeroing out all components. Until that, RD optimization brings down the quantization floor to maintain acceptable quality, so you don't notice the floor rising (and in fact it doesn't for fully tonal bands, that's what RD optimization is about, whereas it does rise for noisy ones).

So, in essence, up to cutoff * 1.2, tonal components are retained at the expense of HF noise, which seems like a sensible tradeoff.

What must be happening, is that, on some signals, the zeroing point happens above 1.2, significantly above. So it's perhaps wise to hardcode that 1.2 value, and force a zero on those bands instead.

comment:96 by Kamedo2, 8 years ago

I think we should hardcode min(cutoff+2500, cutoff*1.2). When cutoff is 18kHz, cutoff*1.2 is 21.6kHz which is too high. Could you provide the relation between -q:a value and cutoff so we can have better grasp on what's happening?
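A quick worked version of this proposal, using integer arithmetic (the function name `hf_hard_limit_hz` is made up for this sketch):

```c
#include <assert.h>

/* Upper frequency limit proposed above: min(cutoff + 2500, cutoff * 1.2). */
static int hf_hard_limit_hz(int cutoff_hz)
{
    assert(cutoff_hz >= 0);
    int additive = cutoff_hz + 2500;
    int relative = cutoff_hz * 6 / 5;   /* cutoff * 1.2 without floating point */
    return additive < relative ? additive : relative;
}
```

For an 18kHz cutoff the limit becomes 20.5kHz instead of 21.6kHz; below a 12.5kHz cutoff the 1.2 factor is the tighter bound, so low-bitrate behavior is unchanged.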

comment:97 by klaussfreire, 8 years ago

So, I tracked the anomaly near -q:a 1 to the ESC_BT codebook. It seems when noise floors are too low, the coefficients can't be properly encoded, and all kinds of bad things ensue. I'll see how to fix it.

comment:98 by Kamedo2, 8 years ago

I noticed that this new VBR encoder has zero delay. The ABR encoder at 64kbps stereo has 1 sample delay, probably because of the lack of the butterworth LPF in VBR.

comment:99 by klaussfreire, 8 years ago

That's why I want to get rid of the butterworth. It's good, but FFT is better, since it's phase-linear. With all the quantization noise I don't think we care that much about ripple, but even if we did, FFTs can be made to minimize it.

comment:100 by Kamedo2, 8 years ago

I think I can start the blind test from August 3rd. With the results, we can overwrite the outdated FFmpeg AAC Encoding Guide. https://trac.ffmpeg.org/wiki/AACEncodingGuide

comment:101 by Kamedo2, 8 years ago

Is the issue in comment:97 fixable? I think it will contribute to higher quality at 160kbps and 192kbps. Currently, it is still worse than the mighty Apple AAC.

I assume most blocks are long (1024 samples) tonal blocks, and that the short, transient blocks, which are apparently causing the problems, are rare. Am I right?

comment:102 by klaussfreire, 8 years ago

Yes, I have a fix in the works. That limitation is the reason the standard limits allocation to 3000 bits, most likely.

comment:103 by Kamedo2, 8 years ago

Isn't aaccoder.c line 787~795 strange? I believe the source of the trouble, which causes weird sounds at low bitrates such as -q:a 0.25, is somewhere in how the cutoff value is computed or used.

comment:104 by klaussfreire, 8 years ago

So, I tried a whole new approach, and it seems vastly superior.

I modified psy's "Rate control" to work differently for VBR. Instead of using the bit reservoir, it just computes the optimum PE and scales it by quality. And it works nicely. I still had to push scalers a bit more on the allocator and do the LP filtering to reach the very low bit rates with VBR, but it's sounding a lot better.

I'll do some more testing and then upload the updated patch.

comment:105 by Kamedo2, 8 years ago

Wow, great!

comment:106 by Kamedo2, 8 years ago

I inserted

av_log(NULL, AV_LOG_DEBUG, "\n cutoff=%d, lambda=%f, frame_bit_rate=%d, bandwidth=%d\n",cutoff,lambda,frame_bit_rate,bandwidth);

in aaccoder.c twoloop line 795, and found that the cutoff differs between frames. I used -q:a 0.4, stereo 44.1kHz. I assume cutoffs below 99 belong to short blocks and cutoffs above 500 to long tonal blocks. The cutoff varies throughout the same piece of music: 11.7k~13.6k for the short blocks, 11.5k~13.2k for the long blocks. (Calculated from the 25 raw examples below)

 cutoff=77, lambda=47.000000, frame_bit_rate=46034, bandwidth=14508

 cutoff=614, lambda=47.000000, frame_bit_rate=45648, bandwidth=14412

 cutoff=76, lambda=47.000000, frame_bit_rate=45648, bandwidth=14412

 cutoff=612, lambda=47.000000, frame_bit_rate=45417, bandwidth=14354

 cutoff=76, lambda=47.000000, frame_bit_rate=45417, bandwidth=14354

 cutoff=532, lambda=47.000000, frame_bit_rate=37937, bandwidth=12484
    Last message repeated 1 times

 cutoff=538, lambda=47.000000, frame_bit_rate=38477, bandwidth=12619
    Last message repeated 1 times
size=     242kB time=00:00:15.80 bitrate= 125.2kbits/s
 cutoff=68, lambda=47.000000, frame_bit_rate=39017, bandwidth=12754

 cutoff=544, lambda=47.000000, frame_bit_rate=39017, bandwidth=12754

 cutoff=548, lambda=47.000000, frame_bit_rate=39402, bandwidth=12850
    Last message repeated 1 times

 cutoff=551, lambda=47.000000, frame_bit_rate=39711, bandwidth=12927
    Last message repeated 1 times

 cutoff=554, lambda=47.000000, frame_bit_rate=39942, bandwidth=12985
    Last message repeated 1 times

 cutoff=69, lambda=47.000000, frame_bit_rate=40173, bandwidth=13043

 cutoff=556, lambda=47.000000, frame_bit_rate=40173, bandwidth=13043

 cutoff=69, lambda=47.000000, frame_bit_rate=40405, bandwidth=13101

 cutoff=558, lambda=47.000000, frame_bit_rate=40405, bandwidth=13101

 cutoff=561, lambda=47.000000, frame_bit_rate=40636, bandwidth=13159
    Last message repeated 1 times

 cutoff=562, lambda=47.000000, frame_bit_rate=40713, bandwidth=13178
    Last message repeated 1 times

 cutoff=71, lambda=47.000000, frame_bit_rate=41870, bandwidth=13467

 cutoff=574, lambda=47.000000, frame_bit_rate=41870, bandwidth=13467

 cutoff=79, lambda=47.000000, frame_bit_rate=47653, bandwidth=14913
    Last message repeated 1 times

 cutoff=78, lambda=47.000000, frame_bit_rate=46651, bandwidth=14662
    Last message repeated 1 times

 cutoff=76, lambda=47.000000, frame_bit_rate=45031, bandwidth=14257
    Last message repeated 1 times
[output stream 0:0 @ 04adab60] EOF on sink link output stream 0:0:default.
No more output streams to write to, finishing.

 cutoff=75, lambda=47.000000, frame_bit_rate=44337, bandwidth=14084
    Last message repeated 1 times

 cutoff=68, lambda=47.000000, frame_bit_rate=39711, bandwidth=12927
    Last message repeated 1 times
[aac @ 04aaf580] Trying to remove 504 more samples than there are in the queue
size=     253kB time=00:00:16.10 bitrate= 128.9kbits/s
video:0kB audio:250kB subtitle:0 global headers:0kB muxing overhead 1.475195%
755 frames successfully decoded, 0 decoding errors
[AVIOContext @ 04ad0440] Statistics: 30 seeks, 779 writeouts
[AVIOContext @ 04d6f8a0] Statistics: 3123324 bytes read, 2 seeks
ffmpeg54890g.exe -v 9 -loglevel 99 -i ffmpeg_aacvbr_pulse2.wav -c:a aac -strict experimental -q:a 0.4 ffmpeg_aacvbr_pulse2.mp4

I tried to automate it with a batch script, including preserving the av_log output, but somehow it freezes.
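
A side note on the debug log above: in every line of this excerpt, the logged bandwidth equals 3000 + frame_bit_rate/4, truncated to an integer. The sketch below checks that relation against a few of the logged pairs. It is an empirical fit to this log, not a formula taken from the encoder source, and the function name is made up for illustration.

```python
# Empirical fit to the debug log in this ticket; not taken from the encoder source.
def fitted_bandwidth(frame_bit_rate):
    """Bandwidth in Hz that matches the logged values: 3000 + bit rate / 4."""
    return 3000 + frame_bit_rate // 4

# (frame_bit_rate, bandwidth) pairs copied from the log lines above.
logged = [
    (45417, 14354),
    (37937, 12484),
    (38477, 12619),
    (41870, 13467),
    (47653, 14913),
]

# Collect any logged pair that the fitted relation fails to reproduce.
mismatches = [(br, bw) for br, bw in logged if fitted_bandwidth(br) != bw]
print("mismatches:", mismatches)
```

If the relation holds outside this excerpt too, the bandwidth changes seen in the log simply follow the per-frame bit rate.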

comment:107 by klaussfreire, 8 years ago

Don't worry, for the new patch I'm using refbits instead of destbits, refbits is a direct derivation of lambda, so it won't change. I couldn't make the changing bandwidth work in a stable fashion without a lot more work, so I'll reserve that for a further patch, maybe.

comment:108 by Kamedo2, 8 years ago

The next patch seems to be a good one.

comment:109 by Kamedo2, 8 years ago

Is the patch available now?

comment:110 by klaussfreire, 8 years ago

Patience. Later today, or perhaps tomorrow, depending on your time zone

comment:111 by klaussfreire, 8 years ago

Damn. The patch works wonderfully well in VBR, but breaks CBR. I'll have to look into it during the weekend.

Patience indeed.

comment:112 by Kamedo2, 8 years ago

Yes, the VBR sounds dull and is currently (at v3) poorer than CBR, so it should have a lot of room to improve.

comment:113 by Kamedo2, 8 years ago

I've encoded weeks' worth of AAC files with the v3 patch, using diverse samples and bitrates, and there were no problems (empty files, error returns, freezes).

comment:114 by Kamedo2, 8 years ago

klaussfreire, could you provide the VBR-only patch? I'd like to test it. I may be able to detect the problem(s).

by klaussfreire, 8 years ago

Improved VBR, fixed psy threshold reduction bug

comment:115 by klaussfreire, 8 years ago

Attached the current WIP.

An explanation of what caused the bug for high q values: there was a bug in psy's threshold reduction for hole avoidance. When a second pass was needed, it would accumulate errors due to a simple typo (reduction += instead of reduction =).

I don't have the 3GPP spec to check, but I just noticed the code made no sense with the +=, but did with =.
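
To illustrate why `+=` accumulates error when a second pass is needed, here is a toy model. It assumes each pass estimates the total reduction relative to the original thresholds, so adding the second estimate to the first double-counts. The linear `pe_after` model, the numbers, and all names are hypothetical; this is not the actual psy model code.

```python
# Toy model only: names, numbers and the linear PE model are assumptions.
PE_START = 1000.0    # perceptual entropy before any reduction is applied
PE_TARGET = 800.0    # desired perceptual entropy
TRUE_SLOPE = 10.0    # real PE drop per unit of reduction (unknown to pass 1)

def pe_after(reduction):
    """PE obtained when `reduction` is applied to the original thresholds."""
    return PE_START - TRUE_SLOPE * reduction

def estimate_total_reduction(assumed_slope):
    """Total reduction, relative to the original thresholds, to hit the target."""
    return (PE_START - PE_TARGET) / assumed_slope

# Pass 1 uses a slightly wrong slope, so it undershoots and a second pass is needed.
first = estimate_total_reduction(12.5)
measured_slope = (PE_START - pe_after(first)) / first

# Pass 2 re-estimates the *total* reduction using the measured slope.
second = estimate_total_reduction(measured_slope)

pe_assign = pe_after(second)              # models "reduction = second"
pe_accumulate = pe_after(first + second)  # models "reduction += second"

print(pe_assign, pe_accumulate)
```

In this toy setup the assignment lands on the target, while the accumulating version overshoots well past it.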

Then there's the ESC_BT thing.

I think the most serious anomalies in this bug have been fixed. I haven't had time to properly test CBR, but it seems to mostly work now. That was a very subtle bit reservoir bug in my "lookahead" patch that didn't surface until I fixed psy.

Anyway, I still would like to make VBR achieve lower bitrates without having to resort to LP filtering. I somehow sense it should be possible. In any case, I made CBR also use the same scalefactor-band-based LP filtering to remove the need for the Butterworth filter, which didn't save many bits anyway, and now it responds to the -cutoff argument, so if you don't like the default cutoff you can override it yourself. It seemed worth parameterizing since I've found some sources that sound better at low bit rates with higher cutoffs, and some that don't. So it's source-dependent.

Anyway, enjoy the patch. I'm not sure I'll have time to work on a more permanent one (one that I'd push to trunk) till next weekend.

comment:116 by Kamedo2, 8 years ago

Yes, the cutoff is quite source-dependent, and listener-dependent too. Older people may prefer lower cutoffs. BTW, I'm 25 yrs old.

comment:117 by Kamedo2, 8 years ago

http://i44.tinypic.com/2m43o9i.png

comment:118 by Kamedo2, 8 years ago

http://i44.tinypic.com/1zmczg5.png
Changing aaccoder.c line 806 from

                ? (refbits * 1.6f * avctx->sample_rate / 1024) 

to

                ? (refbits * 2.5f * avctx->sample_rate / 1024) 

raises the LPF cutoff and the sound is much clearer (at the cost of more noise, but it's certainly better per real bitrate).
In VBR, I feel the sound is bad only in the tonal parts of the music. This encoder uses fewer bits, sometimes nearly half as many, for the tonal parts, unlike Opus, which has a distinct tonality boost function.

comment:119 by klaussfreire, 8 years ago

Yes, I was in the middle of tweaking the rdlambda scale for VBR (which is what gives the tonality boost). It seems way off target for VBR, since a lambda that results in 64kbps in VBR will give you about 32kbps or less in CBR.

With that properly tweaked, we can save lots of bits from noisy bands and put them to better use on tonal bands. For VBR, that means lower bitrates for the same quality level.

Increasing cutoff like you did there has the unwanted side effect of lowering quality a bit too much on tonal bands, for a set file size. I do my tests by searching through -q:a until I get a file roughly the same size as a reference CBR-encoded version, and comparing quality among those. With higher cutoffs, that procedure resulted in noticeable distortion on the HF bands, which is why I left it at 1.6, and it's what I believe will be fixed by tweaking rdlambda for VBR.

It can also be fixed by implementing codebook 13. But that's for another (future, way future) patch, since I see no easy way to implement CB 13 with twoloop, so I'll have to rewrite it.

comment:120 by Kamedo2, 8 years ago

Fig. 6 of this paper shows bit allocation curves, although it is for Opus.
http://jmvalin.ca/papers/aes135_opus_celt.pdf

comment:121 by klaussfreire, 8 years ago

Cool paper. Still, everything seems quite specific to Opus.

comment:122 by Kamedo2, 8 years ago

Is aaccoder.c line 829:

                if (start >= cutoff || band->energy <= (band->threshold * zeroscale) || band->threshold == 0.0) { 

correct? Not start >= cutoff+cutoff/5?

comment:123 by klaussfreire, 8 years ago

Yep, the cutoff is used as-is in this patch, the offset is already accounted for in its computation above that.

comment:124 by Kamedo2, 8 years ago

I've encoded weeks' worth of AAC files with the v4 patch, using diverse samples and bitrates, and there were no problems (empty files, error returns, freezes).

Is 'tweaking rdlambda for VBR' ready? If not, I think I should test v4 ABR first, because it's stable and has fewer artifacts on tonal samples. The blind test will be conducted using the ABC/HR methodology, and there should be some opponents. I'm thinking of...

  • current git head with no patch, abr
  • v4 patch(or anything latest), abr
  • fdk-aac, abr

The bitrates will be 96kbps and 128kbps.

comment:125 by Kamedo2, 8 years ago

Or, I can drop fdk-aac and instead test on 3 bitrates. Do you have any idea?

comment:126 by Carl Eugen Hoyos, 8 years ago

Comparing with libfaac would be useful...

in reply to:  126 comment:127 by Kamedo2, 8 years ago

Replying to cehoyos:

Comparing with libfaac would be useful...

Is comment:69 not enough? (The test was in 2012 July.)

comment:128 by Carl Eugen Hoyos, 8 years ago

I thought that additional improvements were made since (and if ffaac does not beat libfaac and assuming fdk-aac beats libfaac, it might make more sense to compare with libfaac) but please don't let me misguide you.

comment:129 by Kamedo2, 8 years ago

I don't think many people will use libfaac. Both libfaac and libfdk_aac are non-free, and if many people prefer fdk-aac over faac, new results for the new fdk-aac are more interesting than another set of results for the old faac. (As far as I know, there has been no blind test of fdk-aac.)

in reply to:  124 comment:130 by klaussfreire, 8 years ago

Replying to Kamedo2:

Is 'tweaking rdlambda for VBR' ready?

No, I'll have time starting tomorrow.

comment:131 by Kamedo2, 8 years ago

This is not my last test, so if there is a desire to compare this encoder with other encoders, I can do so later. By that time, I hope the new VBR will be a state-of-the-art encoder.

comment:132 by Kamedo2, 8 years ago

I'm going to use the 20 samples below. There are six opponents (the first 3 at 96kbps and the last 3 at 128kbps), so I have to score 6*20=120 sounds. The test is ready.
http://www.hydrogenaudio.org/forums/index.php?showtopic=98003

comment:133 by Mark, 8 years ago

Hi All,
Great to see that the native AAC encoder is getting some attention, with an effort to make it mainstream. Using Windows 7 and Zeranoe's FFmpeg builds, I only get a choice of "The Native Encoder" or "libvo_aacenc".
From what I have read, "libvo_aacenc" only seems to support stereo, not 5.1 or higher.
I am no audiophile and a little hard of hearing, so I cannot find fault with the Native Encoder, but I can tell the difference between 2 and 6 channels :-)

Keep up the good work on a great piece of software.

Regards,
Mark

comment:134 by Kamedo2, 8 years ago

ffmpeg55212 -y -i input.wav -c:a aac -strict experimental -b:a 96k output.mp4
ffmpeg55212_patchv4 -y -i input.wav -c:a aac -strict experimental -b:a 96k output.mp4
ffmpeg55212 -y -i input.wav -c:a libfdk_aac -b:a 96k -afterburner 1 output.mp4

ffmpeg55212 -y -i input.wav -c:a aac -strict experimental -b:a 128k output.mp4
ffmpeg55212_patchv4 -y -i input.wav -c:a aac -strict experimental -b:a 128k output.mp4
ffmpeg55212 -y -i input.wav -c:a libfdk_aac -b:a 128k -afterburner 1 output.mp4

faad -b 4 -o output.float.wav output.mp4

The ABC/HR test is ongoing. These six outputs were shuffled and I listen to them without knowing which is which. I've done 2 samples out of 20. 10% done.

comment:136 by klaussfreire, 8 years ago

Do you have the files encoded with fdk for comparison?

by Kamedo2, 8 years ago

Attachment: fdkaac_10_12.zip added

samples #10-#12 encoded by fdkaac. *2.mp4 are the 128kbps samples, the others are the 96kbps samples.

by Kamedo2, 8 years ago

Attachment: fdkaac_13_16.zip added

samples # 10 - # 12 encoded by fdkaac. *2.mp4 are the 128kbps samples, the others are the 96kbps samples.

comment:137 by Kamedo2, 8 years ago

Oops, samples # 10 ~ # 12 and # 13 ~ # 16.

comment:138 by klaussfreire, 8 years ago

I think I've found the source of most of the "annoying" artifacts. With the recent fix to psy's hole avoidance, lots of the rate control hacks in the lookahead code are no longer necessary, since the bit reservoir now actually works. Though if I do completely disable them, the target bit rate is largely missed, so some RC stuff is still needed.

In short, RC hacks screw up on transients. I guess I'll have to explicitly limit RC hacks to non-transients (with perhaps some hysteresis). I'm working on a v5 fixing that.

Still, to get to fdk quality, I think we'll need to fix M/S encoding (which still has some artifacts; if it didn't, it could be a big efficiency boost) and implement codebook 13 (which fdk seems to use, though I haven't confirmed this). That's a much bigger project though.
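
A minimal sketch of what "limit RC hacks to non-transients, with some hysteresis" could look like: after a transient frame, rate-control adjustments stay disabled for a few more frames instead of switching back on immediately. The hold length and all names are hypothetical; this is not code from the patch.

```python
def rc_allowed_flags(is_transient, hold_frames=2):
    """For each frame, return True if rate-control adjustments may be applied.

    Adjustments are blocked on transient frames and for `hold_frames`
    frames after the last transient (the hysteresis).
    """
    flags = []
    cooldown = 0
    for transient in is_transient:
        if transient:
            cooldown = hold_frames   # restart the hold period
            flags.append(False)
        elif cooldown > 0:
            cooldown -= 1            # still inside the hold period
            flags.append(False)
        else:
            flags.append(True)
    return flags

# One transient at frame 1, followed by steady frames.
frames = [False, True, False, False, False, False]
flags = rc_allowed_flags(frames)
print(flags)
```

The hold period keeps the rate control from toggling on and off around a burst of closely spaced transients.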

comment:139 by Kamedo2, 8 years ago

Great, I'm guessing that's the reason why some samples got much poorer results than fdk. Should I abort the v4 abr test and instead test v5 after its release? I'm on holiday now, but after August 26th I'll move to a quieter place, so I can test more effectively.

comment:140 by klaussfreire, 8 years ago

I think I'll get you the v5 soonish, but I have an office to move this weekend so it may not be as soon as you'd like. In any case, soonish.

comment:141 by Kamedo2, 8 years ago

How is the development of v5?

in reply to:  141 comment:142 by klaussfreire, 8 years ago

Replying to Kamedo2:

How is the development of v5?

Sorry, urgent personal issues prevented me from reaching my self-imposed deadline. I'll try to dedicate some time to it as soon as I'm able, though. Next post ought to be a patch.

comment:143 by Kamedo2, 8 years ago

I resumed the ABC/HR test, and I've done 13 samples out of 20. How is the development going?

in reply to:  143 comment:144 by klaussfreire, 8 years ago

Replying to Kamedo2:

I resumed the ABC/HR test, and I've done 13 samples out of 20. How is the development going?

Stalled for now, but I'll be able to resume soon

comment:145 by Timothy Gu, 8 years ago

Cc: timothygu99@gmail.com added

comment:146 by Kamedo2, 8 years ago

Thank you. Should I upload the current data?

comment:147 by klaussfreire, 8 years ago

Yes, please do. I'll make sure to address those concerns as well, and we'll save one round trip

comment:148 by Kamedo2, 8 years ago

http://i43.tinypic.com/35l9h94.png
You can download the original sound here. http://www.hydrogenaudio.org/forums/index.php?showtopic=98003

comment:149 by Kamedo2, 8 years ago

Oops, -b:a 128k, not -b:a 96k in the 128kbps exp+v4 column.
By the way, why is the FFT used in the LPF? Couldn't it use the MDCT and simply zero the higher coefficients? Maybe I am missing something.

comment:150 by Kamedo2, 8 years ago

I'll finish the test soon (16/20, 80%). What should the opponents be in the next blind listening test, which will include the newer patch? I'm thinking of...

  • current git head with no patch, abr
  • next patch, abr
  • next patch, vbr
  • fdk-aac, abr

and possibly...

  • libopus, vbr
  • libmp3lame, vbr

Do you have any idea?

in reply to:  150 ; comment:151 by Carl Eugen Hoyos, 8 years ago

Replying to Kamedo2:

and possibly...

  • libopus, vbr
  • libmp3lame, vbr

Do you have any idea?

If you have time, it would be interesting to compare to the quality of other FFmpeg audio encoders, ie ac3, eac3 and mp2.

in reply to:  151 ; comment:152 by Kamedo2, 8 years ago

Replying to cehoyos:

If you have time, it would be interesting to compare to the quality of other FFmpeg audio encoders, ie ac3, eac3 and mp2.

It may be wrong, but I guess the ac3 is the most used variant. The bitrate will be around 128kbps, so the extremely high bitrate of eac3 will not fit the frame, I think. Are there some important use of eac3 and mp2, other than the BD and VCD encoding? (For BD the space is huge and quality at lower bitrate is insignificant.)

in reply to:  152 comment:153 by Carl Eugen Hoyos, 8 years ago

Replying to Kamedo2:

Replying to cehoyos:

If you have time, it would be interesting to compare to the quality of other FFmpeg audio encoders, ie ac3, eac3 and mp2.

It may be wrong, but I guess the ac3 is the most used variant. The bitrate will be around 128kbps, so the extremely high bitrate of eac3 will not fit the frame,

I am not sure I understand you.
Afaik, nobody ever made a listening test using different internal FFmpeg encoders (not even a very cursory one). It would be interesting to know that "96kb eac3 ~ 128 kb ac3 ~ 128kb aac ~ 256kb mp2" (I assume this isn't the case, just as an example). Even if done with much less effort than your above tests (if you just mention your impression of each encoder after a few tests), I believe this would be interesting information.
It was sometimes claimed that the wma encoders produce abysmal quality, so your comment on them (possibly with higher bitrates) would also be welcome.

I think. Are there some important use of eac3 and mp2, other than the BD and VCD encoding? (For BD the space is huge and quality at lower bitrate is insignificant.)

I believe that ac3 is a very important codec (WMP plays it out-of-the-box in different containers), knowing if eac3 beats it would be interesting.

comment:154 by Kamedo2, 8 years ago

I can't think of any good use for eac3 other than BD. BD can have 32Mbps, and eac3 can go up to 6144kbps. If audio quality matters, simply use the maximum bitrate. Also, having more opponents in parallel slows down the test. However, we need a low anchor and possibly a high anchor. I think libopus will act as a high anchor and aac without the patch will act as a low anchor.

There are some good uses of wma, such as encoding for an old car stereo that plays MP3/WMA, but WMAEncode 0.2.9b is far more usable. The quality is in between LAME and Apple AAC.

https://trac.ffmpeg.org/wiki/GuidelinesHighQualityAudio

comment:155 by Kamedo2, 8 years ago

This document recommends using the -cutoff 15000 option. That is outdated: the cutoff has been applied automatically since July 2012.
http://ffmpeg.org/ffmpeg-codecs.html#aac

This is the data I sent in 2012.
http://i41.tinypic.com/24dri41.png

By the way, the progress of the listening test is 95%(19/20) now.

comment:156 by Kamedo2, 8 years ago

I finished the test and I uploaded the results.
http://www.hydrogenaudio.org/forums/index.php?showtopic=102699
http://i42.tinypic.com/1043igy.png
http://i44.tinypic.com/2ld9bhl.png

comment:157 by klaussfreire, 8 years ago

Cool. I'll try to work on this tonight.

by klaussfreire, 8 years ago

V5 patch, twoloop RD fixed (I think)

comment:158 by klaussfreire, 8 years ago

So, I attached a patch that moves in the right direction (I think).

Most of the worse-performing samples, I noticed, had to do with hole avoidance being quickly violated when using low bit rates. So I re-did twoloop's RD improvement step to better respect hole avoidance, to be asymmetric in its scale manipulation (ie: to avoid adding all 1 or all 2, which would be quickly undone by the bitrate adjustment step), and everything seemed to work a lot better.

However, on the "asymmetric" little word, there's a huge hack involved. I wouldn't want to waste your time without a warning: this hack can most assuredly be improved. But I don't think I'll waste time improving a hack, since the real solution is to implement a dynamic programming coder, which I intend to do in the future. So while hackish and probably suboptimal, I'll probably leave it as-is since it works well enough.

I haven't tested VBR much. From what I tested, it seems mostly unharmed, but it still needs a better calibrated cutoff. That will take time (let's say it'll be v6).

So, this patch should be good enough for ABR. VBR will need a v6, and some day (time permitting) I'll post the patch with the dynamic coder.

I couldn't quite match FDK performance, and I suspect there are two reasons for this. First, M/S coding isn't as good as it should be. Second, FDK probably uses a dynamic coder. So I think we'll catch FDK with the dynamic coder (which can also do the M/S part, so it'll fix both in one shot).

However, I tested most of the samples in your session, and they've all improved. Some more than others, of course. So, if not all the samples, you might want to retest the worst offenders.

Edit: I also haven't tested higher bit rates. I will tomorrow.

comment:159 by Kamedo2, 8 years ago

The v5 patch is encoding at 15-50x realtime, depending on bitrate and type of music encoded.

comment:160 by Kamedo2, 8 years ago

I changed aaccoder.c line 806 from

                ? (refbits * 1.6f * avctx->sample_rate / 1024) 

to

                ? (refbits * 2.4f * avctx->sample_rate / 1024) 

This is certainly better, although the exact optimal value is debatable.

I encoded 2 days' worth of diverse sounds with many settings, and listened to 2 hours of them. This encoder does a relatively good job even at abr 96kbps. It's not a blind test, but I can hear the improvement. I also compared abr 128kbps vs vbr -q 0.3, and abr is still better. The vbr exposes its weak point in relatively quiet, tonal sections: low S/N and a stronger LPF effect.

http://i40.tinypic.com/r1l7j4.png

comment:161 by Kamedo2, 8 years ago

I listened to about 8 hours of songs, movies, sine and white noise, and 5.1ch surround source. I'd say that abr is mature.

klaussfreire, could you add a "redirect" feature so that when the set bitrate is too high, the encoder falls back to the maximum possible bitrate rather than printing the error message and stopping? This would simplify many batch encodes, including encoding from hundreds of videos with various audio sample rates and channel counts. Currently it gets:

[aac @ 013efa60] Too many bits per frame requested

Also, I notice that this commandline

ffmpeg -i ffmpeg_aacvbr_pulse1.wav -c:a aac -strict experimental -q:a 0.1 -ar 8000 -ac 1 ffmpeg_aacvbr_pulse1.mp4

gets the same "Too many bits" message, and lowering the quality with -q:a doesn't help. It only works when using -b:a, or when setting a higher sample rate such as -ar 22050. This could be a problem when encoding video taken by some old digital cameras with 8kHz pcm audio attached.

The error message:

ffmpeg56470.exe -y -i ffmpeg_aacvbr_pulse1.wav -c:a aac -strict experimental -q:a
 0.3 -ar 8000 ffmpeg_aacvbr_pulse1.mp4
ffmpeg version N-56469-gf6622f9 Copyright (c) 2000-2013 the FFmpeg developers
  built on Sep 20 2013 15:29:55 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-nonfree --enable-libfdk
-aac --extra-ldflags=-static --extra-cflags='-march=native -mfpmath=sse' --optfl
ags=-O2
  libavutil      52. 45.100 / 52. 45.100
  libavcodec     55. 33.100 / 55. 33.100
  libavformat    55. 18.100 / 55. 18.100
  libavdevice    55.  3.100 / 55.  3.100
  libavfilter     3. 86.102 /  3. 86.102
  libswscale      2.  5.100 /  2.  5.100
  libswresample   0. 17.103 /  0. 17.103
  libpostproc    52.  3.100 / 52.  3.100
Guessed Channel Layout for  Input Stream #0.0 : stereo
Input #0, wav, from 'ffmpeg_aacvbr_pulse1.wav':
  Metadata:
    encoder         : Coderium SoundEngine 4.59
  Duration: 00:00:12.12, bitrate: 1411 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16
, 1411 kb/s
[aac @ 030cbf00] Too many bits per frame requested
Output #0, mp4, to 'ffmpeg_aacvbr_pulse1.mp4':
  Metadata:
    encoder         : Coderium SoundEngine 4.59
    Stream #0:0: Audio: aac, 8000 Hz, stereo, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le -> aac)
Error while opening encoder for output stream #0:0 - maybe incorrect parameters
such as bit_rate, rate, width or height

I think it is about time we removed the -strict experimental flag.
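
The "Too many bits per frame requested" error above is consistent with a per-frame bit budget. The sketch below assumes a limit of 6144 bits per channel per 1024-sample frame and shows the kind of clamping requested in this comment. The constant is an assumption about the check, the function names are hypothetical, and this is not the encoder's actual code.

```python
FRAME_SAMPLES = 1024          # samples per channel in one AAC frame
MAX_BITS_PER_CHANNEL = 6144   # assumed per-frame, per-channel bit limit

def max_bitrate(sample_rate, channels):
    """Highest bit rate (bits/s) that stays within the assumed frame budget."""
    return MAX_BITS_PER_CHANNEL * channels * sample_rate // FRAME_SAMPLES

def clamp_bitrate(requested, sample_rate, channels):
    """Return the requested bit rate, lowered to the maximum if it is too high."""
    return min(requested, max_bitrate(sample_rate, channels))

# 128 kb/s at 8000 Hz stereo, as in the failing log above: gets clamped.
print(clamp_bitrate(128000, 8000, 2))
# 128 kb/s at 22050 Hz stereo fits within the budget and is left alone.
print(clamp_bitrate(128000, 22050, 2))
```

Under this assumption, the 128 kb/s shown in the log for the 8000 Hz stereo output exceeds the ceiling for that format, which would explain why changing -q:a alone has no effect while lowering -b:a or raising -ar does.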

in reply to:  159 comment:162 by klaussfreire, 8 years ago

Replying to Kamedo2:

The v5 patch is encoding at 15-50x realtime, depending on bitrate and type of music encoded.

I believe I may have to disappoint you there. One of the optimizations responsible for that speed is acting up on ABR. I noticed improved quality by restricting it, so the v6 with optimized VBR will have that disabled as well (and thus be a tad slower).

I thought that optimization was result-neutral, but it seems it isn't.

comment:163 by Kamedo2, 8 years ago

15x speed is 'tolerable' :)

I've encoded more than 50GB of mp4s, including surround 5.1ch with more than 1Mbps etc... and listened to 12 hours of mainly Pop music. v5 seems to be stable. Is fixing "Too many bits per frame requested" error easy?

comment:164 by klaussfreire, 8 years ago

I can make it only applicable when using ABR, but I think it's a useful message.

I could also turn it into a warning, I think.

comment:165 by Kamedo2, 8 years ago

I prefer a warning to an error message that stops the encode. It's kinder and easier to use.

By the way, I'll be free from September 28th, and I'm considering a listening test of

  • v4 abr
  • v6 abr
  • v6 vbr
  • fdk-aac vbr
  • ac3 abr
  • libmp3lame vbr

I've had requests to test libfaac, mp2, and eac3, but I'm running out of "slots".
From my normal non-blind listening of average music, my current impression is:

fdk-aac > libmp3lame > v5 abr >> v4 abr > v5 vbr > ac3

comment:166 by Kamedo2, 8 years ago

v5 vbr is still considerably worse than abr. I feel that whenever tonal sounds are present, the frequency bins around the tone degrade. Tones are poorer than noise at masking other sounds, which is why the harpsichord remains one of the most critical and hardest instruments to code. http://wiki.hydrogenaudio.org/index.php?title=Perceptual_Noise_Substitution

comment:167 by klaussfreire, 8 years ago

Well, v6 is almost ready. I just need to clean it up a bit. I'll probably do that tonight.

In v6, my non-blind tests make me believe that v6 vbr > v6 abr > v5 abr.

Not sure how you compare abr vs vbr, what I do is pick a file or set of files, do a binary search of the quality level that results in the same overall file size, and then compare. In that kind of test, v6 vbr sometimes requires lots more bits for some pathological files (techno seems to drive it crazy, can't blame it). I exclude those, since they're pathological.

When I push the patches to the ML, I'll make most of what makes v6 vbr go crazy on techno (the relatively high peak bit rate allowance) configurable anyway.
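
The size-matching procedure described in this comment can be sketched as a binary search over the quality value. The `encoded_size` callable stands in for "encode at this -q:a and return the file size"; it, the search bounds, and the tolerance are assumptions, and the search relies on size growing monotonically with q.

```python
def find_matching_q(encoded_size, target_size, lo=0.01, hi=2.0,
                    rel_tol=0.01, max_iter=30):
    """Binary-search q so that encoded_size(q) is close to target_size.

    encoded_size must be non-decreasing in q. Returns the last q tried.
    """
    q = (lo + hi) / 2
    for _ in range(max_iter):
        q = (lo + hi) / 2
        size = encoded_size(q)
        if abs(size - target_size) <= rel_tol * target_size:
            break
        if size < target_size:
            lo = q   # file too small: raise quality
        else:
            hi = q   # file too large: lower quality
    return q

# Stand-in size model for demonstration only (bytes grow linearly with q).
def fake_size(q):
    return 1_000_000 * q

q = find_matching_q(fake_size, target_size=300_000)
print(round(q, 3))
```

In practice each probe is a full encode, so a loose tolerance keeps the number of encodes per file small.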

in reply to:  167 ; comment:168 by Kamedo2, 8 years ago

Replying to klaussfreire:

Not sure how you compare abr vs vbr, what I do is pick a file or set of files, do a binary search of the quality level that results in the same overall file size, and then compare. In that kind of test, v6 vbr sometimes requires lots more bits for some pathological files (techno seems to drive it crazy, can't blame it). I exclude those, since they're pathological.

I compare abr vs vbr by a graph. I plot a "q vs bitrate" graph over a "standard" set of large set of sounds I extracted from diverse CDs. Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate. This method is common in the hydrogenaudio.
http://listening-tests.hydrogenaudio.org/sebastian/mp3-128-1/index.htm


When I push the patches to the ML, I'll make most of what makes v6 vbr go crazy on techno (the relatively high peak bit rate allowance) configurable anyway.

I think it's a good idea to automatically "cap" the bitrate based on the q number. 3x of the "standard" bitrate of the q or something.

Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps. (Or 210kbps, 289kbps, etc based on the sound content, but that's fine.) iTunes has that interface, and it's easier to use. This may be controversial, as people may refer to old documentation of the -q:a option and try to do the same, but the problem can be avoided by switching to a "classic mode" when the value is very small, like -q:a 0.3.

in reply to:  168 ; comment:169 by klaussfreire, 8 years ago

Replying to Kamedo2:

Replying to klaussfreire:

Not sure how you compare abr vs vbr, what I do is pick a file or set of files, do a binary search of the quality level that results in the same overall file size, and then compare. In that kind of test, v6 vbr sometimes requires lots more bits for some pathological files (techno seems to drive it crazy, can't blame it). I exclude those, since they're pathological.

I compare abr vs vbr by a graph. I plot a "q vs bitrate" graph over a "standard" set of large set of sounds I extracted from diverse CDs.

Yeah, I've seen those

Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate.

Just how do you check bit rate? Because I've noticed ffmpeg -i file tends to give bogus rates when used on VBR-encoded files (not even average).

Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.

That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).

However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:

ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac

Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).

in reply to:  169 ; comment:170 by Kamedo2, 8 years ago

Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate.

Just how do you check bit rate? Because I've noticed ffmpeg -i file tends to give bogus rates when used on VBR-encoded files (not even average).

filesize[Byte]*8/Sample_length[Sec], But be careful of very short files, it can be bogus too.
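
The formula above, written out as a small helper. The function name and the example numbers are illustrative only.

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bit rate in kbit/s: file size in bits divided by duration."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return file_size_bytes * 8 / duration_seconds / 1000

# Example: a 4,000,000-byte file lasting 250 seconds.
example_kbps = average_bitrate_kbps(4_000_000, 250)
print(example_kbps)  # 128.0
```

Because the file size includes container overhead, the figure is slightly above the pure audio bit rate, and the relative error grows for very short files.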

Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.

That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).

Is LAME breaking the convention?
https://trac.ffmpeg.org/wiki/Encoding%20VBR%20%28Variable%20Bit%20Rate%29%20mp3%20audio

However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:

ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac

Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).

Thank you for the info. Your behavior seems much like the cvbr(most used mode), Apple iTunes.

in reply to:  170 comment:171 by klaussfreire, 8 years ago

Replying to Kamedo2:

Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate.

Just how do you check bit rate? Because I've noticed ffmpeg -i file tends to give bogus rates when used on VBR-encoded files (not even average).

filesize[Byte]*8/Sample_length[Sec], But be careful of very short files, it can be bogus too.

As long as you're not also estimating sample_length with ffmpeg, which will also give you bogus, it should be fine ;)

Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.

That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).

Is LAME breaking the convention?
https://trac.ffmpeg.org/wiki/Encoding%20VBR%20%28Variable%20Bit%20Rate%29%20mp3%20audio

I think so. At least, it seems to be backwards (higher q should mean higher quality, but lame does it backwards).

comment:172 by Kamedo2, 8 years ago

libvorbis and libfaac break the convention, too. neroAacEnc.exe takes a float quality value where 0 is lowest and 1 is highest, so if left unchanged, the native encoder acts much like Nero's.

in reply to:  170 ; comment:173 by Timothy Gu, 8 years ago

Replying to Kamedo2:

Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.

That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).

Is LAME breaking the convention?
https://trac.ffmpeg.org/wiki/Encoding%20VBR%20%28Variable%20Bit%20Rate%29%20mp3%20audio

However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:

ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac

Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).

Thank you for the info. Your behavior seems much like the cvbr(most used mode), Apple iTunes.

If someone is to implement cvbr, I suggest to do it like the libopus encoder wrapper, where users are allowed to choose a "vbr" option like this http://ffmpeg.org/ffmpeg-codecs.html#Option-Mapping.

in reply to:  173 comment:174 by Kamedo2, 8 years ago

If someone is to implement cvbr, I suggest doing it like the libopus encoder wrapper, where users are allowed to choose a "vbr" option like this: http://ffmpeg.org/ffmpeg-codecs.html#Option-Mapping.

Timothy_Gu, thank you for the informative link. I'd like to use options like -b:a 256k -vbr.

in reply to:  169 comment:175 by Kamedo2, 8 years ago

However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:

ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac

Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).

I tried it on 128 different songs and the results were:

-b:a 256k -q:a 1

  • Average 247kbps
  • SD +/-33kbps
  • Min 161kbps
  • Max 300kbps

-q:a 1

  • Average 235kbps
  • SD +/-30kbps
  • Min 154kbps
  • Max 287kbps

(The change from comment:160 was not applied in this test.)
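
For anyone who wants to produce the same kind of summary for their own encodes, here is a minimal sketch. The input values are hypothetical (they are not the measurements from the 128-song test), `summarize_bitrates` is just an illustrative name, and it assumes the population standard deviation, since the post does not say which variant was used.

```python
from statistics import mean, pstdev


def summarize_bitrates(kbps):
    """Return (average, standard deviation, min, max) for bitrates in kbps."""
    # pstdev is the population standard deviation; this choice is an assumption.
    return mean(kbps), pstdev(kbps), min(kbps), max(kbps)


# Hypothetical per-file bitrates, NOT the data from the test above.
example = [161, 230, 247, 262, 300]
avg, sd, lo, hi = summarize_bitrates(example)
print(f"Average {avg:.0f}kbps, SD +/-{sd:.0f}kbps, Min {lo}kbps, Max {hi}kbps")
```

The per-file bitrate itself would have to come from the encoded files (for example file size divided by duration); that step is left out here.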

comment:176 by Kamedo2, 8 years ago

I'm preparing for the next listening test.

# Native aac patch v4 abr
ffmpeg55212 -y -i in.wav -c:a aac -strict experimental -b:a 128k out.mp4
ffmpeg56470 -y -i out.mp4 -c:a pcm_s32le out.32bit.wav

# Native aac patch v5 abr
ffmpeg56470 -y -i in.wav -c:a aac -strict experimental -b:a 128k out.mp4
ffmpeg56470 -y -i out.mp4 -c:a pcm_s32le out.32bit.wav

# Native aac patch v5 vbr
ffmpeg56470 -y -i in.wav -c:a aac -strict experimental -q:a 0.3 out.mp4
ffmpeg56470 -y -i out.mp4 -c:a pcm_s32le out.32bit.wav

# FDK-AAC vbr 3
ffmpeg56470 -y -i in.wav -c:a libfdk_aac -vbr 3 out.mp4
ffmpeg56470 -y -i out.mp4 -c:a pcm_s32le out.32bit.wav

# LAME vbr -V5
ffmpeg55010 -y -i in.wav -c:a libmp3lame -q:a 5 out.mp3
ffmpeg56470 -y -i out.mp3 -c:a pcm_s32le out.32bit.wav

# FFmpeg ac3 cbr
ffmpeg56470 -y -i in.wav -c:a ac3 -b:a 128k out.ac3
ffmpeg56470 -y -i out.ac3 -c:a pcm_s32le out.32bit.wav

I considered using 32-bit float as the intermediate format, but FFmpeg's pcm_f32le output had half the gain it should have, and even after adjusting the gain, a large error (average of |lossy-original|) remained, unlike with faad or madplay.
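
The error measure mentioned here, the average of |lossy-original| with an optional gain correction, can be sketched roughly as follows. This is only an illustration on hypothetical sample values: `mean_abs_error` and its `gain` parameter are made-up names, and the sketch does not align the two signals in time, which a real comparison of decoded audio would need.

```python
def mean_abs_error(original, lossy, gain=1.0):
    """Average of |gain * lossy - original| over the overlapping samples."""
    n = min(len(original), len(lossy))
    if n == 0:
        raise ValueError("no samples to compare")
    return sum(abs(gain * lossy[i] - original[i]) for i in range(n)) / n


# Hypothetical samples: the "decoded" signal is the original at half gain.
original = [0.0, 0.5, -0.25, 1.0]
half_gain = [0.5 * x for x in original]

err_uncorrected = mean_abs_error(original, half_gain)
err_corrected = mean_abs_error(original, half_gain, gain=2.0)
print(err_uncorrected)  # error caused purely by the gain mismatch
print(err_corrected)    # 0.0 once the gain is corrected
```

In this toy case the error vanishes after the gain fix; the observation above is that with pcm_f32le a large error remained even then.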

These are the statistics for the 25 samples I'm going to use in the test.

              v4 abr            v5 abr     v5 vbr     FDK vbr  lame V5  ac3
25 Average    129               129        151        122      135      128
25 Std.Dev    5                 5          39         20       18       0
25 Min        107               108        89         86       87       128
25 Max        131               133        257        173      172      128
Max sample    25.Reunion Blues  26.French  26.French  10.      14.      29.
Std.Average   128               128        127        127      130      128

All values are in kbps. Std.Average is the average bitrate obtained when encoding my large collection of CDs.

I've found that v5 vbr boosts the bitrate on speech samples. The speech sample 26.French was encoded at 257kbps, more than twice the average bitrate of a large set of diverse CD sounds. Another speech sample reached 216kbps. It's a problem that will hopefully be fixed in the next v6 patch.
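
A quick way to spot such outliers is to compare each sample's bitrate against the reference average. The sketch below uses the two speech results quoted above and the Std.Average figure of 127 kbps for v5 vbr; the third entry, the 1.5 threshold and the name `flag_high_bitrate` are hypothetical and only there for illustration.

```python
def flag_high_bitrate(samples, reference_kbps, factor=1.5):
    """Return {name: ratio} for samples whose bitrate exceeds factor * reference."""
    return {
        name: kbps / reference_kbps
        for name, kbps in samples.items()
        if kbps > factor * reference_kbps
    }


# 257 and 216 kbps are the speech results quoted above; the music entry is made up.
samples = {
    "26.French": 257,
    "other speech sample": 216,
    "hypothetical music sample": 140,
}
flagged = flag_high_bitrate(samples, reference_kbps=127)
for name, ratio in flagged.items():
    print(f"{name}: {ratio:.2f}x the reference average")
```

With these numbers 26.French comes out at just over 2x the reference, which matches the "more than twice" remark.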

comment:177 by klaussfreire, 8 years ago

Wait a little bit, I'll get you the v6 patch asap, even if not as clean as I'd like it to be.

comment:178 by klaussfreire, 8 years ago