Opened 11 years ago
Closed 8 years ago
#2686 closed defect (fixed)
Native AAC encoder collapses at high bitrates on some samples
Reported by: | Kamedo2 | Owned by: | klaussfreire |
---|---|---|---|
Priority: | normal | Component: | avcodec |
Version: | git-master | Keywords: | aac regression |
Cc: | klaussfreire@gmail.com, timothygu99@gmail.com, atomnuker@gmail.com, rodger.combs@gmail.com | Blocked By: | |
Blocking: | | Reproduced by developer: | yes |
Analyzed by developer: | yes | | |
Description
Summary of the bug:
The FFmpeg native AAC encoder produces severely degraded sound at around 256kbps and above on particular samples. The quality gets worse as I increase the bitrate, and is worst at 320-400kbps.
How to reproduce:
ffmpeg -i ffmpeg_aac320k_collapse.flac -vn -c:a aac -strict experimental -b:a 320k ffmpeg_aac320k_collapse.mp4
I couldn't reproduce the problem when I trimmed the most problematic sample down to 8 seconds, but after adding 10 seconds of silence before the sample, the bug could be reproduced. So I'm going to upload the sample with 10 seconds of silence attached. The native AAC encoder was OK on many music clips at 320kbps; only some clips produce AAC files of noticeably bad quality, to an extent that I'd call a bug.
Console Output:
ffmpeg version N-54096-ge41bf19 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 19 2013 00:20:06 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-libmp3lame --enable-libvorbis --enable-nonfree --enable-libfdk-aac --enable-libvo_aacenc --enable-libfaac --extra-ldflags=-static --extra-cflags='-march=nocona -mfpmath=sse' --optflags=-O2
  libavutil 52. 37.101 / 52. 37.101
  libavcodec 55. 16.100 / 55. 16.100
  libavformat 55. 9.100 / 55. 9.100
  libavdevice 55. 2.100 / 55. 2.100
  libavfilter 3. 77.101 / 3. 77.101
  libswscale 2. 3.100 / 2. 3.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 3.100 / 52. 3.100
[flac @ 0003f160] max_analyze_duration 5000000 reached at 5015510 microseconds
Input #0, flac, from '05-true_my_heart_2m50s.flac':
  Duration: 00:00:18.01, bitrate: 573 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Output #0, mp4, to '05-true_my_heart_2m50s_320k.mp4':
  Metadata:
    encoder : Lavf55.9.100
    Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 320 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (flac -> aac)
Press [q] to stop, [?] for help
size= 331kB time=00:00:18.01 bitrate= 150.4kbits/s
video:0kB audio:327kB subtitle:0 global headers:0kB muxing overhead 1.151111%
Attachments (68)
Change History (577)
by , 11 years ago
Attachment: | ffmpeg_aac320k_collapse.flac added |
---|
by , 11 years ago
Attachment: | ffmpeg_aac320k_collapse2.flac added |
---|
A sound that degrades on FFmpeg native aac encoder. Sounds like a spray can. Billie Holiday : I'm A Fool To Want You (trimmed to 20sec, first and last)
comment:1 by , 11 years ago
Component: | FFmpeg → avcodec |
---|---|
Keywords: | native encoder sound quality 256kbps 320kbps removed |
Version: | 1.0.7 → git-master |
Did the output (aac) files sound better with the (original!) release 1.2?
(Not a later release of the 1.2 series.)
comment:2 by , 11 years ago
Yes, the output aac files sounded better with release 1.2.1 I've downloaded from
http://www.ffmpeg.org/releases/ffmpeg-1.2.1.tar.bz2
Still, the quality of the native aac at 320kbps is poorer than the native aac 256kbps.
ffmpeg version 1.2.1 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 19 2013 12:38:13 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-libmp3lame --enable-libvorbis --enable-nonfree --enable-libfdk-aac --enable-libvo_aacenc --enable-libfaac --extra-ldflags=-static --extra-cflags='-march=nocona -mfpmath=sse' --optflags=-O2
  libavutil 52. 18.100 / 52. 18.100
  libavcodec 54. 92.100 / 54. 92.100
  libavformat 54. 63.104 / 54. 63.104
  libavdevice 54. 3.103 / 54. 3.103
  libavfilter 3. 42.103 / 3. 42.103
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 2.100 / 52. 2.100
[flac @ 01405c20] max_analyze_duration 5000000 reached at 5015510 microseconds
Input #0, flac, from 'ffmpeg_aac320k_collapse.flac':
  Duration: 00:00:18.01, bitrate: 573 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Output #0, mp4, to 'ffmpeg_aac320k_collapse.mp4':
  Metadata:
    encoder : Lavf54.63.104
    Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 320 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (flac -> aac)
Press [q] to stop, [?] for help
size= 289kB time=00:00:18.01 bitrate= 131.3kbits/s
video:0kB audio:285kB subtitle:0 global headers:0kB muxing overhead 1.321136%
comment:3 by , 11 years ago
Oops, you said original release 1.2.
Releases 1.2 and 1.2.1 behave the same: the first sample collapses at 432-464kbps.
As for N-54096-ge41bf19, which I got from git, the first sample collapses at 256-432kbps.
The two groups have distinct "degradation ranges". Releases 1.2 and 1.2.1 have a much narrower degradation range, and the degradation within that range is less severe. N-54096-ge41bf19 has its worst quality at 352kbps.
ffmpeg version 1.2 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 20 2013 03:06:34 with gcc 4.8.1 (GCC)
  configuration: --enable-version3 --enable-nonfree --enable-libfdk-aac --extra-ldflags=-static --extra-cflags='-march=native' --optflags=-O2
  libavutil 52. 18.100 / 52. 18.100
  libavcodec 54. 92.100 / 54. 92.100
  libavformat 54. 63.104 / 54. 63.104
  libavdevice 54. 3.103 / 54. 3.103
  libavfilter 3. 42.103 / 3. 42.103
  libswscale 2. 2.100 / 2. 2.100
  libswresample 0. 17.102 / 0. 17.102
[flac @ 03295c20] max_analyze_duration 5000000 reached at 5015510 microseconds
Input #0, flac, from 'C:\Users\PCC\Documents\ABC-HR\ffmpeg_aac320k_collapse.flac':
  Duration: 00:00:18.01, bitrate: 573 kb/s
    Stream #0:0: Audio: flac, 44100 Hz, stereo, s16
Output #0, mp4, to 'C:\Users\PCC\Documents\ABC-HR\05-true_my_heart_2m50s_320k_12.mp4':
  Metadata:
    encoder : Lavf54.63.104
    Stream #0:0: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 320 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (flac -> aac)
Press [q] to stop, [?] for help
size= 289kB time=00:00:18.01 bitrate= 131.3kbits/s
video:0kB audio:285kB subtitle:0 global headers:0kB muxing overhead 1.321136%
comment:4 by , 11 years ago
This patch I'm going to attach fixes both issues. But I must warn that it's a WIP, I still have to split it into individual issues and fix a bug it exhibits in rare circumstances when working in VBR mode.
by , 11 years ago
Attachment: | aac-improvements-wip.patch added |
---|
AAC native encoder improvements, work in progress
comment:5 by , 11 years ago
Keywords: | regression added |
---|---|
Reproduced by developer: | set |
Status: | new → open |
comment:6 by , 11 years ago
I appreciate your effort, klaussfreire.
I want to test the aac-improvements-wip.patch, but how can I do that?
/c/mingw/ffmpeg/ffmpeg-1.2
$ patch -u -p1 < aac-improvements-wip.patch
patching file libavcodec/aaccoder.c
Hunk #3 FAILED at 711.
Hunk #4 succeeded at 776 (offset -5 lines).
Hunk #5 succeeded at 818 (offset -5 lines).
Hunk #6 FAILED at 845.
Hunk #7 FAILED at 1055.
Hunk #8 FAILED at 1068.
Hunk #9 FAILED at 1092.
Hunk #10 FAILED at 1110.
6 out of 10 hunks FAILED -- saving rejects to file libavcodec/aaccoder.c.rej
patching file libavcodec/aacenc.c
Hunk #3 FAILED at 622.
1 out of 3 hunks FAILED -- saving rejects to file libavcodec/aacenc.c.rej
patching file libavcodec/aacpsy.c
Hunk #1 succeeded at 293 (offset -4 lines).
Hunk #2 succeeded at 385 (offset -4 lines).
Hunk #3 succeeded at 646 (offset -33 lines).
patching file libavcodec/psymodel.h
comment:7 by , 11 years ago
Without trying myself, I would bet that the patch only applies to current git head.
comment:8 by , 11 years ago
I tried $ git clone git://source.ffmpeg.org/ffmpeg.git, but still, the patch fails.
comment:10 by , 11 years ago
I tried the wip patch again. No good. I think the patch is broken.
$ patch -p1 < aac-improvements-wip.patch
patching file libavcodec/aaccoder.c
Hunk #3 FAILED at 711.
Hunk #4 succeeded at 776 (offset -5 lines).
Hunk #5 succeeded at 818 (offset -5 lines).
Hunk #6 FAILED at 845.
Hunk #7 FAILED at 1055.
Hunk #8 FAILED at 1068.
Hunk #9 FAILED at 1092.
Hunk #10 FAILED at 1110.
6 out of 10 hunks FAILED -- saving rejects to file libavcodec/aaccoder.c.rej
patching file libavcodec/aacenc.c
Hunk #1 FAILED at 591.
Hunk #2 FAILED at 609.
Hunk #3 FAILED at 621.
3 out of 3 hunks FAILED -- saving rejects to file libavcodec/aacenc.c.rej
patching file libavcodec/aacpsy.c
Hunk #1 succeeded at 299 (offset 2 lines).
Hunk #2 succeeded at 391 (offset 2 lines).
Hunk #3 succeeded at 681 (offset 2 lines).
patching file libavcodec/psymodel.h
by , 11 years ago
Attachment: | ffmpeg_aac320k_collapse3.flac added |
---|
A sound that degrades on FFmpeg native aac encoder. Euphoria - Yui Makino [VTCL-35073][06.4.26] Track04 Amefuribana(inst.) 2:45~2:55
comment:11 by , 11 years ago
I successfully applied the patch. klaussfreire's repository is linked here: http://ffmpeg.org/pipermail/ffmpeg-devel/2013-May/143216.html
Alternatively, you can use https://dl.dropboxusercontent.com/u/81238453/aac.patch (thank you, Takuan @K4095) to patch current git head.
However, it still has a distinct bug: the sound partially disappears when the input is white-noise-like.
Bug #2706 was that the sound warbles when the input is a sine wave. That is solved by this patch, but the patch creates a new problem.
ffmpeg54292 -v 9 -loglevel 99 -filter_complex "aevalsrc=-0.5+random(0)" -c:a aac -strict experimental -ar 44100 -ac 2 -b:a 256k -t 4 "C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4"
ffmpeg version N-54292-g97947d9 Copyright (c) 2000-2013 the FFmpeg developers
  built on Jun 30 2013 20:34:13 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-nonfree --enable-libfdk-aac --extra-ldflags=-static --extra-cflags='-march=nocona -mfpmath=sse' --optflags=-O2
  libavutil 52. 38.100 / 52. 38.100
  libavcodec 55. 18.100 / 55. 18.100
  libavformat 55. 10.100 / 55. 10.100
  libavdevice 55. 2.100 / 55. 2.100
  libavfilter 3. 77.101 / 3. 77.101
  libswscale 2. 3.100 / 2. 3.100
  libswresample 0. 17.102 / 0. 17.102
  libpostproc 52. 3.100 / 52. 3.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '9'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument '99'.
Reading option '-filter_complex' ... matched as option 'filter_complex' (create a complex filtergraph) with argument 'aevalsrc=-0.5+random(0)'.
Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'aac'.
Reading option '-strict' ... matched as AVOption 'strict' with argument 'experimental'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '44100'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '2'.
Reading option '-b:a' ... matched as option 'b' (video bitrate (please use -b:v)) with argument '256k'.
Reading option '-t' ... matched as option 't' (record or transcode "duration" seconds of audio/video) with argument '4'.
Reading option 'C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4' ... matched as output file.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 9.
Applying option filter_complex (create a complex filtergraph) with argument aevalsrc=-0.5+random(0).
Successfully parsed a group of options.
Parsing a group of options: output file C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4.
Applying option c:a (codec name) with argument aac.
Applying option ar (set audio sampling rate (in Hz)) with argument 44100.
Applying option ac (set number of audio channels) with argument 2.
Applying option b:a (video bitrate (please use -b:v)) with argument 256k.
Applying option t (record or transcode "duration" seconds of audio/video) with argument 4.
Successfully parsed a group of options.
Opening an output file: C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4.
detected 8 logical cores
[Parsed_aevalsrc_0 @ 0140bea0] compat: called with args=[-0.5+random(0)]
[Parsed_aevalsrc_0 @ 0140bea0] Setting 'exprs' to value '-0.5+random(0)'
[audio format for output stream 0:0 @ 01412880] Setting 'sample_fmts' to value 'fltp'
[audio format for output stream 0:0 @ 01412880] Setting 'sample_rates' to value '44100'
[audio format for output stream 0:0 @ 01412880] Setting 'channel_layouts' to value '0x3'
Successfully opened the file.
[audio format for output stream 0:0 @ 01412880] auto-inserting filter 'auto-inserted resampler 0' between the filter 'Parsed_aevalsrc_0' and the filter 'audio format for output stream 0:0'
[AVFilterGraph @ 0039f3c0] query_formats: 3 queried, 6 merged, 3 already done, 0 delayed
[Parsed_aevalsrc_0 @ 0140bea0] sample_rate:44100 chlayout:mono duration:-1.000000
[auto-inserted resampler 0 @ 0039f2a0] [SWR @ 00393160] Using double precision mode
0.707107 0.707107
[auto-inserted resampler 0 @ 0039f2a0] ch:1 chl:mono fmt:dblp r:44100Hz -> ch:2 chl:stereo fmt:fltp r:44100Hz
Output #0, mp4, to 'C:\Users\PCC\Documents\ABC-HR\whitenoise_256k.mp4':
  Metadata:
    encoder : Lavf55.10.100
    Stream #0:0, 0, 1/44100: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo, fltp, 256 kb/s
Stream mapping:
  aevalsrc -> Stream #0:0 (aac)
Press [q] to stop, [?] for help
No more output streams to write to, finishing.
size= 141kB time=00:00:04.01 bitrate= 288.4kbits/s
video:0kB audio:140kB subtitle:0 global headers:0kB muxing overhead 1.001409%
0 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0141b640] Statistics: 30 seeks, 197 writeouts
The output mp4 I'm going to post sounds nothing like white noise.
by , 11 years ago
Attachment: | whitenoise_256k.mp4 added |
---|
White noise, encoded by native aac encoder at 256kbps. The sound is obviously collapsed.
comment:12 by , 11 years ago
Another bug typically happens when hi-hats are present: the sound disappears for about 20ms.
It is short, but still audible, and sounds like an annoying pulse.
When these problems are solved, I'm going to conduct an extensive blind listening test, to assess sound quality of AAC encoders available from FFmpeg.
comment:13 by , 11 years ago
follow-up: 15 comment:14 by , 11 years ago
Sorry, I expected to get email notifications, but got none.
That bug is probably a rate control bug I thought I had eradicated. I'll try to test with white noise, but just in case the exact input matters, can you attach a flac version?
comment:15 by , 11 years ago
Replying to klaussfreire:
Sorry, I expected to get email notifications, but got none.
You will get them if you add yourself to CC.
follow-up: 19 comment:16 by , 11 years ago
Cc: | added |
---|
comment:17 by , 11 years ago
In aacenc.c, replacing
s->lambda *= ratio;
with
s->lambda *= sqrtf(sqrtf(ratio));
fixes the white noise problem, so it is indeed a rate control (RC) mess-up.
However, that causes other trouble with more normal signals, so I guess I'll have to play with RC a little more.
by , 11 years ago
Attachment: | Whitenoise.flac added |
---|
White noise, created by SoundEngine Free ver.4.59. Using aevalsrc as in comment:11 does the same job.
comment:19 by , 11 years ago
Replying to klaussfreire:
You may also want to look at ticket #2706.
(Is it a duplicate of this ticket?)
comment:20 by , 11 years ago
Replying to klaussfreire:
I think AAC's ratecontrol needs a lookahead buffer.
Can you implement the feature by July 13th?
I'm going to be free then and will have time to do some double-blind listening tests of the codec.
The results will be presented like this: http://www.hydrogenaudio.org/forums/index.php?showtopic=100896
follow-up: 22 comment:21 by , 11 years ago
Maybe a very simple one-block one. I've been thinking such a simple lookahead might be enough to fix the bugs, with a better one perhaps for a further patch.
I'll give this high priority, but we're only 3 days away from that deadline you know...
comment:22 by , 11 years ago
Replying to klaussfreire:
Maybe a very simple one-block one. I've been thinking such a simple lookahead might be enough to fix the bugs, with a better one perhaps for a further patch.
I'll give this high priority, but we're only 3 days away from that deadline you know...
Thank you very much! A delay of some days is acceptable.
follow-up: 24 comment:23 by , 11 years ago
Alright, attaching another version. This seems to work better, but it's a bit rushed. I'll try to improve on it, but if I delay, feel free to test this version.
by , 11 years ago
Attachment: | aac-improvements-wip-v2-rclookahead.patch added |
---|
Second version of AAC improvements, with improvements on rate control, hopefully gets rid of all remaining "collapsations on high bit rates". Tested various music tracks on 64k, 128k, 256k and 384k.
comment:24 by , 11 years ago
Replying to klaussfreire:
Alright, attaching another version.
The patch does not apply here to current git head.
comment:25 by , 11 years ago
The patch does not apply for me either. I read http://ffmpeg.org/pipermail/ffmpeg-devel/2013-May/143216.html and http://ffmpeg.org/pipermail/ffmpeg-devel/2013-May/143222.html and guessed what I should do, but it still fails.
by , 11 years ago
Attachment: | aac-improvements-wip-v2-rclookahead.2.patch added |
---|
Second version of AAC improvements, with improvements on rate control, hopefully gets rid of all remaining "collapsations on high bit rates". Tested various music tracks on 64k, 128k, 256k and 384k.
comment:26 by , 11 years ago
Yes, sorry, I'm not working on a clean checkout.
I should move to a clean checkout.
There I attached a rebased patch.
comment:27 by , 11 years ago
Very good one! The only serious artifact I've heard so far is whitenoise.flac at 8, 16, 24, 32kbps and 192kbps.
follow-up: 29 comment:28 by , 11 years ago
Whitenoise.flac at 384kbps, ffmpeg_aac320k_collapse.flac at 320kbps is strange, too.
comment:29 by , 11 years ago
Replying to Kamedo2:
Whitenoise.flac at 384kbps, ffmpeg_aac320k_collapse.flac at 320kbps is strange, too.
I didn't try the collapse ones at 320k, though I tried at 384 and sounded nice. I'll try again when I have a chance though.
However, whitenoise 384 gives me an error, seems 384kbps is too much for mono. The whitenoise I mention is generated with the random generator, I'll try with the flac first chance I get.
comment:30 by , 11 years ago
Isn't the lower spreading function applied too much? The quality of lower frequency is bad when the higher frequency bin is strong. And what makes 320kbps particularly bad? The quality degrades when we have enough ('overkill') bits. I think something fatal is happening, like integer overflow or something.
by , 11 years ago
Attachment: | ffmpeg_aac320k_collapse4.flac added |
---|
A sound that degrades on FFmpeg native aac encoder.
comment:31 by , 11 years ago
Isn't line 334 of libavcodec/aacpsy.c:
for (g = 0; g < ctx->num_bands[j]-1; g++) {
    AacPsyCoeffs *coeff = &coeffs[g];
    float bark_width = coeffs[g+1].barks - coeffs->barks;
    coeff->spread_low[0] = pow(10.0, -bark_width * PSY_3GPP_THR_SPREAD_LOW);
    coeff->spread_hi [0] = pow(10.0, -bark_width * PSY_3GPP_THR_SPREAD_HI);
    coeff->spread_low[1] = pow(10.0, -bark_width * en_spread_low);
    coeff->spread_hi [1] = pow(10.0, -bark_width * en_spread_hi);
    pe_min = bark_pe * bark_width;
    minsnr = exp2(pe_min / band_sizes[g]) - 1.5f;
    coeff->min_snr = av_clipf(1.0f / minsnr, PSY_SNR_25DB, PSY_SNR_1DB);
}
strange? I doubt the sanity of the lower spreading function at the highest band, because using the -cutoff 18000 option improves the quality on problematic samples, and these problematic samples always include strong 20-22kHz sounds. (The default cutoff is 18k at 192kbps, 20k at 256kbps, and 22k at 320kbps.)
by , 11 years ago
Attachment: | 18.6_22kHz_noise.flac added |
---|
Partial white noise, clipped by 256th-order lanczos function, to include only signals between 18.6 and 22kHz. the signal wanders around the freq.
comment:32 by , 11 years ago
I've got it. When the native AAC encoder calculates a masking curve, almost inaudible sounds such as 18kHz, 20kHz, and 22kHz are taken into account, and audible sounds such as 14kHz are masked by the inaudible ones. Add the inaudible noise above to the source sound and the encoded sound will be significantly degraded. I recommend that any signal above 16kHz be disregarded in the psychoacoustic engine.
comment:33 by , 11 years ago
Alright. Good catch.
I'd recommend not ignoring, because masking within that band will still be important for bit allocation purposes. Rather, back-spreading rolloff (towards the lower frequencies) should be tweaked a bit.
comment:34 by , 11 years ago
Things start to make sense.
Could you tweak the back-spreading and provide the patch for me? I'd like to test that.
by , 11 years ago
Attachment: | ffmpeg_aac320k_collapse5.flac added |
---|
A sound that degrades on FFmpeg native aac encoder.
comment:36 by , 11 years ago
-cutoff 18000 seems to work, but the lowpass filter is not sharp enough compared to many practical encoders. libavcodec/psymodel.c has the constant FILT_ORDER, and changing the order from 4 to 8 sharpens the filter. But orders 12 and 16 fail somehow.
comment:37 by , 11 years ago
I hope you're testing with good headphones. HF quality is hard to gauge with speakers, especially since good speakers cost a fortune.
follow-up: 40 comment:39 by , 11 years ago
Replying to Kamedo2:
Yes, I'm testing with good headphones.
The reason I mention this is because, from my experience, FAAC tends to have a low cutoff for some bitrates, that seem optimal with speakers, but sound noticeably dull with headphones.
comment:40 by , 11 years ago
Replying to klaussfreire:
The reason I mention this is because, from my experience, FAAC tends to have a low cutoff for some bitrates, that seem optimal with speakers, but sound noticeably dull with headphones.
Exactly. FAAC's cutoff is rather annoyingly low at 96kbps, 64kbps, and 32kbps, and the filter is the major reason why FAAC never beats Nero.
BTW, any prospects for fixing samples 1, 4, 5, and white noise? Samples 4 and 5 are bad at 320kbps and whitenoise.flac is bad at 384kbps. Both regain quality with -cutoff 18000.
comment:41 by , 11 years ago
Changing line 300 from:
const int chan_bitrate = ctx->avctx->bit_rate / ((ctx->avctx->flags & CODEC_FLAG_QSCALE) ? 2.0f : ctx->avctx->channels);
to:
const int chan_bitrate = FFMIN(ctx->avctx->bit_rate, 240000) / ((ctx->avctx->flags & CODEC_FLAG_QSCALE) ? 2.0 : ctx->avctx->channels);
significantly improves the quality. Bitrates remain relatively high with this change.
I have not tested all cases, but it works at 256kbps, 320kbps, and 384kbps on many sounds.
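The effect of the cap in comment:41 can be checked with a small stand-alone sketch. The function name and the `qscale_mode` parameter are made up for illustration (the latter stands in for the CODEC_FLAG_QSCALE test); only the arithmetic follows the line quoted above:

```c
#include <assert.h>

#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Per-channel bitrate handed to the psy model, with the total bitrate
 * capped at cap_bps first. In qscale (VBR) mode the divisor is fixed at 2. */
static int psy_chan_bitrate(int bit_rate, int channels, int qscale_mode, int cap_bps)
{
    assert(channels > 0);
    return (int)(FFMIN(bit_rate, cap_bps) / (qscale_mode ? 2.0 : (double)channels));
}
```

For stereo at 320kbps the psy model then sees 120000 bits/s per channel instead of 160000, while a 192kbps stream is unaffected.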
comment:42 by , 11 years ago
I've listened to over 100 samples of diverse music and speech records. No problem so far. It works on 96, 112, 128,... 256kbps, but hangs on 288kbps.
follow-up: 44 comment:43 by , 11 years ago
Yeah, but because you're capping psy's bitrate target to non-problematic rates. I don't think that's ideal, though that indeed proves the problem lies in psy.
comment:44 by , 11 years ago
Replying to klaussfreire:
Yeah, but because you're capping psy's bitrate target to non-problematic rates. I don't think that's ideal, though that indeed proves the problem lies in psy.
Rates go up even after capping. So it's not merely a cap. I think we're close to the solution.
comment:45 by , 11 years ago
They go up because twoloop will push all scalefactors down uniformly until it achieves the desired bitrate, but:
- It won't work with VBR, VBR almost wholly depends on psy to dictate scalefactor band noise floors. Twoloop will push scalefactors down a bit more I think but not much at those high bitrates
- It's still suboptimal, it's better to let psy decide, since psy understands perceptual entropy better
Sadly, I didn't have time to work on it today. Let's hope I can do so tomorrow. With your analysis I'm confident I can patch psy without having to cap anything.
comment:47 by , 11 years ago
Reading the specs right now. I had a hunch that the spec might say something about this.
comment:48 by , 11 years ago
There. Line 308:
pctx->frame_bits = chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate;
Must be
pctx->frame_bits = FFMIN(3000, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);
That is indeed stated in the spec.
Step 15 of subpart 4: Steps in threshold calculation: then bit allocation is limited to 0 < bit_allocation < 3000. It seems they thought of it all.
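A quick arithmetic check of the clamp in comment:48, as a sketch only: the helper name `psy_frame_bits` is made up, and the 1024-sample value for AAC_BLOCK_SIZE_LONG is stated here as an assumption about the long block size.

```c
#include <assert.h>
#include <stdint.h>

#define AAC_BLOCK_SIZE_LONG 1024  /* assumed: samples per long block */

/* Bits per frame and channel that the psy model budgets for,
 * clamped to max_bits (3000 according to the spec step cited above). */
static int psy_frame_bits(int chan_bitrate, int sample_rate, int max_bits)
{
    assert(sample_rate > 0);
    int64_t bits = (int64_t)chan_bitrate * AAC_BLOCK_SIZE_LONG / sample_rate;
    return bits < max_bits ? (int)bits : max_bits;
}
```

At 44.1kHz, 128000 bits/s per channel (256kbps stereo) gives 2972 bits, just under the limit, while 160000 bits/s per channel (320kbps stereo) would give 3715 without the clamp. That lines up with the observation that the trouble starts somewhere above 256kbps.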
comment:49 by , 11 years ago
Great! I'll have time to test that improvement in about 5 hours, and I'm going to test it extensively. I also think I have to look for ways to sharpen the LPF by using a higher order, at the cost of more computation time. Currently its cutoff is not very sharp.
comment:50 by , 11 years ago
2560 (the number you found) works better for us though. That's certainly related to some deficiency in twoloop, but hey. Let's just document that this should be 3000 but can't be, and be done with it.
comment:51 by , 11 years ago
The LPF could be accomplished by zeroing the coefficients in the FFT. To get the lowest possible ripple, the boundary coefficient needs some care, but AFAIR it's the best method, and it's free for something that's already doing FFT.
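The idea in comment:51 can be sketched as follows. This is a hard brick-wall version under stated assumptions: the function name is hypothetical, the bins are assumed to cover 0 to sample_rate/2 uniformly, and the special handling of the boundary coefficient mentioned above is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Zero every bin whose lower edge is at or above cutoff_hz.
 * Returns the index of the first zeroed bin (n_bins if none). */
static size_t lowpass_zero_bins(float *coeffs, size_t n_bins,
                                int sample_rate, int cutoff_hz)
{
    assert(coeffs && sample_rate > 0 && cutoff_hz >= 0);
    /* ceil(cutoff_hz * n_bins / (sample_rate / 2)) in integer arithmetic */
    unsigned long long first =
        (2ULL * (unsigned long long)cutoff_hz * n_bins + sample_rate - 1) / sample_rate;
    if (first > n_bins)
        first = n_bins;
    for (size_t k = (size_t)first; k < n_bins; k++)
        coeffs[k] = 0.0f;
    return (size_t)first;
}
```

For 1024 coefficients at 44.1kHz and an 18kHz cutoff, bins 836 and above are cleared. No extra filtering pass is needed, which is the "free" part of the suggestion.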
comment:52 by , 11 years ago
It's not a regression, but the surround bitrate seems to be capped and does not change with -b:a 256k, 320k, 384k.
A surround sample file is here: http://people.xiph.org/~xiphmont/demo/opus/demo3.shtml
I'm currently using pctx->frame_bits = FFMIN(3000,...
No obvious bugs so far.
comment:53 by , 11 years ago
I used pctx->frame_bits = FFMIN(2560, and psymodel.h line 32:
#define AAC_CUTOFF(s) (s->bit_rate ? FFMIN3(FFMIN3(s->bit_rate/s->channels/2, 4000 + s->bit_rate/s->channels/4, 12000 + s->bit_rate/s->channels/16), 20000, s->sample_rate / 2): (s->sample_rate / 2))
This is better on mono, surround, and very low bitrates (such as 32kbps stereo).
truncut.wav has little HF content, so the bitrate saturates at 172kbps.
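To see what the macro in comment:53 yields for typical settings, here is the same expression written as a plain function. This is only a restatement for experimentation; the function name is made up, and the FFMIN/FFMIN3 definitions are included so the block compiles on its own.

```c
#include <assert.h>

#define FFMIN(a, b) ((a) > (b) ? (b) : (a))
#define FFMIN3(a, b, c) FFMIN(FFMIN(a, b), c)

/* Cutoff in Hz for a given total bitrate, channel count and sample rate,
 * following the three-slope rule of the macro (4000 Hz offset variant). */
static int aac_cutoff_hz(int bit_rate, int channels, int sample_rate)
{
    assert(channels > 0);
    if (!bit_rate)
        return sample_rate / 2;
    int per_ch = bit_rate / channels;
    int by_rate = FFMIN3(per_ch / 2, 4000 + per_ch / 4, 12000 + per_ch / 16);
    return FFMIN3(by_rate, 20000, sample_rate / 2);
}
```

At 44.1kHz stereo this gives 8000 Hz at 32kbps, 16000 Hz at 128kbps and the 20000 Hz ceiling at 320kbps.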
comment:54 by , 11 years ago
In 4 hours of listening to more than 100 musical, vocal, ambient and artificial sounds, at 64-480kbps, 44.1kHz and 48kHz, stereo and surround, I have found no problematic samples. This solution is great. Thank you for the fix, klaussfreire.
Tomorrow I think I'm going to test mono, collect more surround samples to test, and test sample rates of 32kHz or less and VBR modes.
comment:55 by , 11 years ago
comment:56 by , 11 years ago
Should I use ffmpeg_g to spot the bug? Thousands of diverse sound files are now being encoded to check that the encoder doesn't freeze or fail.
follow-up: 60 comment:57 by , 11 years ago
Recommended cutoff frequency for FFmpeg AAC.
psymodel.h line 32:
#define AAC_CUTOFF(s) (s->bit_rate ? FFMIN3(FFMIN3(s->bit_rate/s->channels/2, 3000 + s->bit_rate/s->channels/4, 12000 + s->bit_rate/s->channels/16), 20000, s->sample_rate / 2): (s->sample_rate / 2))
The LPF is not applied in VBR now, resulting in noticeably poor quality.
comment:58 by , 11 years ago
songs: 5 min snippets of pops and jazz, 44.1kHz, stereo
non-music sounds: 16 min of artificial sounds, difficult samples, speech, etc, 48kHz, stereo
LAME equivalent | Bitrate (kbps) | VBR number (-q:a) |
---|---|---|
 | 16 | 0.029 |
-V9.9 | 32 | 0.053 |
 | 48 | 0.097 |
-V9 | 64 | 0.23 |
-V8 | 80 | 0.43 |
-V7 | 96 | 0.55 |
-V6 | 112 | 0.66 |
-V5 | 128 | 0.86 |
 | 144 | 1.06 |
-V4 | 160 | 1.17 |
-V3 | 176 | 1.29 |
-V2 | 192 | 1.43 |
-V1 | 224 | 2.2 |
-V0 | 256 | 4.3 |
 | 288 | 6.2 |
 | 320 | 7 |
 | 352 | 7.7 |
 | 384 | 10 |
comment:59 by , 11 years ago
How about the subjective quality of the various VBR modes, as compared to CBR (actually ABR, since a CBR setting in AAC produces ABR)?
I worked hard to get good results, but there are still problematic samples that sound better with the equivalent ABR than with VBR.
follow-ups: 61 69 comment:60 by , 11 years ago
Replying to Kamedo2:
psymodel.h line 32:
#define AAC_CUTOFF(s) (s->bit_rate ? FFMIN3(FFMIN3(s->bit_rate/s->channels/2, 3000 + s->bit_rate/s->channels/4, 12000 + s->bit_rate/s->channels/16), 20000, s->sample_rate / 2): (s->sample_rate / 2))
The LPF is not applied in VBR now, resulting in noticeably poor quality.
Try this cutoff:
#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    16000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
    : _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
)
I find it works better; the other one was pretty dull for 64k/ch, which ought to be transparent for AAC. This one also works on VBR.
by , 11 years ago
Attachment: | ffmpeg_aacvbr_pulse1.flac added |
---|
Sound disappears for about 20ms in VBR mode -q:a 5, -q:a 10. Sounds like an annoying pulse.
comment:61 by , 11 years ago
Replying to klaussfreire:
Try this cutoff:
#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    16000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
    : _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
)
I tried, but isn't this cutoff strange? It sounds like the lowpass is always 20kHz.
The problem of ffmpeg_aacvbr_pulse1.flac is solved by this.
I'm using current git head 54813 + aac-improvements-wip-v2-rclookahead.2.patch + aacpsy.c Line 308
pctx->frame_bits = FFMIN(2560, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);
comment:62 by , 11 years ago
LOL, sorry, the VBR condition is backwards. An old idiocy of mine, I always reverse if conditions. Kinda like coding dyslexia.
It should be
#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    12000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
    : _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
)
Though I'm getting some weird results with very low quality settings.
follow-up: 65 comment:63 by , 11 years ago
Aren't you trying to access s->bit_rate when it's VBR? Or am I missing something?
follow-up: 66 comment:64 by , 11 years ago
Is s->global_quality different from VBR number -q:a x?
LAME equivalent | Stereo bitrate (kbps) | VBR number (-q:a) | Recommended cutoff (Hz) |
---|---|---|---|
 | 16 | 0.029 | 4000 |
-V9.9 | 32 | 0.053 | 7000 |
 | 48 | 0.097 | 9000 |
-V9 | 64 | 0.23 | 11000 |
-V8 | 80 | 0.43 | 13000 |
-V7 | 96 | 0.55 | 15000 |
-V6 | 112 | 0.66 | 15500 |
-V5 | 128 | 0.86 | 16000 |
 | 144 | 1.06 | 16500 |
-V4 | 160 | 1.17 | 17000 |
-V3 | 176 | 1.29 | 17500 |
-V2 | 192 | 1.43 | 18000 |
-V1 | 224 | 2.2 | 19000 |
-V0 | 256 | 4.3 | 20000 |
 | 288 | 6.2 | 20000 |
 | 320 | 7 | 20000 |
 | 352 | 7.7 | 20000 |
 | 384 | 10 | 20000 |
follow-up: 68 comment:65 by , 11 years ago
Replying to Kamedo2:
Aren't you trying to access s->bit_rate when it's VBR? Or am I missing something?
Yes, bit_rate in that case holds the default of 128kbps. Psy does the same, but it works well since that's considered to be AAC's transparent rate. So, for VBR, you make psy work at transparent settings, and compensate bit allocation based on RD scaling.
comment:66 by , 11 years ago
comment:67 by , 11 years ago
I think I finally got VBR to talk to psy.
It's looking good. I'll post an updated patch with all this in a while (still lots of tests to perform)
comment:68 by , 11 years ago
Replying to klaussfreire:
Yes, bit_rate in that case holds the default of 128kbps. Psy does the same, but it works well since that's considered to be AAC's transparent rate.
AAC is not transparent in 128kbps stereo, although Apple used to advertise that way. http://d.hatena.ne.jp/kamedo2/20111029/1319840519
comment:69 by , 11 years ago
Replying to klaussfreire:
#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels, \
    3000 + bit_rate/channels/2, \
    16000 + bit_rate/channels/8), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
    : _AAC_CUTOFF((int)(s->bit_rate * (s->global_quality ? s->global_quality : 120) / 120.0), 2, s->sample_rate) \
)
I find it works better, the other one was pretty dull for 64k/ch, which ought to be transparent for AAC. This one also works on VBR.
The high cutoff causes trouble for whitenoise.flac below 55kbps.
And I'm almost certain 16kHz is optimal at 128kbps stereo.
http://d.hatena.ne.jp/kamedo2/20120221/1329845124
http://d.hatena.ne.jp/kamedo2/20120729/1343545890
comment:70 by , 11 years ago
I recommend psymodel.h line 24 to be:
#include "libavutil/libm.h"
#include "avcodec.h"
/** maximum possible number of bands */
#define PSY_MAX_BANDS 128
/** maximum number of channels */
#define PSY_MAX_CHANS 24
#define _AAC_CUTOFF(bit_rate,channels,sample_rate) (bit_rate ? FFMIN3(FFMIN3( \
    bit_rate/channels/2, \
    3000 + bit_rate/channels/4, \
    12000 + bit_rate/channels/16), \
    20000, \
    sample_rate / 2): (sample_rate / 2))
#define AAC_CUTOFF(s) ( \
    (s->flags & CODEC_FLAG_QSCALE) \
    ? _AAC_CUTOFF(((int)(135000.0f*sqrtf(s->global_quality ? s->global_quality/120.0f : 1.0f))), 2, s->sample_rate) \
    : _AAC_CUTOFF(s->bit_rate, s->channels, s->sample_rate) \
)
In this way, I can set cutoff to VBR modes as well.
PSY_MAX_CHANS 24 is to accommodate NHK 22.2ch.
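To make the proposal above easier to check against the cutoff table in comment:64, here is a minimal C sketch of the bitrate-based branch of the macro written as a function. The names `aac_cutoff_sketch` and `min3` are made up for illustration (`min3` stands in for FFmpeg's `FFMIN3`); the arithmetic is copied from the macro, and the `-q:a` branch with `sqrtf` is not covered.

```c
#include <assert.h>

/* Smallest of three ints (stand-in for FFmpeg's FFMIN3). */
static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Function form of the _AAC_CUTOFF macro proposed in comment:70.
 * bit_rate is the total bitrate in bits/s; the result is a cutoff in Hz.
 * A bit_rate of 0 means "no rate known" and falls back to Nyquist.
 */
static int aac_cutoff_sketch(int bit_rate, int channels, int sample_rate)
{
    assert(channels > 0);
    if (!bit_rate)
        return sample_rate / 2;
    int per_channel = bit_rate / channels;
    int by_rate = min3(per_channel / 2,
                       3000 + per_channel / 4,
                       12000 + per_channel / 16);
    return min3(by_rate, 20000, sample_rate / 2);
}
```

For 44.1 kHz stereo input this gives 11000 Hz at 64 kbps, 16000 Hz at 128 kbps and 20000 Hz at 256 kbps, which agrees with the recommended cutoffs in the table.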
I notice that at -q:a 0.2 and -q:a 0.4, the lower frequencies are in trouble. It sounds like distant thunder.
comment:71 by , 11 years ago
Yes, I'm fixing the lower frequency right now. It's an issue with tonal band prioritization, which doesn't really work as intended in VBR. I'm preparing a better patch now. I'll test your cutoffs.
comment:72 by , 11 years ago
After applying the new LPF from comment:70, the resulting bitrate for music changed a bit, so I think I have to replot the graph. One more problem: -q:a 0.029 or -q:a 10 is unfriendly for an average user. I think the values should be roughly equivalent to LAME's. I mean, if one uses -q:a 2, the result for average sound would be roughly 96kbps/channel, which is the same behavior as LAME -V2. Is applying the new LPF method from comment:51 easy?
comment:73 by , 11 years ago
After two days of toying around, the butterworth filter used in psy is actually counterproductive. Keeping all things equal, lowering the cutoff actually increases bitrate, if a fixed RD is forced. So, for VBR, it's a no-no.
I'm trying an FFT-based LP by simply zeroing coeffs, with care at the boundary to minimize ripple, and it seems to work a lot better, at least for VBR.
Right now, the implementation is just a POC. It's very dirty. But I'm getting convinced this is the way for VBR... and maybe for ABR too. I'm not sure.
Edit: And, to boot, an FFT is phase-linear. I can actually hear group delay with the butterworth. Ugly.
comment:74 by , 11 years ago
Is that FFT, not MDCT?
I'm guessing that the bitrate increase from lowering the cutoff is the effect described in comment:32. Very strange, as HF content usually takes up more bits, but it makes sense.
comment:75 by , 11 years ago
You're right, the one I have done right now is MDCT, because it's done within the bit allocator. But I've been meaning to implement an actual FFT filter later on, if not too hard, and if the technique pans out.
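A minimal sketch of the "zero the coefficients above the cutoff" idea, assuming a plain array of MDCT coefficients for one block. The function name, the linear fade and its width are assumptions for illustration; the thread does not say how the boundary is actually shaped in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Zero every coefficient at or above cutoff_bin, with a linear fade over
 * the taper_bins coefficients just below it. The fade is an assumed shape,
 * chosen only to avoid an abrupt edge at the cutoff.
 */
static void lowpass_coeffs(float *coef, size_t n,
                           size_t cutoff_bin, size_t taper_bins)
{
    assert(coef != NULL);
    if (cutoff_bin > n)
        cutoff_bin = n;
    if (taper_bins > cutoff_bin)
        taper_bins = cutoff_bin;

    size_t fade_start = cutoff_bin - taper_bins;
    for (size_t i = fade_start; i < cutoff_bin; i++) {
        /* Gain falls linearly from just under 1 toward 0 at cutoff_bin. */
        float gain = (float)(cutoff_bin - i) / (float)(taper_bins + 1);
        coef[i] *= gain;
    }
    for (size_t i = cutoff_bin; i < n; i++)
        coef[i] = 0.0f;
}
```

Because this works on coefficients the encoder already has, it adds no delay of its own, unlike a time-domain filter placed in front of the encoder.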
comment:76 by , 11 years ago
Thing is, the butterworth doesn't really remove that much content, and it changes the masking thresholds in a way that actually requires more bits to encode. A higher-order butterworth might work, but it would have way too much group delay.
comment:77 by , 11 years ago
BTW, wait before you redo that graph, I have a much better VBR patch almost ready.
comment:78 by , 11 years ago
Alright, I'm attaching a new VBR patch. CBR/ABR shouldn't have changed (shouldn't, but might). I will probably want to apply the same logic to CBR/ABR as well, since it works very well (ie: cutoff not with a filter but with the bit allocator, stop spending bits on HF if we're starving for bits).
A heads-up: VBR's q-to-kbps curve has changed, and there are some artifacts that sound like scratchy noises (especially audible in the sine sample) that are due to clipping. I think it's not specific to this patch, but I just noticed it. I'm not sure how to attack it. Normally, I'd apply compression on the IMDCT stage, but since that's on the decoder side, I'll probably have to find a clever way to predict clipping on the encoder and compensate. Craptastic.
Anyway, I do think VBR has been greatly improved on this patch. Let me know what you think.
by , 11 years ago
Attachment: | aac-improvements-wip-v3-vbr.patch added |
---|
VBR improvements over wip-v2-rclookahead
comment:79 by , 11 years ago
I believe your latest patch contains trailing whitespace (that cannot be committed to FFmpeg git), consider running tools/patcheck over the diff.
comment:80 by , 11 years ago
I successfully applied the patch from latest git head N-54889-g47d57f2.
comment:81 by , 11 years ago
comment:82 by , 11 years ago
Yeah it seems to have an anomaly around 1. I had only tested whitenoise up to 0.7. I'll try to patch it up.
comment:83 by , 11 years ago
Ah, yeah, I know. It's probably the scaler offset. It must be unpredictable in whitenoise because of how flat the envelope is.
follow-up: 86 comment:84 by , 11 years ago
I don't recommend ambitiously trying to save the HF content above 18kHz when there are enough bits. It sounds unstable. Some early MP3 encoders in the 1990s used that tactic, but none of them were good. Rather, a clean, fixed LPF should be applied at all times. Avoid the situation where one can hear the 12-20kHz content in some parts of the music and a dull, 12kHz LPF-like sound in other parts.
As for
pctx->frame_bits = FFMIN(2560, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);
do we get more stable results when the number 2560 is lowered?
(240kbps is a 'megadose' or 'overkill' bitrate for AAC, so slight degradation is not a major problem.)
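To see where the 2560 cap in the line above starts to bind, here is a minimal C sketch of the same arithmetic. It assumes `chan_bitrate` is the per-channel rate in bits/s and that `AAC_BLOCK_SIZE_LONG` is 1024; the function names and the `cap` parameter are made up for illustration.

```c
#include <assert.h>

#define AAC_BLOCK_SIZE_LONG 1024  /* samples per long AAC block (assumed) */

/* Per-channel bit budget for one frame, capped as in the quoted line. */
static int frame_bits_sketch(int chan_bitrate, int sample_rate, int cap)
{
    assert(sample_rate > 0);
    /* long long keeps the product from overflowing at high bitrates */
    long long bits = (long long)chan_bitrate * AAC_BLOCK_SIZE_LONG / sample_rate;
    return bits < cap ? (int)bits : cap;
}

/* Lowest per-channel bitrate (bits/s) at which the cap is reached. */
static int cap_bitrate_sketch(int sample_rate, int cap)
{
    return (int)((long long)cap * sample_rate / AAC_BLOCK_SIZE_LONG);
}
```

At 48 kHz a cap of 2560 bits corresponds to 120 kbps per channel, which is the 240 kbps stereo figure mentioned above; at 44.1 kHz it is 110.25 kbps per channel. Lowering the cap lowers that threshold in proportion.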
comment:85 by , 11 years ago
follow-up: 87 comment:86 by , 11 years ago
Replying to Kamedo2:
I don't recommend ambitiously trying to save the HF content above 18kHz when there are enough bits. It sounds unstable. Some early MP3 encoders in the 1990s used that tactic, but none of them were good. Rather, a clean, fixed LPF should be applied at all times. Avoid the situation where one can hear the 12-20kHz content in some parts of the music and a dull, 12kHz LPF-like sound in other parts.
I just want to preserve the HF component of transients. There might be better ways of doing that. I guess I'll keep iterating on it. However, I believe the way it's being done now works well. If you check, the LP cutoff is chosen from the allocation given by psy. Psy contains bit reservoir logic, which means it will momentarily increase bits (and cutoff) for some difficult transients. Right now, it works wonders for hi-hats.
I will probably have to be stricter about the cutoff, though. As you say, when the signal by itself (not by psy's indication, but signal strength alone) suddenly jumps in HF content, the result is unpleasant. I think I have cleaned up most of those cases, but who knows. It's hard to discern those from actual transients.
As for
pctx->frame_bits = FFMIN(2560, chan_bitrate * AAC_BLOCK_SIZE_LONG / ctx->avctx->sample_rate);
do we get more stable results when the number 2560 is lowered?
(240kbps is a 'megadose' or 'overkill' bitrate for AAC, so slight degradation is not a major problem.)
If it doesn't limit the ability to increase allocation for transients, it might. I'll look into it.
follow-up: 88 comment:87 by , 11 years ago
Replying to klaussfreire:
I just want to preserve the HF component of transients. There might be better ways of doing that. I guess I'll keep iterating on it. However, I believe the way it's being done now works well. If you check, the LP cutoff is chosen from the allocation given by psy. Psy contains bit reservoir logic, which means it will momentarily increase bits (and cutoff) for some difficult transients. Right now, it works wonders for hi-hats.
So, if there is a group of beat sounds that is on the threshold of tonal/transients, the LPF is sometimes on and sometimes off? Currently, the on/off switch itself is audible and is quite annoying. It sounds like a stopwatch.
I will probably have to be stricter about the cutoff, though. As you say, when the signal by itself (not by psy's indication, but signal strength alone) suddenly jumps in HF content, the result is unpleasant. I think I have cleaned up most of those cases, but who knows. It's hard to discern those from actual transients.
ffmpeg_aacvbr_pulse1.flac at -q:a 0.25 produces strange HF sounds.
by , 11 years ago
Attachment: | ffmpeg_aacvbr_pulse2.flac added |
---|
Partial white noise, split by a 256th lanczos filter. HF pulse noise that sounds like a stopwatch is added in VBR around -q:a 0.3
comment:88 by , 11 years ago
Replying to Kamedo2:
Replying to klaussfreire:
I just want to preserve the HF component of transients. There might be better ways of doing that. I guess I'll keep iterating on it. However, I believe the way it's being done now works well. If you check, the LP cutoff is chosen from the allocation given by psy. Psy contains bit reservoir logic, which means it will momentarily increase bits (and cutoff) for some difficult transients. Right now, it works wonders for hi-hats.
So, if there is a group of beat sounds that is on the threshold of tonal/transients, the LPF is sometimes on and sometimes off? Currently, the on/off switch itself is audible and is quite annoying. It sounds like a stopwatch.
No, the cutoff moves up and down, but the LP remains on.
I'll have to check the sample
follow-up: 90 comment:89 by , 11 years ago
You seem to be using the heuristic that transient HF components are loud and tonal HF components are quiet.
comment:90 by , 11 years ago
Replying to Kamedo2:
You seem to be using the heuristic that transient HF components are loud and tonal HF components are quiet.
No, I let psy detect the transients. The only heuristic is that I attempt to encode a little bit more of the HF with decreased quality.
Ie, from 0 to cutoff, normal quantization. From cutoff to cutoff * 1.2, coarse (progressively coarser in fact) quantization. Now, I let bit allocation zero out beyond cutoff * 1.2. I may have to force it to avoid the artifacts you mention.
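The three regions described above can be sketched as a small classifier. This is only an illustration of the intended outcome: the enum, the function name and the use of a band's start frequency are assumptions, and in the patch the region beyond cutoff * 1.2 is left to bit allocation rather than forced to zero.

```c
#include <assert.h>

enum hf_zone { ZONE_NORMAL, ZONE_COARSE, ZONE_ZEROED };

/* Classify a band by its start frequency (Hz) relative to the cutoff (Hz). */
static enum hf_zone classify_band(int band_start_hz, int cutoff_hz)
{
    assert(cutoff_hz > 0);
    if (band_start_hz < cutoff_hz)
        return ZONE_NORMAL;             /* normal quantization */
    /* cutoff * 1.2 in integer arithmetic: cutoff * 6 / 5 */
    if (band_start_hz < cutoff_hz * 6 / 5)
        return ZONE_COARSE;             /* progressively coarser quantization */
    return ZONE_ZEROED;                 /* expected to end up at zero */
}
```

With a 16000 Hz cutoff, the coarse region runs from 16000 Hz up to 19200 Hz.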
comment:91 by , 11 years ago
Seeing the spectrogram, sometimes, up to 22kHz is encoded. No way we can hear that high. However, because of your algorithm, the cutoff seems to be much higher than it actually is, and the sound is much clearer in typical cases. But we have to be careful of exceptions. I think I feel strange when the encoded_highest_sound - normal_cutoff is more than 3kHz. Sounds something like plip, plip. Is coarse quantization at cutoff~cutoff*1.2 applied only to transients?
comment:92 by , 11 years ago
No, that's applied to tonal signals as well. A way to squeeze a little extra bandwidth. It proved to be a winning move for music, though I didn't test that much with noise.
follow-up: 95 comment:93 by , 11 years ago
Is that included in a wip-v3-vbr.patch, or a new feature? It sounds like the extra HF content encode is only on transients. And some transients are indeed encoded up to 22kHz.
Are HF contents over cutoff*1.2 totally discarded? (I believe this is the best move.)
comment:94 by , 11 years ago
LAME sometimes acts like your algorithm, but within 2kHz or so. It's related to the -Y switch, and LAME sometimes encodes 16~18kHz content.
comment:95 by , 11 years ago
Replying to Kamedo2:
Is that included in a wip-v3-vbr.patch
Yes
Are HF contents over cutoff*1.2 totally discarded? (I believe this is the best move.)
No, and maybe that's the problem. 1.2 just happens to be the point at which the increased quantization floor starts zeroing out all components. Until that, RD optimization brings down the quantization floor to maintain acceptable quality, so you don't notice the floor rising (and in fact it doesn't for fully tonal bands, that's what RD optimization is about, whereas it does rise for noisy ones).
So, in essence, up to cutoff * 1.2, tonal components are retained at the expense of HF noise, which seems like a sensible tradeoff.
What must be happening, is that, on some signals, the zeroing point happens above 1.2, significantly above. So it's perhaps wise to hardcode that 1.2 value, and force a zero on those bands instead.
comment:96 by , 11 years ago
I think we should hardcode min(cutoff+2500, cutoff*1.2). When cutoff is 18kHz, cutoff*1.2 is 21.6kHz, which is too high. Could you provide the relation between the -q:a value and cutoff so we can have a better grasp of what's happening?
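A minimal sketch of the suggested limit, with a hypothetical function name and all frequencies in Hz. It only evaluates the min() expression from the comment above.

```c
#include <assert.h>

/* Upper limit for encoded HF content: min(cutoff + 2500, cutoff * 1.2). */
static int hf_hard_limit(int cutoff_hz)
{
    assert(cutoff_hz >= 0);
    int by_offset = cutoff_hz + 2500;
    int by_ratio = cutoff_hz * 6 / 5;   /* cutoff * 1.2 in integer arithmetic */
    return by_offset < by_ratio ? by_offset : by_ratio;
}
```

The two terms meet at a cutoff of 12500 Hz. Below that the 1.2 factor is the tighter bound; above it the fixed 2500 Hz margin takes over, so an 18000 Hz cutoff is limited to 20500 Hz instead of 21600 Hz.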
comment:97 by , 11 years ago
So, I tracked the anomaly near -q:a 1 to the ESC_BT codebook. It seems when noise floors are too low, the coefficients can't be properly encoded, and all kinds of bad things ensue. I'll see how to fix it.
comment:98 by , 11 years ago
I noticed that this new VBR encoder has zero delay. The ABR encoder at 64kbps stereo has 1 sample of delay. Probably because of the lack of the butterworth LPF.
comment:99 by , 11 years ago
That's why I want to get rid of the butterworth. It's good, but FFT is better, since it's phase-linear. With all the quantization noise I don't think we care that much about ripple, but even if we did, FFTs can be made to minimize it.
comment:100 by , 11 years ago
I think I can start the blind test from August 3rd. With the results, we can overwrite the outdated FFmpeg AAC Encoding Guide. https://trac.ffmpeg.org/wiki/AACEncodingGuide
comment:101 by , 11 years ago
Is the issue in comment:97 fixable? I think it will contribute to higher quality at 160kbps and 192kbps. Currently, it is still worse than the mighty Apple AAC.
I assume most blocks are long (1024 samples) tonal blocks, and that the short, transient blocks, which are rare, are the ones apparently causing problems. Am I right?
comment:102 by , 11 years ago
Yes, I have a fix in the works. That limitation is the reason the standard limits allocation to 3000 bits, most likely.
comment:103 by , 11 years ago
Isn't aaccoder.c line 787~795 strange? I believe the source of the trouble is somewhere in computing or using the cutoff value, which causes weird sounds at low bitrates such as -q:a 0.25.
comment:104 by , 11 years ago
So, I tried a whole new approach, and it seems vastly superior.
I modified psy's "Rate control" to work differently for VBR. Instead of using the bit reservoir, it just computes the optimum PE and scales it by quality. And it works nicely. I still had to push scalers a bit more on the allocator and do the LP filtering to reach the very low bit rates with VBR, but it's sounding a lot better.
I'll do some more testing and then upload the updated patch.
comment:106 by , 11 years ago
I inserted
av_log(NULL, AV_LOG_DEBUG, "\n cutoff=%d, lambda=%f, frame_bit_rate=%d, bandwidth=%d\n",cutoff,lambda,frame_bit_rate,bandwidth);
in aaccoder.c twoloop line 795, and found that the cutoff differs between frames. I used -q:a 0.4, stereo 44.1kHz. I assume cutoffs below 99 are the short blocks and cutoffs above 500 are the long tonal blocks. The cutoff varies throughout the same music: 11.7k~13.6k for the short blocks, 11.5k~13.2k for the long blocks. (Calculated from the 25 raw examples below)
cutoff=77, lambda=47.000000, frame_bit_rate=46034, bandwidth=14508
cutoff=614, lambda=47.000000, frame_bit_rate=45648, bandwidth=14412
cutoff=76, lambda=47.000000, frame_bit_rate=45648, bandwidth=14412
cutoff=612, lambda=47.000000, frame_bit_rate=45417, bandwidth=14354
cutoff=76, lambda=47.000000, frame_bit_rate=45417, bandwidth=14354
cutoff=532, lambda=47.000000, frame_bit_rate=37937, bandwidth=12484
Last message repeated 1 times
cutoff=538, lambda=47.000000, frame_bit_rate=38477, bandwidth=12619
Last message repeated 1 times
size= 242kB time=00:00:15.80 bitrate= 125.2kbits/s
cutoff=68, lambda=47.000000, frame_bit_rate=39017, bandwidth=12754
cutoff=544, lambda=47.000000, frame_bit_rate=39017, bandwidth=12754
cutoff=548, lambda=47.000000, frame_bit_rate=39402, bandwidth=12850
Last message repeated 1 times
cutoff=551, lambda=47.000000, frame_bit_rate=39711, bandwidth=12927
Last message repeated 1 times
cutoff=554, lambda=47.000000, frame_bit_rate=39942, bandwidth=12985
Last message repeated 1 times
cutoff=69, lambda=47.000000, frame_bit_rate=40173, bandwidth=13043
cutoff=556, lambda=47.000000, frame_bit_rate=40173, bandwidth=13043
cutoff=69, lambda=47.000000, frame_bit_rate=40405, bandwidth=13101
cutoff=558, lambda=47.000000, frame_bit_rate=40405, bandwidth=13101
cutoff=561, lambda=47.000000, frame_bit_rate=40636, bandwidth=13159
Last message repeated 1 times
cutoff=562, lambda=47.000000, frame_bit_rate=40713, bandwidth=13178
Last message repeated 1 times
cutoff=71, lambda=47.000000, frame_bit_rate=41870, bandwidth=13467
cutoff=574, lambda=47.000000, frame_bit_rate=41870, bandwidth=13467
cutoff=79, lambda=47.000000, frame_bit_rate=47653, bandwidth=14913
Last message repeated 1 times
cutoff=78, lambda=47.000000, frame_bit_rate=46651, bandwidth=14662
Last message repeated 1 times
cutoff=76, lambda=47.000000, frame_bit_rate=45031, bandwidth=14257
Last message repeated 1 times
[output stream 0:0 @ 04adab60] EOF on sink link output stream 0:0:default.
No more output streams to write to, finishing.
cutoff=75, lambda=47.000000, frame_bit_rate=44337, bandwidth=14084
Last message repeated 1 times
cutoff=68, lambda=47.000000, frame_bit_rate=39711, bandwidth=12927
Last message repeated 1 times
[aac @ 04aaf580] Trying to remove 504 more samples than there are in the queue
size= 253kB time=00:00:16.10 bitrate= 128.9kbits/s
video:0kB audio:250kB subtitle:0 global headers:0kB muxing overhead 1.475195%
755 frames successfully decoded, 0 decoding errors
[AVIOContext @ 04ad0440] Statistics: 30 seeks, 779 writeouts
[AVIOContext @ 04d6f8a0] Statistics: 3123324 bytes read, 2 seeks
ffmpeg54890g.exe -v 9 -loglevel 99 -i ffmpeg_aacvbr_pulse2.wav -c:a aac -strict experimental -q:a 0.4 ffmpeg_aacvbr_pulse2.mp4
I tried to automate it by batch script, including preserving the av_log output but somehow it freezes.
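The Hz ranges quoted in this comment can be reproduced with a small conversion helper. This is a sketch under the assumption that the logged `cutoff` is an MDCT coefficient index (1024 coefficients per long block, 128 per short block), so one bin spans sample_rate / (2 * coeffs) Hz; both function names are made up for illustration.

```c
#include <assert.h>

/* Frequency in Hz of the lower edge of MDCT bin `bin`, truncated. */
static int bin_to_hz(int bin, int coeffs, int sample_rate)
{
    assert(coeffs > 0);
    return (int)((long long)bin * sample_rate / (2 * coeffs));
}

/* Inverse mapping from a frequency in Hz to a bin index, truncated. */
static int hz_to_bin(int hz, int coeffs, int sample_rate)
{
    assert(sample_rate > 0);
    return (int)((long long)hz * 2 * coeffs / sample_rate);
}
```

At 44100 Hz, long-block cutoffs 532 and 614 map to about 11.5 kHz and 13.2 kHz, and short-block cutoffs 68 and 79 to about 11.7 kHz and 13.6 kHz, matching the ranges above. The logged `bandwidth` values, however, map back onto the logged cutoffs at 48000 Hz (for example `hz_to_bin(14412, 1024, 48000)` is 614), so the absolute figures may depend on the actual sample rate of the input.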
comment:107 by , 11 years ago
Don't worry, for the new patch I'm using refbits instead of destbits, refbits is a direct derivation of lambda, so it won't change. I couldn't make the changing bandwidth work in a stable fashion without a lot more work, so I'll reserve that for a further patch, maybe.
comment:110 by , 11 years ago
Patience. Later today, or perhaps tomorrow, depending on your time zone
comment:111 by , 11 years ago
Damn. The patch works wonderfully well in VBR, but breaks CBR. I'll have to look into it during the weekend.
Patience indeed.
comment:112 by , 11 years ago
Yes, the VBR sounds dull and is currently(at v3) poorer than CBR, and it should have a lot of room to improve.
comment:113 by , 11 years ago
I've encoded weeks of AACs using the v3 patch, with diverse samples and diverse bitrates, and there were no problems (empty files, error returns, freezes).
comment:114 by , 11 years ago
klaussfreire, could you provide the VBR-only patch? I'd like to test it. I may be able to detect the problem(s).
by , 11 years ago
Attachment: | aac-improvements-wip-v4-vbr.patch added |
---|
Improved VBR, fixed psy threshold reduction bug
comment:115 by , 11 years ago
Attached the current WIP.
An explanation of what caused the bug for high q values: there was a bug in psy's threshold reduction for hole avoidance. When a second pass was needed, it would accumulate errors due to a simple typo (reduction += instead of reduction =).
I don't have the 3GPP spec to check, but I just noticed the code made no sense with the +=, but did with =.
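A toy illustration of why `+=` and `=` behave differently when a second pass is needed. Everything here is hypothetical: `estimate_reduction` is a placeholder, not the psy model's actual formula, and the sketch simply assumes each pass yields a complete estimate that should replace the previous one.

```c
#include <assert.h>

/* Placeholder per-pass estimate of how much reduction is still needed
 * to bring the perceptual entropy (pe) down to target_pe. */
static float estimate_reduction(float pe, float target_pe)
{
    assert(target_pe > 0.0f);
    return pe > target_pe ? (pe - target_pe) / target_pe : 0.0f;
}

/* Two-pass reduction. With accumulate != 0 the second estimate is added
 * to the first (the "+=" behavior); otherwise it replaces it ("="). */
static float two_pass_reduction(float pe_pass1, float pe_pass2,
                                float target_pe, int accumulate)
{
    float reduction = estimate_reduction(pe_pass1, target_pe);
    if (pe_pass2 > target_pe) {          /* a second pass is needed */
        float second = estimate_reduction(pe_pass2, target_pe);
        reduction = accumulate ? reduction + second : second;
    }
    return reduction;
}
```

With a target of 4, a first-pass PE of 8 and a second-pass PE of 6, the accumulating variant ends at 1.5 while the replacing variant ends at 0.5. The error only shows up on frames that need the second pass, which fits the description of the bug surfacing under specific conditions.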
Then there's the ESC_BT thing.
I think most serious anomalies have been fixed in this bug. I haven't had time to properly test CBR, but it seems to mostly work now. That was a very subtle bit reservoir bug in my "lookahead" patch that didn't surface until I fixed psy.
Anyway, I still would like to make VBR achieve lower bitrates without having to resort to LP filtering. I somehow sense it should be possible. In any case, I made CBR also use the same scalefactor-band-based LP filtering to remove the need for the butterworth that didn't save many bits anyway, and now it responds to the -cutoff argument, so if you don't like the default cutoff you can override it yourself. It seemed worth parameterizing since I've found some sources that sound better at low bit rates with higher cutoffs, and some that don't. So it's source-dependent.
Anyway, enjoy the patch, I'm not sure I'll have time to work on a more permanent (one that I'd push to trunk) one till next weekend.
comment:116 by , 11 years ago
Yes, the cutoff is quite source-dependent, and listener-dependent too. Older people may prefer lower cutoffs. BTW, I'm 25 yrs old.
comment:118 by , 11 years ago
Changing
? (refbits * 1.6f * avctx->sample_rate / 1024)
to
? (refbits * 2.5f * avctx->sample_rate / 1024)
raises the LPF and the sound is much clearer (at the cost of more noise, but it's certainly better per real bitrate).
I feel the sound is bad only in the tonal parts of the music in VBR. This encoder uses fewer bits, sometimes nearly half as many, for the tonal parts, unlike Opus, which has a distinct tonality boost function.
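For reference, a minimal sketch of the expression being changed in this comment, with the multiplier pulled out as a parameter. The function name is made up, and what the result feeds into is not shown in the thread beyond the observation that a larger value raises the LPF.

```c
#include <assert.h>

/* refbits * multiplier * sample_rate / 1024, as in the quoted expression. */
static float scaled_rate(float refbits, float multiplier, int sample_rate)
{
    assert(sample_rate > 0);
    return refbits * multiplier * (float)sample_rate / 1024.0f;
}
```

Going from 1.6 to 2.5 scales the result by 2.5 / 1.6 = 1.5625, so the quantity that drives the low-pass rises by about 56% for the same refbits.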
comment:119 by , 11 years ago
Yes, I was in the middle of tweaking rdlambda scale for VBR (which is what gives the tonality boost). It seems way off target for VBR, since a lambda that in VBR results in 64kbps, in CBR it will give you about 32 or less.
With that properly tweaked, we can save lots of bits from noisy bands and put them to better use on tonal bands. For VBR, that means lower bitrates for the same quality level.
Increasing cutoff like you did there has the unwanted side effect of lowering quality a bit too much on tonal bands, for a set file size. I do my tests by searching through -q:a until I get a file roughly the same size as a reference CBR-encoded version, and comparing quality among those. With higher cutoffs, that procedure resulted in noticeable distortion on the HF bands, which is why I left it at 1.6, and it's what I believe will be fixed by tweaking rdlambda for VBR.
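The size-matching procedure described above amounts to a search over -q:a. A minimal sketch, assuming the encoded size grows monotonically with q on the searched interval (which may not hold near the anomalies reported earlier in this ticket). `toy_size` is a placeholder model standing in for an actual encoder run; every name here is hypothetical.

```c
#include <assert.h>

/* Callback type: encoded size in bytes for a given -q:a value. */
typedef long (*size_fn)(double q);

/* Placeholder size model, used only to exercise the search. */
static long toy_size(double q)
{
    return (long)(100000.0 * q);
}

/*
 * Bisection search for the q whose output size matches target_size,
 * assuming size is non-decreasing in q on [lo, hi].
 */
static double match_size(size_fn encoded_size, long target_size,
                         double lo, double hi, int max_steps)
{
    assert(lo < hi);
    for (int i = 0; i < max_steps; i++) {
        double mid = 0.5 * (lo + hi);
        long size = encoded_size(mid);
        if (size == target_size)
            return mid;
        if (size < target_size)
            lo = mid;       /* too small: raise quality */
        else
            hi = mid;       /* too large: lower quality */
    }
    return 0.5 * (lo + hi);
}
```

In a real comparison the callback would encode the sample at the given q and return the output file size, with the reference CBR file size as the target.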
It can also be fixed by implementing codebook 13. But that's for another (future, way future) patch, since I see no easy way to implement CB 13 with twoloop, so I'll have to rewrite it.
comment:120 by , 11 years ago
This paper, fig. 6 shows bit allocation curves, although this is Opus.
http://jmvalin.ca/papers/aes135_opus_celt.pdf
comment:122 by , 11 years ago
Is aaccoder.c line 829:
if (start >= cutoff || band->energy <= (band->threshold * zeroscale) || band->threshold == 0.0) {
correct? Not start >= cutoff+cutoff/5?
comment:123 by , 11 years ago
Yep, the cutoff is used as-is in this patch, the offset is already accounted for in its computation above that.
follow-up: 130 comment:124 by , 11 years ago
I've encoded weeks of AACs using the v4 patch, with diverse samples and diverse bitrates, and there were no problems (empty files, error returns, freezes).
Is 'tweaking rdlambda for VBR' ready? If not, I think I should test v4 ABR first, because it's stable and has fewer artifacts on tonal samples. The blind test will be conducted using the ABC/HR methodology, and there should be some opponents. I'm thinking of...
- current git head with no patch, abr
- v4 patch(or anything latest), abr
- fdk-aac, abr
The bitrate will be 96kbps and 128kbps.
comment:125 by , 11 years ago
Or, I can drop fdk-aac and instead test on 3 bitrates. Do you have any idea?
comment:127 by , 11 years ago
Replying to cehoyos:
Comparing with libfaac would be useful...
Is comment:69 not enough? (The test was in 2012 July.)
comment:128 by , 11 years ago
I thought that additional improvements were made since (and if ffaac does not beat libfaac and assuming fdk-aac beats libfaac, it might make more sense to compare with libfaac) but please don't let me misguide you.
comment:129 by , 11 years ago
I don't think many people will use libfaac. Both libfaac and libfdk_aac are non-free, and if many people prefer fdk-aac over faac, new results for the new fdk-aac are more interesting than another set of results for the old faac. (As far as I know, there is no blind test of fdk-aac.)
comment:130 by , 11 years ago
comment:131 by , 11 years ago
This is not my last test, and if there is a desire to compare this encoder with other encoders, I can do so later. By that time, I hope the new VBR will be state of the art.
comment:132 by , 11 years ago
I'm going to use these 20 samples below. There are six opponents(the first 3 are 96kbps, and the last 3 are 128kbps), so I have to score 6*20=120 sounds. The test is ready.
http://www.hydrogenaudio.org/forums/index.php?showtopic=98003
comment:133 by , 11 years ago
Hi All,
Great to see that the native AAC encoder is getting some attention, and trying to make it mainstream. Using Windows 7 and Zeranoe's FFmpeg builds, I only get a choice of "The Native Encoder" or "libvo_aacenc".
From what I have read, "libvo_aacenc" only seems to support stereo, not 5.1 or higher.
I am no audiophile and a little hard of hearing so I cannot find fault with the Native Encoder but I can tell the difference between 2 and 6 channels :-)
Keep up the good work on a great piece of software.
Regards,
Mark
comment:134 by , 11 years ago
ffmpeg55212 -y -i input.wav -c:a aac -strict experimental -b:a 96k output.mp4
ffmpeg55212_patchv4 -y -i input.wav -c:a aac -strict experimental -b:a 96k output.mp4
ffmpeg55212 -y -i input.wav -c:a libfdk_aac -b:a 96k -afterburner 1 output.mp4
ffmpeg55212 -y -i input.wav -c:a aac -strict experimental -b:a 128k output.mp4
ffmpeg55212_patchv4 -y -i input.wav -c:a aac -strict experimental -b:a 128k output.mp4
ffmpeg55212 -y -i input.wav -c:a libfdk_aac -b:a 128k -afterburner 1 output.mp4
faad -b 4 -o output.float.wav output.mp4
The ABC/HR test is ongoing. These six outputs were shuffled and I listen to them without knowing which is which. I've done 2 samples out of 20. 10% done.
by , 11 years ago
Attachment: | fdkaac_10_12.zip added |
---|
by , 11 years ago
Attachment: | fdkaac_13_16.zip added |
---|
samples # 10 - # 12 encoded by fdkaac. *2.mp4 are the 128kbps samples, the others are the 96kbps samples.
comment:138 by , 11 years ago
I think I've found the source of most of the "annoying" artifacts. With the recent fix to psy's hole avoidance, lots of the rate control hacks in the lookahead code are no longer necessary, since the bit reservoir now actually works. Though if I do completely disable them, the target bit rate is largely missed, so some RC stuff is still needed.
In short, RC hacks screw up on transients. I guess I'll have to explicitly limit RC hacks to non-transients (with perhaps some hysteresis). I'm working on a v5 fixing that.
Still, to get to fdk quality, I think we'll need to fix M/S encoding (which still has some artifacts; if it didn't, it could be a big efficiency boost) and implement codebook 13 (which fdk seems to use, though I haven't confirmed this). That's a much bigger project though.
comment:139 by , 11 years ago
Great, I'm guessing that's the reason why some samples got much poorer results than fdk. Should I abort the v4 abr test and instead test v5 after it is released? I'm on holiday now, but after August 26th I'll move to a quieter place, so I can test more effectively.
comment:140 by , 11 years ago
I think I'll get you the v5 soonish, but I have an office to move this weekend so it may not be as soon as you'd like. In any case, soonish.
comment:142 by , 11 years ago
Replying to Kamedo2:
How is the development of v5?
Sorry, urgent personal issues prevented me from reaching my self-imposed deadline. I'll try to dedicate some time to it as soon as I'm able, though. Next post ought to be a patch.
follow-up: 144 comment:143 by , 11 years ago
I resumed the ABC/HR test, and I've done 13 samples out of 20. How is the development going?
comment:144 by , 11 years ago
Replying to Kamedo2:
I resumed the ABC/HR test, and I've done 13 samples out of 20. How is the development going?
Stalled for now, but I'll be able to resume soon
comment:145 by , 11 years ago
Cc: | added |
---|
comment:147 by , 11 years ago
Yes, please do. I'll make sure to address those concerns as well, and we'll save one round trip
comment:148 by , 11 years ago
You can download the original sound here. http://www.hydrogenaudio.org/forums/index.php?showtopic=98003
comment:149 by , 11 years ago
Oops, -b:a 128k, not -b:a 96k in the 128kbps exp+v4 column.
By the way, why is the FFT used in the LPF? Couldn't it use the MDCT and simply zero the higher coefficients? Maybe I am missing something.
follow-up: 151 comment:150 by , 11 years ago
I'll finish the test soon(16/20, 80%). What should be the next opponents in the next blind listening test including the newer patch? I'm thinking of...
- current git head with no patch, abr
- next patch, abr
- next patch, vbr
- fdk-aac, abr
and possibly...
- libopus, vbr
- libmp3lame, vbr
Do you have any idea?
follow-up: 152 comment:151 by , 11 years ago
Replying to Kamedo2:
and possibly...
- libopus, vbr
- libmp3lame, vbr
Do you have any idea?
If you have time, it would be interesting to compare to the quality of other FFmpeg audio encoders, ie ac3, eac3 and mp2.
follow-up: 153 comment:152 by , 11 years ago
Replying to cehoyos:
If you have time, it would be interesting to compare to the quality of other FFmpeg audio encoders, ie ac3, eac3 and mp2.
It may be wrong, but I guess the ac3 is the most used variant. The bitrate will be around 128kbps, so the extremely high bitrate of eac3 will not fit the frame, I think. Are there some important use of eac3 and mp2, other than the BD and VCD encoding? (For BD the space is huge and quality at lower bitrate is insignificant.)
comment:153 by , 11 years ago
Replying to Kamedo2:
Replying to cehoyos:
If you have time, it would be interesting to compare to the quality of other FFmpeg audio encoders, ie ac3, eac3 and mp2.
It may be wrong, but I guess the ac3 is the most used variant. The bitrate will be around 128kbps, so the extremely high bitrate of eac3 will not fit the frame,
I am not sure I understand you.
Afaik, nobody ever made a listening test using different internal FFmpeg encoders (not even a very cursory one). It would be interesting to know that "96kb eac3 ~ 128 kb ac3 ~ 128kb aac ~ 256kb mp2" (I assume this isn't the case, just as an example). Even if done with much less effort than your above tests (if you just mention your impression of each encoder after a few tests), I believe this would be interesting information.
It was sometimes claimed that the wma encoders produce abysmal quality, so your comment on them (possibly with higher bitrates) would also be welcome.
I think. Are there some important use of eac3 and mp2, other than the BD and VCD encoding? (For BD the space is huge and quality at lower bitrate is insignificant.)
I believe that ac3 is a very important codec (WMP plays it out-of-the-box in different containers), knowing if eac3 beats it would be interesting.
comment:154 by , 11 years ago
I can't think of any good use of eac3, other than for BD. BD can have 32Mbps, and eac3 can have up to 6144kbps. If audio quality matters, simply use the maximum bitrate. And having more opponents in parallel slows down the test. However, we need a low anchor and possibly a high anchor. I think libopus will act as a high anchor and aac without the patch will act as a low anchor.
There are some good uses of wma, such as encoding for an old car stereo that plays MP3/WMA, but WMAEncode 0.2.9b is far more usable. The quality is in between LAME and Apple AAC.
follow-up: 204 comment:155 by , 11 years ago
This document recommends using the -cutoff 15000 option. It is outdated: the cutoff has been applied automatically since July 2012.
http://ffmpeg.org/ffmpeg-codecs.html#aac
This is the data I sent in 2012.
By the way, the progress of the listening test is 95%(19/20) now.
comment:156 by , 11 years ago
I finished the test and I uploaded the results.
http://www.hydrogenaudio.org/forums/index.php?showtopic=102699
by , 11 years ago
Attachment: aac-improvements-wip-v5.patch added
V5 patch, twoloop RD fixed (I think)
comment:158 by , 11 years ago
So, I attached a patch that moves in the right direction (I think).
Most of the worse-performing samples, I noticed, had to do with hole avoidance being quickly violated when using low bit rates. So I re-did twoloop's RD improvement step to better respect hole avoidance, to be asymmetric in its scale manipulation (ie: to avoid adding all 1 or all 2, which would be quickly undone by the bitrate adjustment step), and everything seemed to work a lot better.
However, on the "asymmetric" little word, there's a huge hack involved. I wouldn't want to waste your time without a warning: this hack can most assuredly be improved. But I don't think I'll waste time improving a hack, since the real solution is to implement a dynamic programming coder, which I intend to do in the future. So while hackish and probably suboptimal, I'll probably leave it as-is since it works well enough.
I haven't tested VBR much. From what I tested, it seems mostly unharmed, but it still needs a better calibrated cutoff. That will take time (let's say it'll be v6).
So, this patch should be good enough for ABR. VBR will need a v6, and some day (time permitting) I'll post the patch with the dynamic coder.
I couldn't quite match FDK performance, but I suspect there are two reasons for this. First, M/S coding isn't as good as it should be. Second, FDK probably uses a dynamic coder. So I think we'll catch FDK with the dynamic coder (which can also do the M/S part, so it'll fix both with one shot).
However, I tested most of the samples in your session, and they've all improved. Some more than others, of course. So, if not all the samples, you might want to retest the worst offenders.
Edit: I also haven't tested higher bit rates. I will tomorrow.
follow-up: 162 comment:159 by , 11 years ago
The v5 patch is encoding at 15-50x realtime, depending on bitrate and type of music encoded.
comment:160 by , 11 years ago
I changed aaccoder.c line 806 from
? (refbits * 1.6f * avctx->sample_rate / 1024)
to
? (refbits * 2.4f * avctx->sample_rate / 1024)
This is certainly better, although the exact optimal value is debatable.
I encoded 2 days of diverse sounds with many settings, and listened to 2 hours of the results. This encoder does a relatively good job even at abr 96kbps. It's not a blind test, but I can hear the improvement. I also compared abr 128kbps vs vbr -q 0.3, and abr is still better. The vbr exposes its weak point in relatively quiet, tonal sections: low S/N and a stronger LPF effect.
comment:161 by , 11 years ago
I listened to about 8 hours of songs, movies, sine and white noise, and 5.1ch surround source. I'd say that abr is mature.
klaussfreire, could you add a "redirect" feature so that, when the requested bitrate is too high, the encoder falls back to the maximum possible bitrate rather than printing an error message and stopping? This would simplify many batch encodes, including encoding hundreds of videos with various audio sample rates and channel counts. Currently it gives:
[aac @ 013efa60] Too many bits per frame requested
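The requested fallback amounts to clamping the bitrate to what one frame can hold. Here is a minimal sketch of that arithmetic, assuming the limit behind this message is the AAC-LC budget of 6144 bits per channel per 1024-sample frame. The function names are hypothetical and this is not the encoder's actual code.

```python
# Sketch only: clamp a requested AAC bitrate to the per-frame bit budget
# instead of failing. Assumes AAC-LC framing: 1024 samples per frame and
# at most 6144 bits per channel per frame.

SAMPLES_PER_FRAME = 1024
MAX_BITS_PER_CHANNEL_FRAME = 6144


def max_aac_bitrate(sample_rate: int, channels: int) -> int:
    """Highest bitrate (bits/s) that fits the assumed per-frame budget."""
    return MAX_BITS_PER_CHANNEL_FRAME * channels * sample_rate // SAMPLES_PER_FRAME


def clamp_bitrate(requested: int, sample_rate: int, channels: int) -> int:
    """Hypothetical 'redirect' behaviour: lower the request to the maximum."""
    return min(requested, max_aac_bitrate(sample_rate, channels))


if __name__ == "__main__":
    # 8 kHz stereo with a 128 kb/s request
    print(max_aac_bitrate(8000, 2))         # 96000
    print(clamp_bitrate(128_000, 8000, 2))  # 96000
```

Under these assumptions, 8 kHz stereo tops out at 96 kb/s, so a 128 kb/s request at that sample rate would exceed the budget, while the same request at 44.1 kHz would pass unchanged.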
Also, I notice that this commandline
ffmpeg -i ffmpeg_aacvbr_pulse1.wav -c:a aac -strict experimental -q:a 0.1 -ar 8000 -ac 1 ffmpeg_aacvbr_pulse1.mp4
gets the same "Too many bits" error, and lowering the quality with -q:a doesn't help. It only works when using -b:a, or when setting a higher sample rate such as -ar 22050. This could be a problem when encoding a video taken by an old digital camera with 8kHz PCM audio attached.
The error message:
ffmpeg56470.exe -y -i ffmpeg_aacvbr_pulse1.wav -c:a aac -strict experimental -q:a 0.3 -ar 8000 ffmpeg_aacvbr_pulse1.mp4
ffmpeg version N-56469-gf6622f9 Copyright (c) 2000-2013 the FFmpeg developers
  built on Sep 20 2013 15:29:55 with gcc 4.8.1 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-nonfree --enable-libfdk-aac --extra-ldflags=-static --extra-cflags='-march=native -mfpmath=sse' --optflags=-O2
  libavutil      52. 45.100 / 52. 45.100
  libavcodec     55. 33.100 / 55. 33.100
  libavformat    55. 18.100 / 55. 18.100
  libavdevice    55.  3.100 / 55.  3.100
  libavfilter     3. 86.102 /  3. 86.102
  libswscale      2.  5.100 /  2.  5.100
  libswresample   0. 17.103 /  0. 17.103
  libpostproc    52.  3.100 / 52.  3.100
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'ffmpeg_aacvbr_pulse1.wav':
  Metadata:
    encoder         : Coderium SoundEngine 4.59
  Duration: 00:00:12.12, bitrate: 1411 kb/s
    Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
[aac @ 030cbf00] Too many bits per frame requested
Output #0, mp4, to 'ffmpeg_aacvbr_pulse1.mp4':
  Metadata:
    encoder         : Coderium SoundEngine 4.59
    Stream #0:0: Audio: aac, 8000 Hz, stereo, fltp, 128 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le -> aac)
Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
I think it is about time we removed the -strict experimental flag.
comment:162 by , 11 years ago
Replying to Kamedo2:
The v5 patch is encoding at 15-50x realtime, depending on bitrate and type of music encoded.
I believe I may have to disappoint you there. One of the optimizations responsible for that speed is acting up on ABR. I noticed improved quality by restricting it, so the v6 with optimized VBR will have it disabled as well (and thus be a tad slower).
I thought that optimization was result-neutral, but it seems it isn't.
comment:163 by , 11 years ago
15x speed is 'tolerable' :)
I've encoded more than 50GB of mp4s, including surround 5.1ch with more than 1Mbps etc... and listened to 12 hours of mainly Pop music. v5 seems to be stable. Is fixing "Too many bits per frame requested" error easy?
comment:164 by , 11 years ago
I can make it only applicable when using ABR, but I think it's a useful message.
I could also turn it into a warning, I think.
comment:165 by , 11 years ago
I prefer a warning to an error message that stops the encode. It is kinder and easier to use.
By the way, I'll be free from September 28th, and I'm considering a listening test of
- v4 abr
- v6 abr
- v6 vbr
- fdk-aac vbr
- ac3 abr
- libmp3lame vbr
I've received a request to test libfaac, mp2, and eac3, but I'm running out of "slots".
From my normal non-blind listening of average music, my current impression is:
fdk-aac > libmp3lame > v5 abr >> v4 abr > v5 vbr > ac3
comment:166 by , 11 years ago
v5 vbr is still considerably worse than abr. I feel that whenever tonal sounds are present, the frequency bins around the tone degrade. Tones are poorer than noise at masking other sounds, which is why the harpsichord remains one of the most critical and hardest instruments to code. http://wiki.hydrogenaudio.org/index.php?title=Perceptual_Noise_Substitution
follow-up: 168 comment:167 by , 11 years ago
Well, v6 is almost ready. I just need to clean it up a bit. I'll probably do that tonight.
In v6, my non-blind tests make me believe that v6 vbr > v6 abr > v5 abr.
Not sure how you compare abr vs vbr, what I do is pick a file or set of files, do a binary search of the quality level that results in the same overall file size, and then compare. In that kind of test, v6 vbr sometimes requires lots more bits for some pathological files (techno seems to drive it crazy, can't blame it). I exclude those, since they're pathological.
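The size-matching search described above can be sketched as a plain bisection. This is only an illustration: `encoded_size` is a hypothetical callback (in practice it would encode the test files at quality q and sum the output sizes), the q range is an assumed placeholder, and the sketch assumes size does not decrease as q rises.

```python
from typing import Callable


def find_matching_quality(
    encoded_size: Callable[[float], int],
    target_size: int,
    q_low: float = 0.01,   # assumed search bounds, not encoder limits
    q_high: float = 5.0,
    iterations: int = 20,
) -> float:
    """Bisect q so that encoded_size(q) approaches target_size (bytes).

    encoded_size is a hypothetical callback and must be non-decreasing in q.
    """
    for _ in range(iterations):
        q_mid = (q_low + q_high) / 2
        if encoded_size(q_mid) < target_size:
            q_low = q_mid    # output too small: raise the quality
        else:
            q_high = q_mid   # output large enough: lower the quality
    return (q_low + q_high) / 2


if __name__ == "__main__":
    # Stand-in size model for demonstration only, not real encoder behaviour.
    fake_size = lambda q: int(1_000_000 * q)
    print(round(find_matching_quality(fake_size, 2_000_000), 3))
```

Each probe costs a full encode of the test set, so the iteration count is the main knob. Twenty halvings narrow the assumed range to a few millionths of a q step, which is far more than size matching needs.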
When I push the patches to the ML, I'll make most of what makes v6 vbr go crazy on techno (the relatively high peak bit rate allowance) configurable anyway.
follow-up: 169 comment:168 by , 11 years ago
Replying to klaussfreire:
Not sure how you compare abr vs vbr, what I do is pick a file or set of files, do a binary search of the quality level that results in the same overall file size, and then compare. In that kind of test, v6 vbr sometimes requires lots more bits for some pathological files (techno seems to drive it crazy, can't blame it). I exclude those, since they're pathological.
I compare abr vs vbr with a graph. I plot a "q vs bitrate" graph over a large "standard" set of sounds that I extracted from diverse CDs. Then I find the q value that gives the desired bitrate, and make sure that the average bitrate of the tested samples isn't very far from the "standard" bitrate. This method is common at Hydrogenaudio.
http://listening-tests.hydrogenaudio.org/sebastian/mp3-128-1/index.htm
When I push the patches to the ML, I'll make most of what makes v6 vbr go crazy on techno (the relatively high peak bit rate allowance) configurable anyway.
I think it's a good idea to automatically "cap" the bitrate based on the q number. 3x of the "standard" bitrate of the q or something.
Also, I think it would be beneficial for end users to set the -q:a value and typically get a file with a bitrate around that value. If one sets -q:a 256k, one gets a file of roughly 256kbps. (Or 210kbps, 289kbps, etc. depending on the sound content, but that's fine.) iTunes has that interface, and it's easier to use. This can be controversial, as people may refer to old documentation of the -q:a option and try to do the same, but the problem can be avoided by switching to a "classic mode" when the value is very small, like -q:a 0.3.
follow-ups: 170 175 comment:169 by , 11 years ago
Replying to Kamedo2:
Replying to klaussfreire:
Not sure how you compare abr vs vbr, what I do is pick a file or set of files, do a binary search of the quality level that results in the same overall file size, and then compare. In that kind of test, v6 vbr sometimes requires lots more bits for some pathological files (techno seems to drive it crazy, can't blame it). I exclude those, since they're pathological.
I compare abr vs vbr by a graph. I plot a "q vs bitrate" graph over a "standard" set of large set of sounds I extracted from diverse CDs.
Yeah, I've seen those
Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate.
Just how do you check bit rate? Because I've noticed ffmpeg -i file tends to give bogus rates when used on VBR-encoded files (not even average).
Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.
That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).
However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:
ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac
Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).
follow-ups: 171 173 comment:170 by , 11 years ago
Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate.
Just how do you check bit rate? Because I've noticed ffmpeg -i file tends to give bogus rates when used on VBR-encoded files (not even average).
filesize[Byte]*8/Sample_length[Sec]. But be careful with very short files, where the result can be bogus too.
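As a minimal sketch, the formula above translates directly into code. The function name is hypothetical and the numbers in the usage line are made up for illustration. The duration should come from the source audio rather than from probing the encoded file.

```python
def average_bitrate_kbps(size_bytes: int, duration_seconds: float) -> float:
    """Average bitrate in kbit/s: file size in bytes * 8 / length in seconds."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_seconds / 1000


if __name__ == "__main__":
    # Illustrative values only: a 4,000,000-byte file lasting 125 seconds.
    print(average_bitrate_kbps(4_000_000, 125.0))  # 256.0
```

Because the file size includes container overhead, the result slightly overstates the audio bitrate, and the overstatement is proportionally larger for short clips.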
Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.
That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).
Is LAME breaking the convention?
https://trac.ffmpeg.org/wiki/Encoding%20VBR%20%28Variable%20Bit%20Rate%29%20mp3%20audio
However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:
ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac
Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).
Thank you for the info. That behavior seems much like cvbr, the most used mode of Apple iTunes.
comment:171 by , 11 years ago
Replying to Kamedo2:
Then, search a number of q that have the desired bitrate. Then, make sure that average tested sample bitrate isn't very far from the "standard" bitrate.
Just how do you check bit rate? Because I've noticed ffmpeg -i file tends to give bogus rates when used on VBR-encoded files (not even average).
filesize[Byte]*8/Sample_length[Sec]. But be careful with very short files, where the result can be bogus too.
As long as you're not also estimating sample_length with ffmpeg, which will also give you bogus values, it should be fine ;)
Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.
That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).
Is LAME breaking the convention?
https://trac.ffmpeg.org/wiki/Encoding%20VBR%20%28Variable%20Bit%20Rate%29%20mp3%20audio
I think so. At least, it seems to be backwards (higher q should mean higher quality, but lame does it backwards).
comment:172 by , 11 years ago
libvorbis and libfaac break the convention, too. neroAacEnc.exe takes a float quality value where 0 is the lowest and 1 is the highest, so if left unchanged, the native encoder behaves much like Nero.
follow-up: 174 comment:173 by , 11 years ago
Replying to Kamedo2:
Also, I think it's beneficial for the end users to set the -q:a value and typically gets a file with the bitrate around the set value. If one sets -q:a 256k, one gets a file of roughly 256kbps.
That's not doable without refactoring ffmpeg. -q:a sets the global_quality parameter, which is specified to have a somewhat standardized interpretation (1.0 = 100%, what 100% means is what some other codec means by it, can't remember which OTOMH).
Is LAME breaking the convention?
https://trac.ffmpeg.org/wiki/Encoding%20VBR%20%28Variable%20Bit%20Rate%29%20mp3%20audio
However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:
ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac
Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).
Thank you for the info. Your behavior seems much like the cvbr(most used mode), Apple iTunes.
If someone is to implement cvbr, I suggest doing it like the libopus encoder wrapper, where users can choose a "vbr" option like this: http://ffmpeg.org/ffmpeg-codecs.html#Option-Mapping.
comment:174 by , 11 years ago
If someone is to implement cvbr, I suggest to do it like the libopus encoder wrapper, where users are allowed to choose a "vbr" option like this http://ffmpeg.org/ffmpeg-codecs.html#Option-Mapping.
Timothy_Gu, thank you for the informative link. I'd like to use options like -b:a 256k -vbr.
comment:175 by , 11 years ago
However, you can get (I think) a similar result by specifying both -q:a and -b:a, like so:
ffmpeg -i somefile.flac -c:a aac -b:a 256k -q:a 1 -strict experimental somefile.aac
Although that seldom gives you 256k. The bitrate there is like a lower bound (aim for 256k, spend more if needed).
I tried it over 128 different songs and the result was:
-b:a 256k -q:a 1
- Average 247kbps
- SD +/-33kbps
- Min 161kbps
- Max 300kbps
-q:a 1
- Average 235kbps
- SD +/-30kbps
- Min 154kbps
- Max 287kbps
(comment:160 change is not applied in this test.)
comment:176 by , 11 years ago
I'm preparing for the next listening test.
# Native aac patch v4 abr
ffmpeg55212 -y -i in.wav -c:a aac -strict experimental -b:a 128k out.mp4
ffmpeg56470 -y -i out.mp4 -c:a pcm_s32le out.32bit.wav
# Native aac patch v5 abr
ffmpeg56470 -y -i in.wav -c:a aac -strict experimental -b:a 128k out.mp4
ffmpeg56470 -y -i out.mp4 -c:a pcm_s32le out.32bit.wav
# Native aac patch v5 vbr
ffmpeg56470 -y -i in.wav -c:a aac -strict experi
The sound file that cripples a native AAC encoder. True My Heart [DVTS-2121][07.09.03] Track05 2m50s~58s