Opened 2 years ago

Last modified 17 months ago

#8066 open defect

Bad quality encoding of highly compressed audio by AAC encoder

Reported by: Lirk
Owned by: Lynne
Priority: normal
Component: avcodec
Version: git-master
Keywords: aac
Cc: marcan@marcan.st
Blocked By:
Blocking:
Reproduced by developer: yes
Analyzed by developer: yes

Description

ffmpeg version N-94167-ga514244319

The AAC encoder produces poor quality when encoding highly compressed audio with either of these options enabled: -aac_coder fast or -aac_is enable.
With -aac_coder twoloop, or with aac_is disabled, the audio quality is better.
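A minimal Python sketch for reproducing the comparison, assuming ffmpeg is on PATH and using hypothetical output file names. It only builds the argument lists for the settings discussed in this ticket (-aac_coder fast, -aac_coder fast with -aac_is disable, and -aac_coder twoloop); pass each list to subprocess.run() to actually encode. Setting -aac_coder explicitly avoids depending on whichever coder the installed build uses by default.

```python
import shlex


def aac_variants(src, bitrate="128k"):
    """Return (label, argv) pairs for the AAC encoder settings compared here.

    Output names like "out_fast_128k.m4a" are hypothetical placeholders.
    """
    base = ["ffmpeg", "-y", "-i", src, "-c:a", "aac", "-b:a", bitrate]
    variants = {
        "fast": ["-aac_coder", "fast"],
        "fast_no_is": ["-aac_coder", "fast", "-aac_is", "disable"],
        "twoloop": ["-aac_coder", "twoloop"],
    }
    return [
        (label, base + extra + [f"out_{label}_{bitrate}.m4a"])
        for label, extra in variants.items()
    ]


if __name__ == "__main__":
    # Print shell-ready commands; use subprocess.run(argv, check=True) to execute them.
    for label, argv in aac_variants("original.ac3"):
        print(label, "->", shlex.join(argv))
```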

Attachments (11)

aac_twoloop_128K.m4a (150.9 KB) - added by Lirk 18 months ago.
-b:a 128K -c:a aac -aac_coder twoloop
aac_fast_128K-aac-is-false.m4a (146.1 KB) - added by Lirk 18 months ago.
aac_fast_128K.m4a (146.1 KB) - added by Lirk 18 months ago.
-b:a 128K -c:a aac
aac_fast_320K.m4a (343.2 KB) - added by Lirk 18 months ago.
-b:a 320K -c:a aac
original.ac3 (273.9 KB) - added by Lirk 18 months ago.
-c copy
libmp3lame_128K.mp3 (157.2 KB) - added by Lirk 18 months ago.
-b:a 128K -c:a libmp3lame
aac_fast_128k_5cascades.m4a (167.0 KB) - added by Thomas Mundt 18 months ago.
aac_twoloop_128k_5cascades.m4a (163.5 KB) - added by Thomas Mundt 18 months ago.
mp3_5cascades.mp3 (158.9 KB) - added by Thomas Mundt 18 months ago.
aac_128k_5cascades_ffmpeg28.m4a (163.5 KB) - added by Thomas Mundt 18 months ago.
photon_melodies_sample_2.wav (1.3 MB) - added by Hector Martin 18 months ago.

Change History (29)

comment:1 by Nicolas George, 2 years ago

Are you submitting a bug report to complain that the AAC encoder behaves as the documentation says?

https://ffmpeg.org/ffmpeg-codecs.html#Options-6

comment:2 by Hendrik, 2 years ago

fast being worse than twoloop might be expected, but "is" causing a notable degradation is probably a bug.

comment:3 by Nicolas George, 2 years ago

“Can be disabled for debugging by setting the value to "disable".”

Why do you expect any kind of usable result when debugging?

comment:4 by Hendrik, 2 years ago

If the quality improves when you disable it, then that's clearly a bug in the option, since it's documented to only be used when it's beneficial, which makes it sound like you should never disable it in normal operation.

comment:5 by Carl Eugen Hoyos, 2 years ago

Component: avcodec → undetermined
Keywords: aac twoloop aac_is fast removed
Resolution: needs_more_info
Status: new → closed

by Lirk, 18 months ago

Attachment: aac_twoloop_128K.m4a added

-b:a 128K -c:a aac -aac_coder twoloop

by Lirk, 18 months ago

Attachment: aac_fast_128K.m4a added

-b:a 128K -c:a aac

by Lirk, 18 months ago

Attachment: aac_fast_320K.m4a added

-b:a 320K -c:a aac

comment:6 by Lirk, 18 months ago

Added samples.
Even at 320 kbps, the "fast" coder encodes worse than the "twoloop" coder.

Last edited 18 months ago by Lirk

comment:7 by Lirk, 18 months ago

Resolution: needs_more_info
Status: closed → reopened

Added samples.
Even at 320 kbps, the "fast" coder encodes worse than the "twoloop" coder.

comment:8 by Lirk, 18 months ago

Component: undetermined → avcodec
Version: unspecified → git-master

Also, in this case the native AAC encoder encodes worse than the libmp3lame encoder at the same bitrate (128 kbps).

by Lirk, 18 months ago

Attachment: original.ac3 added

-c copy

by Lirk, 18 months ago

Attachment: libmp3lame_128K.mp3 added

-b:a 128K -c:a libmp3lame

comment:9 by Lirk, 18 months ago

Status: reopened → open

comment:10 by Lynne, 18 months ago

Analyzed by developer: set
Keywords: aac added
Owner: set to Lynne
Reproduced by developer: set

Aware of that. I have some unfinished code that fixes a lot of issues and was going to get around to finishing it sometime soon, since it would benefit the Opus encoder too.
If you're part of the brigade/mob, next time please don't recruit an army on twitter to raise an issue; file a bug or ping a developer instead.

by Thomas Mundt, 18 months ago

Attachment: aac_fast_128k_5cascades.m4a added

by Thomas Mundt, 18 months ago

Attachment: aac_twoloop_128k_5cascades.m4a added

by Thomas Mundt, 18 months ago

Attachment: mp3_5cascades.mp3 added

by Thomas Mundt, 18 months ago

Attachment: aac_128k_5cascades_ffmpeg28.m4a added

comment:11 by Thomas Mundt, 18 months ago

I also just stumbled upon the very poor quality of the native AAC encoder. There seems to be a fundamental problem in how it processes high frequencies. To make the errors more audible, I cascaded the transcoding 5 times (aac_fast_128k_5cascades.m4a). This also shows that the twoloop coder sounds different, but also bad (aac_twoloop_128k_5cascades.m4a). For comparison, I uploaded a version cascaded 5 times with the LAME MP3 encoder (mp3_5cascades.mp3).
Furthermore, it seems to be a regression: with ffmpeg 2.8, the native AAC encoder sounds much better (aac_128k_5cascades_ffmpeg28.m4a). Not outstanding, but it comes much closer to the MP3 result.
I used Lirk's sample. The poor quality can be reproduced with any densely mixed music.

comment:12 by Lynne, 18 months ago

I need raw samples that show issues when encoded, not already-encoded samples. I know where the issue is: it's the transient decision analyzer that's lifted from the 3GPP spec.
I also need samples which generate issues with a single encode and do not require cascading.
Cascading isn't a good metric for overall encoder quality, since it's easily cheated (libopusfile cheats, libopus can't). It's a test of the stability of the transient detector. The one I'm writing has far better stability.
We could cheat, but it's impractical for most purposes, since you can't cheat when mixing multiple sources at the same time, and it requires piping side data between the decoder and the encoder, which any filtering in between will strip.
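To make "stability of the transient detector" concrete, here is a deliberately simplified energy-ratio detector in Python. It is NOT the 3GPP-derived analyzer in FFmpeg's encoder; the 8 sub-blocks (matching AAC's eight short transforms), the 10x threshold and the function name are illustrative assumptions. A detector like this makes a hard yes/no decision per sub-block, so small energy changes near the threshold, such as those added by a previous encode's artifacts, can flip the decision from one generation to the next.

```python
import math


def transient_flags(frame, n_sub=8, ratio=10.0, prev_energy=None):
    """Flag sub-blocks whose energy exceeds `ratio` times the previous sub-block's.

    Simplified illustration only; this is not the 3GPP analyzer used by FFmpeg.
    `prev_energy` is the energy of the last sub-block of the preceding frame, if known.
    """
    size = len(frame) // n_sub
    energies = [sum(x * x for x in frame[i * size:(i + 1) * size]) for i in range(n_sub)]
    prev = energies[0] if prev_energy is None else prev_energy
    flags = []
    for e in energies:
        # The tiny floor keeps the comparison meaningful after digital silence.
        flags.append(e > ratio * max(prev, 1e-12))
        prev = e
    return flags


# Quiet tone (period of 32 samples) with a single click at sample 600.
frame = [0.01 * math.sin(2 * math.pi * n / 32) for n in range(1024)]
frame[600] += 1.0
print(transient_flags(frame))  # only sub-block 4 (samples 512-639) is True
```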

comment:13 by Hector Martin, 18 months ago

While cascading is certainly not the ultimate encoder test, I disagree with the premise that it isn't a good idea because you can "cheat". Allowing sideband data is a silly idea for this test; you'd do it by going through a pipeline where the only thing between repeated encodings is plain PCM.
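A minimal sketch of that kind of harness in Python, assuming ffmpeg is on PATH. It only builds the argument lists; the generation file names (gen1.m4a, gen1.wav, ...) and the helper name are hypothetical. Each generation is decoded to 32-bit float WAV (pcm_f32le), so nothing but PCM reaches the next encode, and float avoids adding 16-bit requantization between generations. Whether successive generations stay sample-aligned still depends on how the container and decoder handle encoder delay.

```python
def cascade_commands(src, generations=5, codec="aac", bitrate="128k",
                     ext="m4a", extra=()):
    """Build ffmpeg argv lists for an N-generation cascade.

    Each generation encodes the previous PCM file, then decodes the result back
    to a float WAV, so only plain PCM is passed between encodes (no side data).
    """
    cmds = []
    pcm_in = src
    for g in range(1, generations + 1):
        coded = f"gen{g}.{ext}"  # hypothetical naming scheme
        pcm_out = f"gen{g}.wav"
        cmds.append(["ffmpeg", "-y", "-i", pcm_in,
                     "-c:a", codec, "-b:a", bitrate, *extra, coded])
        cmds.append(["ffmpeg", "-y", "-i", coded, "-c:a", "pcm_f32le", pcm_out])
        pcm_in = pcm_out
    return cmds
```

For example, two generations at 320 kbps with the twoloop coder would be cascade_commands("sample.wav", 2, bitrate="320k", extra=("-aac_coder", "twoloop")), with each list run in order via subprocess.run(argv, check=True).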

Repeated encodes can bring out artifacts and problems which are harder to discern in a single encode. Additionally, a single-digit number of transcodes occurs in valid use cases and is worth optimizing for. An encoder that produces 10% better quality than another encoder at a single encode, but 40% worse quality after 2 encodes, is an inferior encoder for a lot of practical use cases (and if that tradeoff really must be made, that should be a configurable option so both use cases are covered).

For what it's worth, libfdk_aac seems to be exceptionally stable. At 100 repeat encodes at 128kbps, it still produces listenable output, with somewhat patchy high frequency bands but otherwise no added distortion nor level changes. I would subjectively rate it on par with ffmpeg-aac after just 6 transcodes. At that point ffmpeg-aac has more high frequency content, but also has more audible *added* noise/artifacts. Those add up to complete destruction by the time you get to 100 transcodes. Sure, 100 transcodes is a silly use case, but the results still sound like they should lead to some insight as to how to improve the encoder.

Also, consider that already encoded audio is perfectly valid PCM after decoding, and a useful sample to test encoding on. While there are subtleties regarding whether an encoder encoding its own output would have an advantage vs encoding another encoder's output (or another codec's entirely), this is still not a test to be dismissed on the face of it. This whole thing started for me because a practical MP3 (256k) -> ffmpeg-aac (320kbps) -> ffmpeg-aac (320kbps) pipeline had noticeably worse artifacts than replacing just the last step with libfdk_aac at the same bitrate, or even at 128kbps. I posit that 320kbps should be effectively transparent for essentially all samples, and should remain so even after a few transcodes. The sample that started this all is not transparent for me after just 2 rounds of ffmpeg-aac at 320kbps (>99% confidence), but is after 100 rounds of libfdk_aac at the same bitrate.
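The comment gives a confidence level but not the trial counts. As a sketch of how such a figure is usually derived, the one-sided binomial probability of getting at least k of n ABX trials right by pure guessing (chance 1/2 per trial) can be computed as below; ">99% confidence" corresponds to this value being below 0.01. The trial counts in the example are hypothetical, not Hector's actual results.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of scoring at least `correct` out of `trials` ABX trials by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


# Hypothetical example sessions (not data from this ticket):
for correct, trials in [(9, 10), (10, 10), (14, 16)]:
    p = abx_p_value(correct, trials)
    print(f"{correct}/{trials}: p = {p:.4f}, >99% confidence: {p < 0.01}")
```

With these numbers, 9/10 comes out at about 0.011, just short of the 1% threshold, while 10/10 and 14/16 clear it.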

Just to add some relevant references:

(meta: I "recruited an army on twitter" because last time I raised an issue about this I was effectively ignored/dismissed as crazy and imagining artifacts, and this time, before making the twitter thread, I was linked to this ticket by another user, where I saw that the immediate response from a developer in comments 1 and 3 was dismissive, rude, and outright had not paid attention to what the user who opened the ticket had actually said; this was followed by the ticket being closed NEEDINFO by another person with no further explanation or request for info, even though the original user who opened it attempted to calmly clarify what the problem was. I started the Twitter thread after seeing this; then I had a request for info from @ffmpeg, which I did provide (after considering for a while whether I really should be spending more of my time on this), and this was followed by more rude responses aimed at me from yet another ffmpeg developer. If ffmpeg developers expect users to "file a bug or ping a developer", I would recommend reacting politely and listening when users do exactly that, and perhaps consider whether keeping certain developers around who alienate users is a net benefit for the project. I am happy to provide samples and perform listening tests if they are going to be taken seriously, but so far that hasn't happened and I'd like some reassurance that they will before I waste more of my time on this.)

Last edited 18 months ago by Hector Martin

comment:14 by Lynne, 18 months ago

I didn't say a cascaded encode wasn't useful or relevant to any scenario; I pointed out that it stresses a single component of the encoder, and that specific part hasn't been touched since at least 1996 (it's a line-by-line copy of the example encoder in the 3GPP AAC spec). As for why it got this far: no one really noticed until the encoder started getting used by OBS, which took 3-4 years after most of the work on the encoder was done.

I'm working on a replacement psychoacoustic system for both the Opus encoder and the AAC encoder, but I need raw samples at both 44.1 and 48 kHz that display artifacts. Obviously, the higher the bitrate and the less cascading needed, the better. I'd prefer 48 kHz (could someone tell the OBS people to make that the default? Most streams use 44.1 kHz, which, aside from the obvious reasons, doesn't produce nice Bark bands and makes writing a psychoacoustic model less general and more samplerate-specific).

Having said that, I would like to point out that, in the real world, where mixing and sub-frame-sample-offsets happen, sterile cascading tests could potentially give highly misleading results, especially with good encoders like libopus.
The performance of cascaded encoders depends heavily on whether each decoded frame is fed to the encoder sample-aligned. Even a small alignment difference at each successive encode can ruin the result. For example, if a transient AAC frame (1024 samples, split into 8 smaller transforms) is given to an encoder with a 64-sample offset, the block boundary of each smaller decoded transform, where most MDCT codec artifacts happen, will land in the middle of the encoder's own smaller transforms. If the encoder then decides to code that frame as a transient (very possible, given that the artifacts increase the energy), the result will be very annoying after no more than 5-6 encodes, regardless of the bitrate.
Coincidences like that happen and are somewhat out of your control, unless you like to inject discontinuities and latency into your stream and assume frame sizes.
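The arithmetic behind the 64-sample example can be sketched in Python. It assumes both the decoded stream and the re-encoder use AAC's eight short transforms per 1024-sample frame (a 128-sample hop) placed on the same grid relative to their frame start, and it ignores window shapes; `misalignment` is a hypothetical helper, not FFmpeg code. Because the short-block boundaries repeat every 128 samples, only the offset modulo 128 matters, and an offset of 64 is the worst case: every decoded boundary sits exactly halfway between two of the encoder's boundaries.

```python
FRAME = 1024
SHORT_HOP = FRAME // 8  # 128 new samples per short transform in an AAC transient frame


def misalignment(offset, hop=SHORT_HOP):
    """Distance in samples from a decoded short-block boundary to the nearest
    short-block boundary of the re-encoder, for a given stream offset."""
    phase = offset % hop
    return min(phase, hop - phase)


if __name__ == "__main__":
    for offset in (0, 16, 64, 100, 1024):
        print(offset, misalignment(offset))
    # Offset that maximizes the distance over one hop period:
    print(max(range(SHORT_HOP), key=misalignment))  # 64
```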
As for Opus, it uses 120-sample overlaps on 960-sample frames, rather than AAC's 512 samples on 1024-sample frames. With such a small overlap (less than a quarter of AAC's), artifacts at the frame boundary are even stronger, even in non-transient frames (Opus also splits transient frames into 8 smaller transforms). Worse, it does TF switching (recombining/splitting smaller transforms), which is highly sensitive to the signal (and its artifacts), so Opus really benefits from "cheating". Fortunately, some of this is kept under control by its lossless signalling of band energy levels (so low-frequency artifacts can overwhelm the signal acceptably).
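Taking the quoted figures at face value, the overlap as a share of the frame, and the Opus overlap relative to AAC's, work out as follows (arithmetic only; the 512-sample AAC figure is the one given in the comment):

```latex
\frac{120}{960} = 0.125, \qquad \frac{512}{1024} = 0.5, \qquad \frac{120}{512} \approx 0.23
```

So the Opus overlap is roughly a quarter of AAC's, whether measured in samples or as a fraction of the frame (0.125 / 0.5 = 0.25).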
In conclusion, while cascading does give you a good idea of how the encoder deals with codec artifacts, don't assume it won't spazz out on you in the field. Not saying it isn't useful, just not very useful for that exact case where your frames aren't aligned.

Meta:
ffmpeg is hardly an organized, focused, single-entity project where each contributor forms a part of a swarm mind and can work and judge anything, or is responsible for the actions of another. You shouldn't take some random contributor's word for much, let alone a bug tracker janitor. I don't even read the bug tracker unless something in the title strikes me as relevant to my field and I by chance read it.
Certainly, while people who do research on encoders for fun (and for free) are few and far between, everyone knows everyone in this field, so you only needed to ask literally anywhere (especially on freenode) to find the most appropriate person who would pay attention to it, rather than do an angry broadcast and hope someone listens.
As it turns out, some motivated people did exactly that, and somehow I got very unwelcome and demotivating private messages. While such things are perhaps not entirely responsible for ffmpeg developers' overall reputation for being unwelcoming, they really don't help. Especially when it becomes a personal attack like now, since there's really only a single person who would do this work. Shaming companies where people share duties and responsibilities (or even lack such, since they're paid) is one thing, but in open source there's usually just a single person behind a given feature.
I spoke with 2 "leaders" behind 2 other large projects and was told to just ignore such messages, as every project this size gets random "you suck" complaints on a daily basis (seriously).
Regarding the "best h264 encoder and worst aac encoder" comment, x264 literally got millions from huge companies, still does to this day, and had several full time developers. Can't say the latter got anything.
Regarding the "developers must be held responsible for not maintaining such a widely used project" comment: I can name addresses and address names of who to send glitter bombs to. Unfortunately, one of those resolves as 'NULL', at '0xffffffffffffffff', which is dependent on undefined behavior, metaphysical interpretation.

Last edited 18 months ago by Lynne

comment:15 by Thomas Mundt, 18 months ago

First of all thank you Lynne for taking care of this problem! I am not a user of OBS or any "social" networks. I just recently switched from MP3 to AAC because MP3 is no longer supported on the device of one of my users.
For me the artifacts are very audible even without cascading. Also with Lirk's sample. I just wanted to make it more clear.
I was a little bit surprised that the native AAC encoder is claimed in the ffmpeg documentation to be on par with the FDK AAC encoder at 128 kBit/s and better than MP3. This is definitely not true. I also find it surprising that the previous native AAC encoder from ffmpeg 2.8 sounds much better. Why was it replaced?
I would like to provide samples where the problem is more audible without cascading. But for me all music samples I tested sound similarly bad.

by Hector Martin, 18 months ago

Attachment: photon_melodies_sample_2.wav added

comment:16 by Hector Martin, 18 months ago

You're right that cascading with frame alignment is a best case scenario and not representative of typical streaming/etc use cases. Anecdotally, this also aligns with how when I noticed the artifacts from 2x 320k ffmpeg-aac done through two instances of OBS, they seemed fairly evident (presumably not frame aligned), while when I did an ABX test later (with the cascading aligned), I still was able to tell but it took more careful listening. (Unfortunately the 2xOBS test was literally with two people on the other side of the planet both running OBS and a DJ setup as the input, so I can't literally reproduce it as is, though I can try to approximate it).

But either way, having aligned inputs is a best case scenario, so it makes sense that the encoder ought to do a good job then at least.

I'm attaching the sample I used for the 2x 320 kbps ABX test, which was the song that was playing via the DJ/OBS pipeline when I first noticed the artifacts. I need to try it at a single encode and see if I can still tell it apart. It is sourced from a 44.1 kHz, 256 kbps MP3 (no better source is available), but that should be mostly irrelevant; just treat it as a blob of PCM data that AAC encoders may do a better or worse job on.

Meta:
ffmpeg may not be a very organized project, but the actions of its developers and other members reflect on the project as a whole, and tolerance of misbehavior by other developers indirectly reflects on them. The reason why some projects are adopting codes of conduct and such is to have a more unified view of what kind of community the project is attempting to foster. While deciding whether to actually do that and what the contents should be is a massive debate and can of worms I'm not going to open here, the underlying fact here is that projects aren't just loose collections of independent developers; they have a shared image and the project as a whole, and other members, are not insulated from the actions of others. Having things literally implemented as a free-for-all with no consequences for those who act poorly towards users does not excuse other developers who may not be directly the problem, but enable those who are by doing nothing about it. Whether it be via codified rules like a CoC, or via informal discussion and consensus among members, projects need to manage their image just like any other organization.

I'm sorry that you got personally attacked in private. For what it's worth, I've been a notable public face of a certain project/community in the past, so I know what getting that stuff is like, and it's a big reason why I burned out of ever taking that kind of role again in that field. My issues with the AAC encoder are not intended as demands for improvement, nor to mock the project, but rather my frustration stems from the claim of superiority over FDK (which seemed patently absurd to me, I've yet to see a sample where FDK was clearly worse), which seems misleading towards users; and at the animosity towards me for trying to bring the issue up. I have no problem with an encoder that isn't ideal as long as it is documented as such, and of course I would support any improvements to it, so long as honest feedback/test results from me are taken seriously. Unfortunately, last time I brought this up (again with samples) nothing changed in the docs; it seems this time around a patch was posted to remove the claim, so perhaps that will be fixed soon. And I do hope the encoder does improve to the point where it surpasses FDK some day.

The line about h.264 vs aac encoders was not intended to put them on equal footing. It was merely a comment on the paradoxical status quo where, of two proprietary and patented codecs which are most often used together, the open source world has probably the best encoder for one, and only fairly mediocre support for the other. I understand *how* that happened, it's just a comment on the weird state of things.

comment:17 by Hector Martin, 18 months ago

Cc: marcan@marcan.st added

comment:18 by Lirk, 17 months ago

I wonder how the Avidemux team gets around the FDK AAC license restrictions? Maybe it makes sense to contact the FDK AAC developers and resolve the issue of integrating FDK into compiled FFmpeg builds?
