Does YouTube lowpass audio at 16k?

Response to Dan Worrall's video

In a recent video, Dan Worrall uploads a test audio file to YouTube to investigate an often-repeated claim that YouTube introduces a lowpass filter at 16kHz. This file, only about 9 seconds in length, contains a short linear frequency sweep, as well as a period of noise. Dan then plays this video back at several quality levels and uses this as the basis of his analysis.

As a software engineer who has been playing with YouTube audio for quite some time, as well as a listener to YouTube's audio offerings (YouTube videos & YouTube Music - RIP Google Play), I too am interested in helping get to the bottom of this.

Background

YouTube allows videos to be played back at several different quality settings. Hopefully this is not a revelation to anyone; YouTube has been serving videos at multiple quality levels for almost two decades at this point. What is less widely known is that YouTube separates the audio and video and serves them separately. This is part of what allows YouTube Music to exist, and what lets those with YouTube Premium listen to videos in the background without consuming video bandwidth.

These separate audio formats, however, are relatively hidden from the user. The menu options available on a video itself only allow choosing between video quality levels. What's more, these formats are device-targeted, like the video formats. Many may not know that YouTube is in the middle of a format migration (and has been for the past several years). H.264 video with AAC audio (the age-old .mp4 that YouTube started with) is being deprecated in favor of VP9 video with OGG-OPUS audio (using the .webm container). YouTube keeps transcoding to H.264 video and AAC audio for legacy devices that are unable to understand this newer format.

Available Formats

With that in mind, let's take a closer look at the formats available for Dan's test video. The full table has quite a bit of info in it, so this table is abbreviated to only show the relevant formats that contain audio.

ID   Type  Video Res.  Filesize    Total Bitrate  Audio Codec  Audio Bitrate  Sample Rate  More Info          Sample
139  m4a   audio only  52.66KiB    49k            mp4a.40.5    49k            22050Hz      low, m4a_dash      Play Sample
249  webm  audio only  38.49KiB    36k            opus         36k            48000Hz      low, webm_dash     Play Sample
250  webm  audio only  64.68KiB    61k            opus         61k            48000Hz      low, webm_dash     Play Sample
140  m4a   audio only  137.55KiB   130k           mp4a.40.2    130k           44100Hz      medium, m4a_dash   Play Sample
251  webm  audio only  107.79KiB   102k           opus         102k           48000Hz      medium, webm_dash  Play Sample
17   3gp   176x144     85.87KiB    81k            mp4a.40.2    0k             22050Hz      144p               Play Sample
18   mp4   640x360     566.35KiB   535k           mp4a.40.2    0k             44100Hz      360p               Play Sample
22   mp4   1280x720    ~ 1.72MiB   1564k          mp4a.40.2    0k             44100Hz      720p               Play Sample

Feel free to download each of these clips and load them in your analysis tool of choice; in fact, I encourage it. You should never take one person's analysis at face value, as there is always the possibility it is flawed.
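For anyone who wants to slice the format list programmatically, here is a minimal sketch in Python. The rows are copied by hand from the table above. The key names (format_id, ext, resolution, acodec, asr) are an assumption chosen to mirror the per-format fields that yt-dlp reports, and the two helper functions are hypothetical names made up for this illustration.

```python
# Rows transcribed from the table above. Key names are assumed to mirror
# yt-dlp's per-format fields; the values come from this article only.
FORMATS = [
    {"format_id": "139", "ext": "m4a", "resolution": "audio only", "acodec": "mp4a.40.5", "asr": 22050},
    {"format_id": "249", "ext": "webm", "resolution": "audio only", "acodec": "opus", "asr": 48000},
    {"format_id": "250", "ext": "webm", "resolution": "audio only", "acodec": "opus", "asr": 48000},
    {"format_id": "140", "ext": "m4a", "resolution": "audio only", "acodec": "mp4a.40.2", "asr": 44100},
    {"format_id": "251", "ext": "webm", "resolution": "audio only", "acodec": "opus", "asr": 48000},
    {"format_id": "17", "ext": "3gp", "resolution": "176x144", "acodec": "mp4a.40.2", "asr": 22050},
    {"format_id": "18", "ext": "mp4", "resolution": "640x360", "acodec": "mp4a.40.2", "asr": 44100},
    {"format_id": "22", "ext": "mp4", "resolution": "1280x720", "acodec": "mp4a.40.2", "asr": 44100},
]


def audio_only_ids(formats):
    """Return the IDs of formats that carry audio without a video stream."""
    return [f["format_id"] for f in formats if f["resolution"] == "audio only"]


def ids_by_codec_family(formats):
    """Group format IDs into 'opus' and 'aac' (any mp4a.* codec string)."""
    groups = {}
    for f in formats:
        family = "opus" if f["acodec"] == "opus" else "aac"
        groups.setdefault(family, []).append(f["format_id"])
    return groups


if __name__ == "__main__":
    print(audio_only_ids(FORMATS))
    print(ids_by_codec_family(FORMATS))
```

The grouping makes the split used in the rest of this article explicit: three opus formats on one side, and five AAC formats (DASH audio plus the legacy bundled videos) on the other.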

Methodology

Each of these files has been downloaded from YouTube's servers without transcoding. The container has been fixed with ffmpeg (an automatic function of yt-dlp/youtube-dl) to correct for DASH container differences for portability, but this in no way changes the media data. These files were all downloaded with the yt-dlp Python tool, and analyzed with ffmpeg's showspectrumpic filter with default settings. A linear frequency plot was chosen to increase resolution at the highest frequencies on the plot, as the question at hand is the cut-off point. Additionally, the frequency sweep performed by Dan was linear, and it shows up more nicely on a linear plot.
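A rough sketch of how this workflow could be reproduced is below. It only builds the command lines rather than running them. The video URL is a placeholder, and the output filenames and function names are hypothetical choices for this example, not the exact ones used for this article. The yt-dlp options -f (format selection) and -o (output template) and the ffmpeg showspectrumpic filter are real; everything else is an assumption.

```python
# Placeholder only: substitute the URL of the video you want to inspect.
VIDEO_URL = "https://www.youtube.com/watch?v=VIDEO_ID"


def download_command(url, format_id):
    """Build a yt-dlp command that fetches one specific format ID as-is."""
    # -f picks the format by ID; -o sets the output filename template,
    # where %(ext)s is filled in by yt-dlp with the container extension.
    return ["yt-dlp", "-f", str(format_id), "-o", f"format_{format_id}.%(ext)s", url]


def spectrogram_command(media_path, image_path):
    """Build an ffmpeg command that renders a single spectrogram image."""
    # showspectrumpic with no options uses the filter's default settings.
    return ["ffmpeg", "-i", media_path, "-lavfi", "showspectrumpic", image_path]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run() to execute it.
    for fmt, ext in [("251", "webm"), ("140", "m4a")]:
        print(" ".join(download_command(VIDEO_URL, fmt)))
        print(" ".join(spectrogram_command(f"format_{fmt}.{ext}", f"format_{fmt}.png")))
```

Building the commands as lists keeps the sketch safe to run without network access, and makes it easy to loop over every format ID in the table.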

Results

This brings us back to our question: does YouTube lowpass audio at 16k? The answer is a little complicated. Let's look at some spectrograms.

Well no, but actually yes

3gp - Legacy Mobile Video.

The worst and most laughable case is the 3gp video format, encoded with a sample rate of 22.05kHz. This has a clear cut-off at approximately 8kHz, which is unsurprising given the sample rate (the cutoff is ~3kHz under the Nyquist frequency of roughly 11kHz). This is an old format, which is really only still around for legacy mobile and embedded platforms. I would be quite surprised if it was used with any regularity.

Fig. 1 - Spectrogram, Format 17 (3gp).
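The arithmetic behind the "~3kHz under Nyquist" remark for the 3gp format can be sketched as follows. The function names are made up for this illustration, and the 8kHz cut-off is the approximate value read off the spectrogram, not a measured constant.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2


def headroom_below_nyquist_hz(sample_rate_hz, cutoff_hz):
    """How far an observed cut-off sits below the Nyquist frequency."""
    return nyquist_hz(sample_rate_hz) - cutoff_hz


if __name__ == "__main__":
    # Format 17 (3gp): 22050Hz sample rate, cut-off at roughly 8kHz.
    print(nyquist_hz(22050))
    print(headroom_below_nyquist_hz(22050, 8000))
```

The same two functions apply to the other formats: plug in the sample rate from the table and the cut-off read off the corresponding spectrogram.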

MP4 - Legacy video, bundled with audio

These formats are bundles, where the video and the audio are encoded into the same container. These were the formats that used to power YouTube back in the late 2000s and early 2010s. Because the audio and video are not separate in these formats, the links above show the whole video.

The sample rate of these clips is 44.1kHz, and so a cut-off around 19-20kHz is what would be expected. Here, however, we see a cut-off much closer to 15kHz on the 360p format and 16kHz on the 720p format. Whether this is a result of a new video being run through old formats, or whether really old videos also suffer from a low-pass this low, is an interesting question, but outside the scope here.

Fig. 2 - Spectrogram, Format 18 (mp4, legacy 360p).
Fig. 3 - Spectrogram, Format 22 (mp4, legacy 720p).

webm / ogg-opus - Modern AV Codec.

Next up are the modern codecs. OGG-OPUS is renowned for its versatility and ability to maintain decent audio fidelity even at low bitrates. For this reason, it is beginning to supplant many other domain-specific formats in applications such as audio streaming, internet voice chat, digital radio telephony, and others.

These audio samples represent what the vast majority of users will experience, as newer web browsers, mobile apps, and even most newer smart TVs will utilize these formats. One potential exception is on iOS, where Apple only implemented support for VP9 in iOS 14. Whether this would also prevent these devices from utilizing the higher quality opus audio is not easily answerable in a quick Google search.

In all three cases, we see a cutoff around 20kHz. Given the audio sample rate of 48kHz (a Nyquist frequency of 24kHz), it may seem surprising that the cutoff isn't higher. Dan doesn't mention the sample rate of the file he uploaded to YouTube, but that is unlikely to be the deciding factor: Opus itself only codes audio up to 20kHz in its fullband mode, so a cut-off at this point is expected from the codec whether the source was 44.1kHz or 48kHz.
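The comparison between the Nyquist limit and a codec's own band limit can be sketched as below. The bandwidth figures are the audio bandwidths listed for each mode in the Opus specification (RFC 6716); the function name and the idea of taking the lower of the two limits are a simplification assumed for this illustration, not a description of how any encoder is implemented.

```python
# Audio bandwidth per Opus mode, in Hz, as listed in RFC 6716.
OPUS_BANDWIDTH_HZ = {
    "narrowband": 4000,
    "mediumband": 6000,
    "wideband": 8000,
    "super-wideband": 12000,
    "fullband": 20000,
}


def expected_ceiling_hz(sample_rate_hz, codec_limit_hz=None):
    """Upper bound on audible content: Nyquist, capped by any codec band limit."""
    nyquist = sample_rate_hz / 2
    if codec_limit_hz is None:
        return nyquist
    return min(nyquist, codec_limit_hz)


if __name__ == "__main__":
    # 48kHz opus stream in fullband mode: the codec limit wins over Nyquist.
    print(expected_ceiling_hz(48000, OPUS_BANDWIDTH_HZ["fullband"]))
    # 44.1kHz stream with no codec-imposed limit assumed: Nyquist applies.
    print(expected_ceiling_hz(44100))
```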

It is interesting to see the artifacts introduced by the heavy compression, especially at the 36kbit/s level. There appears to be some sort of weird aliasing effect, but it's not clear why this exists. Perhaps it is a result of extreme rounding in the cosine transforms?

Fig. 4 - Spectrogram, Format 249 (opus 36k).
Fig. 5 - Spectrogram, Format 250 (opus 61k).
Fig. 6 - Spectrogram, Format 251 (opus 102k).

m4a - Legacy Dash Audio.

These formats represent a transition point: they support devices that are new enough to understand DASH, and so can switch audio quality irrespective of video quality, but that don't understand the newer opus codec. Just as with the legacy mp4 format where the audio was baked into the media files, we see a 16kHz cut-off point.

Fig. 7 - Spectrogram, Format 139 (aac 49k).
Fig. 8 - Spectrogram, Format 140 (aac 130k).

Conclusion

The "myth" that YouTube introduces a low-pass filter around 16kHz is no myth, but fortunately for content creators and viewers alike, the vast majority of users avoid the formats where this is introduced.

If any further questions, corrections, or additions are warranted, feel free to email this feedback to corrections@tyzoid.com.