
Frequency Masking Simulator

This frequency masking simulator lets you play with a masker and a probe tone and see the psychoacoustic masking math: the asymmetric spreading function, the Bark / critical-band boundaries, the absolute threshold of hearing, and the pre/post temporal masking curves. Then press Play and hear whether your ear agrees with the model.

Masker (green)

Masker frequency 1000 Hz

Masker level 80 dB

Probe (red when masked)

Probe frequency 1200 Hz

Probe level 35 dB

Verdict

Probe is

—

Margin vs threshold: —

Listen

Temporal: masker + probe

Probe offset (from masker START) +120 ms

Masker plays 0–100 ms. Negative = probe BEFORE masker (true pre-masking, ~−20 ms window). 0–100 = during masker (overlap). >100 = after masker (post-masking decay).

Output volume

Volume 15 %

Idle — adjust the masker/probe and press a Play button.

Simultaneous masking — frequency vs level

Temporal masking — time vs level (masker plays 0–100 ms)

Frequency Masking in Five Minutes

Frequency masking is the perceptual phenomenon in which a louder sound makes a quieter one inaudible. It is not a quirk: it is the foundation of perceptual lossy audio codecs such as MP3, AAC and Opus. The encoder estimates which parts of the audio the listener could not hear behind louder content and spends few or no bits on them. Without psychoacoustic masking there would be no streaming music as we know it. Masking is also the key principle behind noise shaping in digital audio, where quantisation noise is deliberately pushed into frequency regions where the ear cannot hear it because the signal itself masks it.

Two flavours

Simultaneous masking (the top plot) is when a masker tone hides probe tones playing at the same time. The shape of what's hidden is described by the spreading function: a curve that radiates outward from the masker's frequency with characteristic asymmetric slopes.

Temporal masking (the bottom plot) is when a loud sound hides quieter sounds just before and just after it in time. Pre-masking is short (~20 ms): your brain has not finished processing the quiet sound before the loud one arrives and takes over. Post-masking is longer (100–200 ms): the ear's response to the masker is still dying away after the masker stops.
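
The timing described above can be sketched as a piecewise curve. This is only an illustration, not the simulator's actual model: the function name `temporal_threshold_db`, the straight-line ramps in dB, and the default window lengths (20 ms pre, 200 ms post, masker from 0 to 100 ms) are assumptions chosen to match the figures quoted in the text.

```python
def temporal_threshold_db(t_ms, peak_db, floor_db,
                          masker_ms=100.0, pre_ms=20.0, post_ms=200.0):
    """Illustrative masked threshold (dB) at time t_ms.

    The masker is assumed to play from 0 to masker_ms. peak_db is the
    threshold while the masker sounds; floor_db is the threshold with no
    masking at all (for example the absolute threshold of hearing).
    Assumes peak_db >= floor_db.
    """
    span = peak_db - floor_db
    if t_ms < -pre_ms or t_ms > masker_ms + post_ms:
        return floor_db                       # outside both masking windows
    if t_ms < 0.0:
        # Pre-masking: assumed linear ramp up over the last pre_ms before onset.
        return floor_db + span * (t_ms + pre_ms) / pre_ms
    if t_ms <= masker_ms:
        return peak_db                        # probe overlaps the masker
    # Post-masking: assumed linear decay (in dB) over post_ms after offset.
    return peak_db - span * (t_ms - masker_ms) / post_ms


if __name__ == "__main__":
    for t in (-30, -10, 50, 120, 200, 400):
        print(t, "ms ->", temporal_threshold_db(t, peak_db=65.0, floor_db=5.0), "dB")
```

A probe is predicted to be masked when its level is below the value returned for its offset. Measured post-masking decays are not exactly linear in dB, so treat the shape as a rough guide only.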

The upward spread of masking

Look at the yellow threshold curve on the top plot. It falls away much more gently above the masker than below it (about 17 dB per Bark on the upper side versus about 27 dB per Bark on the lower side). That asymmetry is real and biological: a high-amplitude tone "spreads" upward along the basilar membrane farther than it spreads downward, so it hides higher frequencies more effectively than it hides lower ones. This is why a loud bass note can mask a midrange vocal line, but a loud whistle doesn't hide the bass.
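
A minimal sketch of such a two-slope spreading function, using the slopes quoted above. The function names are hypothetical, the Hz-to-Bark conversion is the widely used Zwicker and Terhardt approximation (an assumption, since the page does not say which conversion it uses), and the 15 dB offset at the masker frequency is the tone-on-tone figure the page quotes for its plot.

```python
import math

LOWER_SLOPE_DB_PER_BARK = 27.0   # steep side, below the masker
UPPER_SLOPE_DB_PER_BARK = 17.0   # gentle side, above the masker
OFFSET_DB = 15.0                 # assumed tone-masking-tone offset


def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def spread_db(dz_bark):
    """Attenuation (<= 0 dB) at a distance dz_bark from the masker."""
    if dz_bark >= 0.0:
        return -UPPER_SLOPE_DB_PER_BARK * dz_bark
    return LOWER_SLOPE_DB_PER_BARK * dz_bark   # dz is negative here


def masked_threshold_db(masker_hz, masker_db, probe_hz):
    """Predicted level below which a probe tone is hidden by the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_db - OFFSET_DB + spread_db(dz)


if __name__ == "__main__":
    # Default settings from the control panel: 1000 Hz masker at 80 dB,
    # 1200 Hz probe at 35 dB.
    threshold = masked_threshold_db(1000.0, 80.0, 1200.0)
    print(f"threshold at 1200 Hz: {threshold:.1f} dB, probe masked: {35.0 < threshold}")
```

One Bark above the masker the threshold has dropped 17 dB; one Bark below it has dropped 27 dB, which is the asymmetry the plot shows.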

The Bark scale

The pink dotted lines mark critical band boundaries on the Bark scale: 24 perceptual frequency bands, each covering a roughly equal stretch of the basilar membrane in the cochlea. Tones within the same critical band interact strongly; tones in different bands are processed semi-independently. Most masking calculations work in Bark rather than linear Hz because that is the scale the ear effectively uses.
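
For reference, a lookup of which critical band a frequency falls into can be sketched as follows. The band edges are the commonly tabulated Zwicker values; whether the simulator draws its dotted lines at exactly these frequencies is an assumption, and the function name is hypothetical.

```python
from bisect import bisect_right

# Commonly tabulated critical-band edges in Hz (25 edges define 24 bands).
BARK_EDGES_HZ = [
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def bark_band(f_hz):
    """Return the 1-based critical band (1..24) that contains f_hz."""
    if f_hz < 0:
        raise ValueError("frequency must be non-negative")
    # bisect_right counts the edges at or below f_hz, which is the band number.
    band = bisect_right(BARK_EDGES_HZ, f_hz)
    return min(band, 24)   # clamp anything above the top edge into band 24


if __name__ == "__main__":
    for f in (1000, 1200, 4000):
        print(f, "Hz is in band", bark_band(f))
```

With these edges the default 1000 Hz masker and 1200 Hz probe land in adjacent bands, which is why they interact so strongly.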

Why does the probe look "above" the threshold but I still can't hear it?

The model is an approximation. The exact threshold depends on individual hearing, the listening environment, the type of masker (tones, noise and music mask differently), and how the ear integrates sound over time. The plotted threshold combines two limits: the masking curve and, beneath it, the absolute threshold of hearing, which is the quietest level the ear can detect at each frequency with no masker present. The spreading function shown is the textbook tone-on-tone version with a ~15 dB tone-masking-tone offset. Differences of ±5–8 dB between the model and what you hear are normal. The model is most reliable when the probe is many dB above or below the threshold curve: then it is confidently audible or confidently masked.
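
One way to turn this into a verdict is sketched below. It assumes the two limits are combined by taking whichever is higher, and it uses an 8 dB tolerance band taken from the upper end of the ±5–8 dB range above. Both choices, and the function name `verdict`, are illustrative assumptions rather than the simulator's documented behaviour.

```python
def verdict(probe_db, masking_db, ath_db, tolerance_db=8.0):
    """Return (margin_db, label) for a probe tone.

    masking_db: threshold from the spreading function at the probe frequency.
    ath_db: absolute threshold of hearing at the probe frequency.
    The effective threshold is assumed to be the higher of the two.
    """
    threshold_db = max(masking_db, ath_db)
    margin_db = probe_db - threshold_db
    if margin_db > tolerance_db:
        label = "audible"
    elif margin_db < -tolerance_db:
        label = "masked"
    else:
        label = "borderline"   # inside the model's expected error band
    return margin_db, label


if __name__ == "__main__":
    print(verdict(probe_db=35.0, masking_db=45.0, ath_db=3.0))
    print(verdict(probe_db=50.0, masking_db=45.0, ath_db=3.0))
```

A "borderline" result is the case the question describes: the plot shows the probe slightly above the curve, but an individual listener may still not hear it.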

How is this used in codecs?

A perceptual encoder such as MP3 or AAC analyses the spectrum of each frame, computes a per-band masking threshold from the louder content using essentially the same kind of spreading function shown here, then quantises each band only as finely as needed to keep the noise below that threshold. The compression comes from spending fewer bits on frequencies where the listener would not hear the noise anyway. This is why a rock track, rich in loud mid-frequency content, can hide more quantisation noise than a solo piano, where less masking is available to cover codec artefacts. If you want to see the frequency content the encoder is working with, the Audio Spectrum Analyzer shows a live view of your audio in real time. For experiments with pure tones at exact frequencies, the Tone Generator lets you set a masker-like reference signal with precise level control.
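
The "only as finely as needed" step can be illustrated with the rule of thumb that each bit of uniform quantisation buys about 6 dB of signal-to-noise ratio. The sketch below is a simplification under that assumption. Real encoders use iterative rate loops and entropy coding, and the function name and band levels are made up for the example.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB of SNR per bit


def bits_for_band(signal_db, mask_db):
    """Smallest whole number of bits that keeps quantisation noise
    at or below the masking threshold for one band."""
    smr_db = signal_db - mask_db       # signal-to-mask ratio
    if smr_db <= 0.0:
        return 0                       # band is fully masked: spend nothing
    return math.ceil(smr_db / DB_PER_BIT)


if __name__ == "__main__":
    # Hypothetical per-band (signal level, masking threshold) pairs in dB.
    bands = [(70.0, 40.0), (60.0, 70.0), (80.0, 44.0)]
    print([bits_for_band(s, m) for s, m in bands])
```

Bands whose content sits below the masking threshold get zero bits, which is where most of the saving comes from.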

Frequently Asked Questions

My ears disagree with the model. Is it broken?

The model is a textbook approximation of average human hearing — it'll be off by 5–10 dB for any one listener, especially around the high frequencies and at very low or very high masker levels. Use it for the shape of the curve and the relative ordering of audible vs masked, not for absolute calibration. Where the spreading function is most reliable is the upward-spread asymmetry: try setting the probe a few Bark above the masker vs a few Bark below, and you should hear masking clearly on the upper side.

What's "dB SPL"?

Sound Pressure Level, referenced to 20 µPa (the nominal threshold of hearing at 1 kHz). 0 dB SPL ≈ the quietest audible sound at 1 kHz; conversation is ~60 dB SPL; rock concert ~110 dB SPL; pain threshold ~130 dB SPL. The plot Y-axis is in dB SPL because that's the scale the ATH curve and the masking model are calibrated to. The slider's "level" is not the actual sound level coming out of your speakers — it's a relative pedagogical value; the volume slider controls real-world loudness.
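
The definition can be written out directly: dB SPL is 20 times the base-10 logarithm of the RMS pressure divided by the 20 µPa reference. The function names below are illustrative.

```python
import math

P_REF_PA = 20e-6   # reference pressure: 20 micropascals


def pressure_to_db_spl(p_rms_pa):
    """Convert an RMS sound pressure in pascals to dB SPL."""
    return 20.0 * math.log10(p_rms_pa / P_REF_PA)


def db_spl_to_pressure(level_db):
    """Convert a level in dB SPL back to RMS pressure in pascals."""
    return P_REF_PA * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    for level in (0, 60, 110, 130):
        print(level, "dB SPL =", db_spl_to_pressure(level), "Pa")
```

Every 20 dB step is a tenfold change in pressure, so conversation at 60 dB SPL corresponds to about 0.02 Pa, a thousand times the reference.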

Why is the threshold of hearing curve U-shaped?

Human hearing is most sensitive in the 2–5 kHz region, a band that carries much of the information in speech. Below that, the eardrum's mass and the middle ear's transfer function attenuate signals; above ~10 kHz the cochlea's high-frequency response rolls off. Age-related and noise-induced hearing loss shows up first as the upper end of the curve climbing: many adults over 40 can't hear above 14 kHz.
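
The U shape can be reproduced with Terhardt's widely used closed-form approximation of the threshold in quiet. Whether this simulator uses that exact formula is an assumption, and the function name is illustrative.

```python
import math


def ath_db_spl(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing
    in dB SPL, intended for roughly 20 Hz to 20 kHz."""
    k = f_hz / 1000.0                                      # frequency in kHz
    return (3.64 * k ** -0.8                               # low-frequency rise
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)        # dip near 3.3 kHz
            + 1e-3 * k ** 4)                               # high-frequency rise


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 15000):
        print(f"{f:>6} Hz: {ath_db_spl(f):6.1f} dB SPL")
```

The three terms map onto the explanation above: the first raises the threshold at low frequencies, the second carves out the most sensitive region, and the third makes the curve climb steeply at the top of the range.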

What are "critical bands" and the Bark scale?

The cochlea performs a frequency analysis in which the bandwidth of each "band" grows with frequency: roughly 100 Hz wide for centre frequencies below about 500 Hz, widening to about 2.5 kHz around 10 kHz. The Bark scale maps these to "critical band rate" units running from 1 to 24 across the audible range. Two tones within the same Bark band interact strongly (masking, beats, fusion); two tones in different bands are treated semi-independently. Most psychoacoustic processing works in Bark, not Hz.
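
How the bandwidth grows with centre frequency can be estimated with the Zwicker and Terhardt closed-form approximation. This is a sketch using that published formula and a hypothetical function name. It is not necessarily what the simulator computes, and other bandwidth scales (such as ERB) give different numbers.

```python
def critical_bandwidth_hz(f_hz):
    """Zwicker and Terhardt approximation of the critical bandwidth (Hz)
    at centre frequency f_hz."""
    k = f_hz / 1000.0   # centre frequency in kHz
    return 25.0 + 75.0 * (1.0 + 1.4 * k * k) ** 0.69


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 10000):
        print(f"{f:>6} Hz centre: about {critical_bandwidth_hz(f):7.0f} Hz wide")
```

The width is nearly constant at low frequencies and grows roughly in proportion to the centre frequency higher up.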

Why are masker and probe mutually exclusive in "play X only" mode?

For pedagogical clarity — when you press "Play masker only" we want you to hear the masker on its own, then switch to "Play probe only" to compare. If you want both together press "Play both". If they overlapped on the solo buttons you'd lose the cleanest A/B test.

Can I demonstrate pre-masking?

Set the probe offset slider to a small negative value (e.g., −15 ms) and press "Play sequential". The probe will fire just before the masker. Pre-masking is a fragile, short effect (~20 ms window) and depends heavily on probe level — too loud and you hear it; too quiet and the threshold of hearing itself dominates. The temporal plot shows the model's predicted threshold ramping up before t = 0.

Is anything recorded or uploaded?

No. The tool only generates sine tones with the Web Audio API and plays them through your speakers. No microphone, no file upload, no analytics on the audio.

Why does masking spread upward more than downward — is this asymmetry a real biological effect?

Yes, it is a well-established consequence of basilar membrane mechanics. The cochlea separates frequencies by location along its length, with high frequencies at the base and low frequencies at the apex. A loud tone creates a travelling wave that enters at the base, grows to a peak at the place tuned to the tone's frequency, and then dies out abruptly just beyond it on the apical (low-frequency) side. On its way to the peak, the wave also excites the basal regions tuned to higher frequencies. This means a loud bass note can mask a quieter midrange sound much more effectively than the reverse. Codecs exploit this heavily: the upward spread of masking means that bits can be saved in the upper harmonics of bass-heavy music without the listener noticing.

What is the difference between tone-on-tone masking and noise masking a tone?

This simulator models tone-on-tone masking (sinusoidal masker, sinusoidal probe). A pure-tone masker is actually a relatively weak masker compared to a narrow band of noise at the same level, a phenomenon known as the asymmetry of masking. Narrowband noise spreads energy across a wider range of auditory nerve fibres, raising the threshold more broadly. Real audio (music, speech) behaves more like noise than like a pure tone, so the actual masking in programme material is stronger and wider than what this simulator shows. For mixing decisions, treat the simulator as a conservative lower bound on how much masking is occurring.

How does this psychoacoustic masking model relate to choosing MP3 or AAC bitrates?

Higher bitrates give the encoder more bits to use per frame. More importantly, they let the encoder keep the quantisation noise floor below the masking threshold even in frames with low-level content or wide spectral spread, which are the situations where masking gives little cover. At 128 kbps, complex polyphonic material often pushes the noise up to the threshold, causing pre-echo and pumping artefacts on transients. At 320 kbps, the bit budget is almost always enough to stay below it. For simple content (a solo voice, sparse piano), fewer bands need bits, so the noise can usually be kept below the threshold even at lower bitrates. The principle is identical to what is shown in the spreading-function plot here.
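
To put numbers on "more bits per frame", the average budget is the bitrate multiplied by the frame duration. The defaults below assume MPEG-1 Layer III (MP3) framing of 1152 samples per frame at a 44.1 kHz sample rate. Other codecs and sample rates use different values, and the function name is illustrative.

```python
def bits_per_frame(bitrate_bps, samples_per_frame=1152, sample_rate_hz=44100):
    """Average number of bits available per frame at a constant bitrate."""
    frame_seconds = samples_per_frame / sample_rate_hz
    return bitrate_bps * frame_seconds


if __name__ == "__main__":
    for kbps in (128, 192, 320):
        print(f"{kbps} kbps: about {bits_per_frame(kbps * 1000):.0f} bits per frame")
```

Under these assumptions a 320 kbps stream has 2.5 times the per-frame budget of a 128 kbps stream, shared across all bands and both channels.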
