Frequency Heatmap Generator

Drop an audio file and render a full static spectral heatmap: time on the horizontal axis, frequency on the vertical, and magnitude as colour. A windowed STFT (FFT + Hann window, 50% overlap) is computed across the whole clip. The display offers five colour scales, a log or linear frequency axis, an adjustable dB floor and ceiling, and PNG and SVG export. Everything runs locally; nothing is uploaded.

100% browser-based — files are decoded locally with the Web Audio API and never uploaded, recorded or transmitted.
📁 Drop an audio file here, or click to browse (.wav, .mp3, .ogg, .flac, .m4a)

Legend: −100 dB to 0 dB, magnitude in relative dBFS (Viridis colour scale)

Heatmap controls

FFT size: larger = finer frequency, coarser time.

Floor maps to the darkest colour, ceiling to the brightest. Lower the floor to reveal quiet detail; tighten the range for high-contrast peaks.

Analysis details

File
Duration
Sample rate
Nyquist (max freq)
FFT size: 2048 (1024 bins)
Time columns
Resolution

Frequencies above Nyquist (sample rate ÷ 2) cannot appear — they were never captured in the file. Long clips are downsampled to at most 1,400 time columns for display.

How to Use

  1. Drop a file (or click the zone to browse). WAV, MP3, OGG, FLAC and M4A all work where your browser can decode them. The file is decoded entirely in your browser.
  2. The heatmap renders automatically: each vertical slice is one FFT, plotted left→right over time, with colour encoding magnitude in dB.
  3. Pick a colour scale and a frequency axis (log is best for music and voice; linear spreads high frequencies evenly).
  4. Drag the floor / ceiling sliders to re-map the colour range — lower the floor to see faint detail, raise it to suppress noise.
  5. Change FFT size to trade frequency resolution against time resolution, then Export PNG or SVG.

Understanding Your Results

A spectral heatmap (a spectrogram rendered as a static image) shows three dimensions at once: time across the bottom, frequency up the side, and magnitude as colour. Horizontal stripes are sustained tones or harmonics; vertical streaks are transients (clicks, drum hits, consonants); diagonal lines are pitch glides or sweeps.

The colour represents relative magnitude in dBFS: decibels relative to digital full scale, not calibrated sound-pressure level. The brightest colour is whatever your ceiling slider is set to; the darkest is the floor. Two files mastered at different levels will look different even if they contain the same content, because the reference is digital full scale, so the colours follow the level stored in the file rather than any physical SPL.

The vertical extent is bounded by Nyquist, which is half the file’s sample rate. A 44.1 kHz file can only show up to about 22 kHz; nothing above that was ever recorded. The vertical resolution is one FFT bin, Δf = sample rate ÷ FFT size. The horizontal resolution is the number of time columns. For long files the frames are sampled evenly down to a fixed maximum of 1,400 columns: the frame nearest each column position is kept and the frames in between are skipped, so a very brief event that falls on a skipped frame can be thinned out or missed entirely.
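
As a rough illustration of these limits, the sketch below computes the Nyquist frequency, the number of STFT frames at 50% overlap, and which frames would be kept when the count exceeds 1,400. The function name `axisLimits`, the frame-counting rule (no padding of a partial final frame) and the floor-based picking rule are assumptions made for the example, not details confirmed by the tool.

```typescript
// Hypothetical helper: names and rounding rules are illustrative assumptions.
interface AxisLimits {
  nyquistHz: number;     // highest frequency the heatmap can show
  totalFrames: number;   // STFT frames at 50% overlap (no padding assumed)
  columns: number;       // frames actually drawn
  keptFrames: number[];  // index of the frame kept for each drawn column
}

function axisLimits(
  sampleRate: number,
  numSamples: number,
  fftSize: number,
  maxColumns = 1400
): AxisLimits {
  const hop = fftSize / 2; // 50% overlap
  const totalFrames =
    numSamples < fftSize ? 0 : Math.floor((numSamples - fftSize) / hop) + 1;
  const columns = Math.min(totalFrames, maxColumns);
  const keptFrames: number[] = [];
  for (let c = 0; c < columns; c++) {
    // Evenly spaced picks; frames between picks are skipped, not averaged.
    keptFrames.push(Math.floor((c * totalFrames) / columns));
  }
  return { nyquistHz: sampleRate / 2, totalFrames, columns, keptFrames };
}
```

Under these assumptions, a three-minute 44.1 kHz clip at FFT size 2048 produces 7,750 frames, so only about one frame in five or six is drawn. A one-second clip produces 42 frames and every one of them is kept.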

How It Works

When you drop a file, the browser’s AudioContext.decodeAudioData turns it into raw PCM samples — locally, with no upload. The tool then runs a Short-Time Fourier Transform (STFT):

  • The chosen channel is split into overlapping frames of N samples (the FFT size). Each frame starts N/2 samples after the previous one, so the hop is 50% of the frame.
  • Each frame is multiplied by a Hann window to reduce spectral leakage, then transformed with a radix-2 FFT.
  • Each frame’s magnitude spectrum becomes one column of the heatmap. Magnitudes are window-normalised and converted to dBFS (dB = 20 · log10 of the normalised magnitude).
  • If the clip would produce more than 1,400 frames, frames are sampled evenly down to 1,400 columns so the canvas, memory and exports stay bounded.
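
The first three steps above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names, the periodic form of the Hann window, the normalisation by the window sum with a factor of 2, and the 1e-10 clamp before the logarithm are all assumptions chosen so that a full-scale sine reads close to 0 dBFS.

```typescript
// Hann window of length n (periodic form, assumed here).
function hannWindow(n: number): Float64Array {
  const w = new Float64Array(n);
  for (let i = 0; i < n; i++) w[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / n);
  return w;
}

// In-place iterative radix-2 FFT; the length must be a power of two.
function fftRadix2(re: Float64Array, im: Float64Array): void {
  const n = re.length;
  // Bit-reversal permutation.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr;
      const ti = im[i]; im[i] = im[j]; im[j] = ti;
    }
  }
  // Butterfly passes of growing length.
  for (let len = 2; len <= n; len <<= 1) {
    const ang = (-2 * Math.PI) / len;
    for (let start = 0; start < n; start += len) {
      for (let k = 0; k < len / 2; k++) {
        const wr = Math.cos(ang * k);
        const wi = Math.sin(ang * k);
        const a = start + k;
        const b = a + len / 2;
        const tr = re[b] * wr - im[b] * wi;
        const ti = re[b] * wi + im[b] * wr;
        re[b] = re[a] - tr; im[b] = im[a] - ti;
        re[a] += tr; im[a] += ti;
      }
    }
  }
}

// One heatmap column: window, FFT, normalise by the window sum, convert to dB.
// Assumes start + fftSize <= samples.length.
function frameToDb(
  samples: ArrayLike<number>, start: number, fftSize: number, win: Float64Array
): Float64Array {
  const re = new Float64Array(fftSize);
  const im = new Float64Array(fftSize);
  let winSum = 0;
  for (let i = 0; i < fftSize; i++) {
    re[i] = samples[start + i] * win[i];
    winSum += win[i];
  }
  fftRadix2(re, im);
  const bins = fftSize / 2;
  const db = new Float64Array(bins);
  for (let k = 0; k < bins; k++) {
    // Factor 2 folds in the negative-frequency half (slightly high for the DC bin).
    const mag = (2 * Math.hypot(re[k], im[k])) / winSum;
    db[k] = 20 * Math.log10(Math.max(mag, 1e-10)); // clamp avoids log10(0)
  }
  return db;
}

// Full STFT: frames of fftSize samples stepped by fftSize / 2 (50% overlap).
function stftDb(samples: ArrayLike<number>, fftSize: number): Float64Array[] {
  const hop = fftSize / 2;
  const win = hannWindow(fftSize);
  const cols: Float64Array[] = [];
  for (let s = 0; s + fftSize <= samples.length; s += hop) {
    cols.push(frameToDb(samples, s, fftSize, win));
  }
  return cols;
}
```

With this normalisation, a full-scale sine centred on a bin reads about 0 dB in that bin and about −6 dB in each neighbouring bin, which is the expected main-lobe spread of a Hann window.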

Rendering maps each pixel row to a frequency bin through the selected axis (log or linear) and each pixel column to a time column, then looks the dB value up in a 256-entry colour-map lookup table whose endpoints are your floor and ceiling. PNG export re-renders the heatmap into a clean canvas and downloads it via canvas.toBlob. SVG export writes a standalone SVG document: the heatmap raster is embedded as a bounded base64 image and the axis ticks/labels are emitted as vectors, downloaded via a Blob. This is a 2D-canvas renderer — no WebGL is used or claimed.
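
The two lookups described above can be sketched like this. The names `rowToBin` and `dbToLutIndex`, the 20 Hz lower bound of the log axis, and the nearest-bin rounding are assumptions for illustration; a log axis needs some positive lower bound, and the tool's own choice is not stated here.

```typescript
// Map a pixel row (0 = top of the image) to the nearest FFT bin.
// minHz is an assumed lower bound used only by the log axis.
function rowToBin(
  row: number, height: number, sampleRate: number, fftSize: number,
  logAxis: boolean, minHz = 20
): number {
  const nyquist = sampleRate / 2;
  const t = 1 - row / (height - 1);           // 0 at the bottom row, 1 at the top
  const hz = logAxis
    ? minHz * Math.pow(nyquist / minHz, t)    // equal pixels per octave
    : t * nyquist;                            // equal pixels per Hz
  const bin = Math.round((hz * fftSize) / sampleRate);
  return Math.min(bin, fftSize / 2 - 1);      // clamp to the last stored bin
}

// Map a dB value to an index in a 256-entry colour table.
// Values at or below the floor give 0; at or above the ceiling give 255.
function dbToLutIndex(db: number, floorDb: number, ceilDb: number): number {
  const t = (db - floorDb) / (ceilDb - floorDb);
  return Math.round(255 * Math.min(1, Math.max(0, t)));
}
```

Because the floor and ceiling only enter through `dbToLutIndex`, moving the sliders needs a re-colour of the existing dB data, not a new STFT.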

Honesty & limits. Results are bounded by the file’s sample rate (Nyquist) and the FFT resolution you choose. Colours are relative, uncalibrated dBFS — useful for comparing content within one clip, not for absolute SPL. Long files are downsampled for display. The model is the standard windowed STFT; like all spectrograms it trades time resolution against frequency resolution and cannot beat that uncertainty limit. Nothing leaves your device.

Frequently Asked Questions

Why can’t I see frequencies above a certain point?
The heatmap can only show frequencies up to the Nyquist limit — half the file’s sample rate. A 44.1 kHz file tops out near 22 kHz, a 48 kHz file near 24 kHz. Anything above that frequency was filtered out and never stored in the file, so it physically cannot be displayed. The Analysis details panel shows the exact Nyquist value for your file.
Are the colours calibrated to real loudness (dB SPL)?
No. Colours encode relative magnitude in dBFS (decibels relative to digital full scale), not sound-pressure level. They are meaningful for comparing content within one file — which parts are louder, which frequencies dominate — but two files at different mastering levels will look different even with identical content. For absolute SPL you need a calibrated measurement microphone and reference level, which a browser cannot provide.
What does the FFT size change?
FFT size sets the frequency resolution: Δf = sample rate ÷ FFT size. A large FFT (8192) gives fine, closely-spaced frequency lines but each column spans a longer time slice, smearing transients. A small FFT (512) gives sharp timing but blurry frequencies. This is the time–frequency uncertainty trade-off; 2048 is a good general default. Changing the FFT size re-runs the whole STFT.
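
To put numbers on that trade-off, the short sketch below computes the bin spacing and the time span of one frame for a given sample rate and FFT size. The function name `fftTradeoff` is hypothetical, and the hop assumes the 50% overlap described on this page.

```typescript
interface Tradeoff {
  binHz: number;    // frequency spacing between adjacent bins
  frameMs: number;  // time span covered by one FFT frame
  hopMs: number;    // time step between frames at 50% overlap
}

// Illustrative helper: resolution figures for one FFT size.
function fftTradeoff(sampleRate: number, fftSize: number): Tradeoff {
  return {
    binHz: sampleRate / fftSize,
    frameMs: (1000 * fftSize) / sampleRate,
    hopMs: (1000 * (fftSize / 2)) / sampleRate,
  };
}

for (const size of [512, 2048, 8192]) {
  const t = fftTradeoff(44100, size);
  console.log(size, t.binHz.toFixed(1) + " Hz", t.frameMs.toFixed(1) + " ms");
}
```

At 44.1 kHz this gives about 86.1 Hz per bin over 11.6 ms for 512, 21.5 Hz over 46.4 ms for 2048, and 5.4 Hz over 185.8 ms for 8192.
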
My file looks blocky or low-detail in time — why?
Long files are downsampled to at most 1,400 time columns so the canvas and exports stay bounded and a huge file can’t exhaust memory. When that happens, the frames are sampled evenly — one frame kept, the rest skipped — so a brief event (a single drum hit, a click) that falls on a skipped frame can be thinned out or missed entirely rather than smoothly merged with its neighbours. For frame-accurate inspection of a short moment, trim the clip to just that region before uploading.
What do the floor and ceiling sliders do?
They set the dB values that map to the darkest and brightest ends of the colour scale. Lowering the floor reveals faint, low-energy detail (and more noise); raising it hides the quiet stuff. Tightening the range (say −60 to −10 dB) gives a punchy, high-contrast image of the strongest content; widening it (say −130 to 0 dB) shows everything. The legend bar updates to reflect your current range.
Should I use the log or linear frequency axis?
Use logarithmic for music, voice and most natural sound — pitch is perceived logarithmically, so a log axis gives each octave equal vertical space and makes harmonics and melodies easy to read. Use linear when you care about evenly-spaced high frequencies, harmonic spacing of a single tone, or matching a datasheet that plots linear Hz. Both views use exactly the same STFT data underneath.
Is the SVG export real vector graphics?
Partly. The axis grid, tick labels and titles are true SVG vectors that stay crisp at any zoom. The heatmap itself — potentially over a million coloured cells — is embedded as a bounded base64 raster image inside the SVG, because emitting one vector rectangle per cell would create an enormous, slow file. This keeps the SVG portable and editable in Inkscape or Illustrator while staying a reasonable size. The PNG export is a flat raster of the whole figure.
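
A hybrid file of this kind can be assembled as a plain string. The sketch below is an assumption about the general shape of such a file, not the tool's actual output: `buildSvg`, `Tick`, the 60-pixel margin and the styling values are all hypothetical, and the raster is passed in as a ready-made PNG data URL.

```typescript
interface Tick { y: number; label: string; }

// Escape the characters that are unsafe inside SVG text content.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Build a standalone SVG: one embedded raster plus vector axis ticks and labels.
function buildSvg(
  pngDataUrl: string, width: number, height: number, ticks: Tick[], margin = 60
): string {
  const total = width + margin;
  const parts: string[] = [];
  parts.push(
    `<svg xmlns="http://www.w3.org/2000/svg" width="${total}" height="${height}" viewBox="0 0 ${total} ${height}">`
  );
  // The heatmap is a single image element, so file size does not grow per cell.
  parts.push(
    `<image x="${margin}" y="0" width="${width}" height="${height}" href="${pngDataUrl}" preserveAspectRatio="none"/>`
  );
  for (const t of ticks) {
    // Tick marks and labels stay as vectors and remain sharp at any zoom.
    parts.push(`<line x1="${margin - 6}" y1="${t.y}" x2="${margin}" y2="${t.y}" stroke="#333"/>`);
    parts.push(
      `<text x="${margin - 8}" y="${t.y}" font-size="11" text-anchor="end" dominant-baseline="middle">${escapeXml(t.label)}</text>`
    );
  }
  parts.push("</svg>");
  return parts.join("\n");
}
```

The plain `href` attribute is the SVG 2 form; some older editors only recognise `xlink:href`, so a real exporter may need to emit that form as well.
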
Is my audio uploaded anywhere?
No. The file is read and decoded entirely in your browser with the Web Audio API; the samples never leave your device, nothing is recorded, and there is no server round-trip. You can confirm this in your browser’s network panel — loading a file makes no outbound requests. Exports are generated locally and downloaded straight from the page.