Real-Time Spectrogram

Scrolling frequency-vs-time waterfall from your microphone with five colormaps (Matrix green, Viridis, Magma, Inferno, Grayscale), log/linear frequency axis, adjustable dB range, configurable FFT size, and PNG screenshot export.

100% browser-based — your microphone audio never leaves your device.

About Spectrograms & Colormaps

A spectrogram shows three dimensions of audio at once: time on one axis, frequency on another, and magnitude as colour intensity. It is one of the most informative ways to look at a complex sound: speech formants, music harmonics, machine noise, bird songs, even the click of a key on a keyboard each have a distinctive spectrogram "fingerprint". In this tool, the newest data scrolls in at the top and ages downward.
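As a minimal sketch of the scrolling behaviour, assuming each analyser frame arrives as an array of dB values (one per frequency bin), the hypothetical `Waterfall` class below keeps a fixed number of rows with the newest at index 0, the top of the display. It illustrates the ordering only; it is not this tool's actual code.

```typescript
// Hypothetical waterfall store (illustrative, not this tool's implementation).
// Row 0 is the newest spectrum (top of the display); higher indices are older.
class Waterfall {
  private readonly maxRows: number;
  private rows: number[][] = [];

  constructor(maxRows: number) {
    this.maxRows = maxRows;
  }

  // Add the newest spectrum at the top; drop the oldest row once full.
  push(spectrumDb: number[]): void {
    this.rows.unshift(spectrumDb);
    if (this.rows.length > this.maxRows) {
      this.rows.pop();
    }
  }

  // Index 0 = top row (newest); returns undefined past the last stored row.
  rowAt(index: number): number[] | undefined {
    return this.rows[index];
  }

  get length(): number {
    return this.rows.length;
  }
}
```

A canvas renderer often gets the same effect more cheaply by redrawing the existing image shifted down one row and painting the new row at the top; the array version just makes the newest-on-top ordering explicit.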

FFT size, bin width and time resolution

The frequency-axis resolution (bin width) is Δf = sample_rate / N, where N is the FFT size. Time resolution moves the opposite way: a bigger N gives finer frequency lines, but each row then represents a longer time slice (N / sample_rate seconds). Two extremes:

  • Small FFT (1024 at 48 kHz: 46.9 Hz/bin, ~21 ms/row): sharp time response, blurry frequencies. Good for percussion, transient detection, drum analysis.
  • Large FFT (16384 at 48 kHz: 2.9 Hz/bin, ~341 ms/row): sharp frequency lines, smeared time. Good for sustained tones, instrument tuning, room modes.

For voice and most music, 4096 is the comfortable middle (at 48 kHz: 11.7 Hz/bin, ~85 ms/row). This is the same time/frequency tradeoff that signal processors call the uncertainty principle (or Gabor limit): you cannot make both resolutions arbitrarily fine at the same time.
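The two formulas above can be checked with a few lines of arithmetic. This sketch assumes, as the text does, that one row corresponds to one analysis window of N samples; `fftResolution` is a hypothetical helper name.

```typescript
// Hypothetical helper: frequency resolution vs. time per row for a given FFT size.
interface Resolution {
  binWidthHz: number;    // Δf = sampleRate / N
  rowDurationMs: number; // N / sampleRate, converted to milliseconds
}

function fftResolution(sampleRate: number, fftSize: number): Resolution {
  return {
    binWidthHz: sampleRate / fftSize,
    rowDurationMs: (fftSize / sampleRate) * 1000,
  };
}

for (const n of [1024, 4096, 16384]) {
  const r = fftResolution(48000, n);
  console.log(`N=${n}: ${r.binWidthHz.toFixed(1)} Hz/bin, ${r.rowDurationMs.toFixed(0)} ms/row`);
}
// N=1024: 46.9 Hz/bin, 21 ms/row
// N=4096: 11.7 Hz/bin, 85 ms/row
// N=16384: 2.9 Hz/bin, 341 ms/row
```

Doubling N always halves the bin width and doubles the row duration; no setting improves one without costing the other.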

Choosing a colormap

The five built-in colormaps each have a job:

  • Matrix Green — the site default. Black → dark green → bright green → near-white. Looks cinematic; works well for high-contrast displays.
  • Viridis — matplotlib's default since 2017. Perceptually uniform across its range (equal brightness steps look equal-sized), colour-blind safe, accurate when printed in grayscale. The right default for scientific work.
  • Magma — dark, perceptually uniform, with deep purples in low-energy regions. Good for displaying spectrograms on light backgrounds.
  • Inferno — bright, perceptually uniform, like Magma but warmer. Good for highlighting strong peaks.
  • Grayscale — simplest possible. Useful for publication-quality figures or when you want users to bring their own colour mapping.
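A colormap is essentially a lookup from a normalized magnitude t in [0, 1] to a colour. The sketch below interpolates linearly between a few colour stops. The Matrix Green stop values are illustrative guesses based on the description above, not the tool's actual palette, and `sampleColormap` is a hypothetical name.

```typescript
// Sketch: colormap lookup by linear interpolation between colour stops.
type RGB = [number, number, number];

// Illustrative stops only (NOT the tool's real values): black -> dark green -> bright green -> near-white.
const MATRIX_GREEN_STOPS: RGB[] = [
  [0, 0, 0],
  [0, 80, 20],
  [0, 255, 65],
  [220, 255, 220],
];

const GRAYSCALE_STOPS: RGB[] = [
  [0, 0, 0],
  [255, 255, 255],
];

// Map t in [0, 1] (0 = floor, 1 = ceiling) to a colour; out-of-range t is clamped.
function sampleColormap(stops: RGB[], t: number): RGB {
  const x = Math.min(1, Math.max(0, t)) * (stops.length - 1);
  const i = Math.min(Math.floor(x), stops.length - 2); // segment index
  const frac = x - i;                                   // position within the segment
  const [r0, g0, b0] = stops[i];
  const [r1, g1, b1] = stops[i + 1];
  const lerp = (p: number, q: number) => Math.round(p + (q - p) * frac);
  return [lerp(r0, r1), lerp(g0, g1), lerp(b0, b1)];
}
```

A handful of stops is enough for Grayscale or a stylized map, but perceptually uniform maps such as Viridis are normally distributed as a dense lookup table (matplotlib defines them with 256 entries), because a few hand-picked stops would not preserve their uniform brightness steps.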

Floor and ceiling dB

The floor (minimum) is the dB level that maps to the colormap's darkest colour (black or dark purple); the ceiling (maximum) maps to its brightest colour. Lower the floor to see more detail in quiet passages; raise it to suppress noise. Tighten the range (e.g., −60 to −20 dB) for high-contrast displays of strong-signal regions; widen it (−120 to 0 dB) for everything-visible monitoring. The floor and ceiling are the analyser's minDecibels / maxDecibels properties. In the Web Audio API these set how getByteFrequencyData() scales dB values into the 0–255 byte range: anything at or below the floor becomes 0, anything at or above the ceiling becomes 255. They do not change getFloatFrequencyData(), which always returns raw dB.
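The Web Audio specification describes the byte conversion as b = ⌊255 / (maxDecibels − minDecibels) × (Y − minDecibels)⌋, clipped to 0–255. The sketch below reproduces that formula in an algebraically equivalent order, so the floor/ceiling effect can be tried without a browser. `dbToByte` is a hypothetical name, and the example uses the API's default range of −100 to −30 dB.

```typescript
// Sketch of the spec's dB -> byte scaling used by getByteFrequencyData().
function dbToByte(db: number, minDecibels: number, maxDecibels: number): number {
  if (maxDecibels <= minDecibels) {
    throw new RangeError("maxDecibels must be greater than minDecibels");
  }
  // Multiply before dividing so the ceiling maps to exactly 255.
  const scaled = Math.floor((255 * (db - minDecibels)) / (maxDecibels - minDecibels));
  return Math.min(255, Math.max(0, scaled)); // clip below the floor / above the ceiling
}

dbToByte(-65, -100, -30); // 127: halfway between floor and ceiling
dbToByte(-120, -100, -30); // 0: below the floor, drawn in the darkest colour
```

Dividing the byte by 255 gives the normalized 0–1 value a colormap lookup expects.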

Why logarithmic frequency?

Pitch in music and speech is perceived roughly logarithmically: each octave is a doubling of frequency. On a linear axis (0 to 24 kHz at a 48 kHz sample rate), the entire bass range below 250 Hz gets about 1% of the screen width, while the mostly empty high frequencies take the rest. A logarithmic axis gives each octave equal visual space, which is almost always what musicians, voice analysts, and acousticians want. Use linear only when you specifically need uniform spacing in Hz, such as when examining high-frequency tones.
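The difference between the two axes comes down to one mapping function. In this sketch, `freqToX` is a hypothetical helper and the frequency limits and 1000-pixel width are example values, not the tool's settings. Note that a log axis needs a lower limit above 0 Hz.

```typescript
// Sketch: map a frequency (Hz) to a horizontal pixel position.
// For the log axis, fMin must be > 0 (log of 0 is undefined).
function freqToX(f: number, fMin: number, fMax: number, width: number, log: boolean): number {
  if (log) {
    // Equal frequency ratios (e.g. each doubling = one octave) get equal widths.
    return (Math.log(f / fMin) / Math.log(fMax / fMin)) * width;
  }
  // Equal Hz steps get equal widths.
  return ((f - fMin) / (fMax - fMin)) * width;
}

// 20 Hz to 20480 Hz is exactly 10 octaves, so each octave gets 100 px out of 1000.
const logX = freqToX(640, 20, 20480, 1000, true); // ≈ 500 (640 Hz is 5 octaves above 20 Hz)
// On a linear 0-24 kHz axis, everything below 250 Hz fits in about 1% of the width.
const linX = freqToX(250, 0, 24000, 1000, false); // ≈ 10.4 px
```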

Frequently Asked Questions

What's the difference between a spectrogram and a spectrum analyzer?
A spectrum analyzer shows the instantaneous spectrum (frequency content of one moment); a spectrogram shows the spectrum's evolution over time, with magnitude as colour. The Audio Spectrum Analyzer tool on this site is the instantaneous view; this Real-Time Spectrogram is the time-evolving view. They use the exact same FFT math under the hood — just different rendering.
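To make "same FFT math, different rendering" concrete: a spectrum analyzer computes one magnitude spectrum from one frame, and a spectrogram repeats that for successive, possibly overlapping frames and stacks the results (a short-time Fourier transform). The sketch uses a naive O(N²) DFT and a Hann window for readability; the Web Audio AnalyserNode instead applies a Blackman window and uses a proper FFT. All function names here are hypothetical.

```typescript
// One frame -> one spectrum in dB (what a spectrum analyzer draws).
function magnitudeSpectrumDb(frame: number[]): number[] {
  const n = frame.length;
  const out: number[] = [];
  for (let k = 0; k < n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const w = 0.5 - 0.5 * Math.cos((2 * Math.PI * t) / n); // Hann window
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * w * Math.cos(angle);
      im += frame[t] * w * Math.sin(angle);
    }
    const mag = Math.hypot(re, im) / n;
    out.push(20 * Math.log10(mag + 1e-12)); // small offset avoids log10(0)
  }
  return out;
}

// Many frames -> many spectra stacked over time (what a spectrogram draws).
function spectrogram(signal: number[], frameSize: number, hop: number): number[][] {
  const rows: number[][] = [];
  for (let start = 0; start + frameSize <= signal.length; start += hop) {
    rows.push(magnitudeSpectrumDb(signal.slice(start, start + frameSize)));
  }
  return rows;
}
```

A hop smaller than the frame size produces overlapping frames, which is how a live spectrogram can add rows faster than one FFT length per row.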
How do I interpret the colors?
Brighter = stronger signal at that frequency at that moment. The dB scale next to the cursor info panel shows the mapping. With the Matrix colormap: bright white-green peaks are dominant tones, dim green is background noise, black is silence (below the dB floor). With Viridis: yellow = peak, green = mid, deep purple = floor. The exact dB → colour mapping is controlled by your Floor and Ceiling sliders.
Why does my voice look like a set of parallel bands?
Those are vocal harmonics. The human voice has a fundamental frequency (typically 90–250 Hz) plus a series of harmonics at integer multiples of it, extending up to ~10 kHz. The fundamental appears as the lowest bright band; the second, third, fourth, ... harmonics appear at 2×, 3×, 4× its frequency. The vocal tract then emphasizes the harmonics that fall near its resonances, creating "formants": bands of emphasized harmonics (roughly 500 / 1500 / 2500 Hz for a neutral vowel) whose positions define vowel identity. Sing a sustained vowel and watch the harmonics line up; change the vowel without changing pitch to see the formants shift.
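The harmonic bands sit at exact multiples of the fundamental, so their positions are easy to predict. This sketch lists the harmonics up to a limit and names the fundamental's pitch, assuming 12-tone equal temperament with A4 = 440 Hz (MIDI note 69); the helper names are hypothetical, and the note conversion is only meaningful above about 8 Hz (MIDI note 0).

```typescript
// Sketch: harmonic series of a fundamental, plus the pitch name of a frequency.
const NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

// All integer multiples of f0 up to maxHz (k = 1 is the fundamental itself).
function harmonics(f0: number, maxHz: number): number[] {
  const out: number[] = [];
  for (let k = 1; k * f0 <= maxHz; k++) out.push(k * f0);
  return out;
}

// Nearest equal-tempered note, assuming A4 = 440 Hz = MIDI 69 and C4 = MIDI 60.
function noteName(freq: number): string {
  const midi = Math.round(69 + 12 * Math.log2(freq / 440));
  const octave = Math.floor(midi / 12) - 1;
  return `${NOTE_NAMES[midi % 12]}${octave}`;
}

harmonics(110, 500); // [110, 220, 330, 440]
noteName(110);       // "A2"
```

For a 110 Hz fundamental, the 4th and 5th harmonics (440 and 550 Hz) straddle a 500 Hz first formant, which is why those bands tend to look brighter than their neighbours on a neutral vowel.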
Why is the time resolution variable?
Each row of the spectrogram is computed from the most recent N samples (N = FFT size), so at a fixed sample rate a larger N means each row spans more time: at N = 4096 and 48 kHz, one row covers ~85 ms of audio. How often rows are drawn is a separate matter. The analyser is read once per requestAnimationFrame tick (typically ~60 fps, one row every ~16.7 ms), independent of N, and each read returns the latest N samples. The analyser's buffer refreshes once per Web Audio render quantum (usually 128 samples, ~2.7 ms at 48 kHz), so consecutive rows are computed from heavily overlapping windows. That is why the spectrogram scrolls smoothly even when N is large: rows still arrive at the frame rate, but each one represents a wider, overlapping slice of time.
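The timing relationships above reduce to simple arithmetic. This sketch assumes one row is drawn per display frame and a 128-sample render quantum; `rowTiming` is a hypothetical helper.

```typescript
// Sketch: window length per row vs. spacing between rows, and the resulting overlap.
function rowTiming(sampleRate: number, fftSize: number, fps: number) {
  const windowMs = (fftSize / sampleRate) * 1000;            // audio covered by one row
  const rowIntervalMs = 1000 / fps;                          // time between drawn rows
  const newSamplesPerRow = sampleRate / fps;                 // fresh audio between rows
  const overlap = Math.max(0, 1 - newSamplesPerRow / fftSize); // fraction shared with previous row
  const renderQuantumMs = (128 / sampleRate) * 1000;         // analyser buffer refresh period
  return { windowMs, rowIntervalMs, newSamplesPerRow, overlap, renderQuantumMs };
}

const timing = rowTiming(48000, 4096, 60);
// windowMs ≈ 85.3, rowIntervalMs ≈ 16.7, newSamplesPerRow = 800,
// overlap ≈ 0.80 (about 80% of each window is shared with the previous row),
// renderQuantumMs ≈ 2.67
```

At N = 1024 the same settings give 800 new samples per 1024-sample window, so overlap drops to about 22%; small FFT sizes make rows closer to independent snapshots.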
Can I export a long recording's spectrogram?
The screenshot button captures the current canvas state — the visible time window. To capture a long passage, freeze at the moment you want to keep, then screenshot. For programmatic long-form analysis, the Waveform Visualizer's file-mode is a better starting point — it shows the entire waveform of an uploaded file at once, and a future enhancement could add spectrogram-of-file. For now, this tool is real-time-only.
Why do high frequencies look dimmer than low ones for music?
Music's long-term spectrum typically follows a roughly pink-noise shape: power per Hz drops by about 3 dB per octave as frequency rises. Since 10 kHz is about 3.3 octaves above 1 kHz, an FFT bin near 10 kHz carries about 1/10 the power (−10 dB) of a bin near 1 kHz, even when the two regions sound balanced to the ear (each octave holds roughly equal total power). The spectrogram shows the raw dB, so high frequencies legitimately look dimmer. To keep quiet high-frequency detail visible, lower the floor so it stays above the floor level. For a weighted view, apply A-weighting, which attenuates bass relative to the midrange (use the Audio Spectrum Analyzer tool, which has weighting; spectrograms with weighted curves are uncommon).
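The 3 dB-per-octave figure follows from power per Hz being proportional to 1/f. This sketch computes the expected per-bin level difference between two frequencies, plus a hypothetical display tilt that would add the loss back. The tilt is an illustration of the arithmetic only; the tool does not claim to offer it, and both function names are assumptions.

```typescript
// Sketch: per-bin dB difference for a pink (1/f) spectrum between f1 and f2.
function pinkDropDb(f1: number, f2: number): number {
  // Power per Hz ∝ 1/f, so the level difference is 10·log10(f2 / f1) dB.
  return 10 * Math.log10(f2 / f1);
}

// Hypothetical display tilt: add back ~3.01 dB per octave relative to refHz (freq > 0).
function tiltCompensate(db: number, freq: number, refHz = 1000): number {
  return db + 10 * Math.log10(freq / refHz);
}

pinkDropDb(1000, 2000);  // ≈ 3.01 dB (one octave)
pinkDropDb(1000, 10000); // 10 dB (about 3.3 octaves)
```

With such a tilt, pink noise would draw as a flat band of equal colour; without it, the gradual dimming toward high frequencies is expected rather than a fault.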
Does the spectrogram capture stereo separately?
No. The AnalyserNode down-mixes all input channels to a single mono signal before the FFT, so content that is out of phase between L and R can partly or completely cancel. To see L and R separately you'd need two independent analysers, one per channel (for example, fed from a ChannelSplitterNode). For phase relationships between channels, use the Phase Frequency Analyzer (when built) in this category; for L/R signal comparison, the dedicated Stereo Channel Tester (in the Microphone & Diagnostics category) is the right tool.
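For stereo input, the Web Audio specification's "speakers" down-mix to mono is M = 0.5 × (L + R). The sketch below applies that rule to plain sample arrays (`downmixStereo` is a hypothetical name) to show why per-channel differences are lost: identical channels survive unchanged, while opposite-polarity channels sum to silence.

```typescript
// Sketch of the stereo -> mono "speakers" down-mix: M = 0.5 * (L + R).
// Assumes left and right have the same length.
function downmixStereo(left: number[], right: number[]): number[] {
  return left.map((l, i) => 0.5 * (l + right[i]));
}

downmixStereo([0.2, 0.4], [0.2, 0.4]);  // [0.2, 0.4]: in-phase content is unchanged
downmixStereo([0.5, -0.5], [-0.5, 0.5]); // [0, 0]: out-of-phase content cancels
```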