Real-Time Spectrogram

This free real-time spectrogram shows a scrolling frequency-vs-time waterfall from your microphone with five colormaps (Matrix green, Viridis, Magma, Inferno, Grayscale), log/linear frequency axis, adjustable dB range, configurable FFT size, and PNG screenshot export.

100% browser-based — your microphone audio never leaves your device.

About Spectrograms & Colormaps

A spectrogram (sometimes called a sonogram or spectrograph) shows three dimensions of audio at once: time on one axis, frequency on another, and magnitude as colour intensity. Formally it is a sequence of short-time Fourier transforms (STFTs): the audio is divided into overlapping windows, each window is transformed with an FFT, and the resulting magnitude spectrum is plotted as one row or column. It is one of the most informative ways to look at a complex sound: speech formants, music harmonics, machine noise, bird songs and the click of a key on a keyboard all have distinctive spectrogram "fingerprints" that evolve over time. In this tool each spectrum is drawn as one row; the newest data scrolls in at the top and ages downward.

FFT size, bin width and time resolution

The frequency resolution is Δf = sample_rate / N, and the time span behind each row is its inverse, N / sample_rate seconds. A bigger N gives finer frequency lines, but each row represents a longer time slice. Two extremes:

Small FFT (1024 at 48 kHz: 46.9 Hz/bin, ~21 ms/row): sharp time response, blurry frequencies. Good for percussion, transient detection, drum analysis.
Large FFT (16384 at 48 kHz: 2.9 Hz/bin, ~341 ms/row): sharp frequency lines, smeared time. Good for sustained tones, instrument tuning, room modes.
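
The figures in the two bullets above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name `fft_resolution` is made up for illustration and is not part of the tool.

```python
def fft_resolution(fft_size: int, sample_rate: float):
    """Return (bin width in Hz, window duration in ms) for one FFT frame."""
    bin_width_hz = sample_rate / fft_size          # delta-f = sample_rate / N
    window_ms = 1000.0 * fft_size / sample_rate    # N / sample_rate, in milliseconds
    return bin_width_hz, window_ms


# Print the trade-off for three common FFT sizes at 48 kHz.
for n in (1024, 4096, 16384):
    df, ms = fft_resolution(n, 48_000)
    print(f"N={n:>5}: {df:5.1f} Hz/bin, {ms:6.1f} ms/row")
```

Doubling N halves the bin width and doubles the time slice, so the product of the two stays constant.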

For voice and most music, 4096 is the comfortable middle. This is the time/frequency tradeoff that signal processing calls the uncertainty principle (the Gabor limit): you cannot make both resolutions arbitrarily fine at once. The choice of window function (this tool uses a Blackman window by default) also affects spectral leakage: a rectangular window has the narrowest main lobe but high sidelobes that leak energy across many bins; Blackman suppresses those sidelobes strongly, giving cleaner frequency lines at the cost of a main lobe roughly three times as wide. Everything runs in the browser tab without any plugin, which makes the tool convenient for quick vocal analysis alongside classroom or studio work.
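
The effect of windowing on leakage can be checked numerically. The sketch below assumes the conventional Blackman coefficients with α = 0.16 (the tool's exact window parameters are not stated here) and uses a deliberately tiny 64-point frame with a tone halfway between two bins, the worst case for leakage. It then compares how much energy lands in a distant bin with and without the window. All names are illustrative.

```python
import cmath
import math


def blackman(n_points: int, alpha: float = 0.16):
    """Blackman window; alpha = 0.16 is the conventional choice (an assumption here)."""
    a0, a1, a2 = (1 - alpha) / 2, 0.5, alpha / 2
    return [
        a0
        - a1 * math.cos(2 * math.pi * n / n_points)
        + a2 * math.cos(4 * math.pi * n / n_points)
        for n in range(n_points)
    ]


def dft_bin_magnitude(samples, k: int) -> float:
    """Magnitude of a single DFT bin, computed directly (fine for tiny frames)."""
    n_points = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * k * n / n_points)
        for n, x in enumerate(samples)
    )
    return abs(acc)


N = 64
# A sine at 10.5 cycles per frame: exactly between bins 10 and 11.
tone = [math.sin(2 * math.pi * 10.5 * n / N) for n in range(N)]
window = blackman(N)
windowed = [x * w for x, w in zip(tone, window)]

# Energy that leaks into bin 25, far away from the tone.
leak_rect = dft_bin_magnitude(tone, 25)          # no window = rectangular window
leak_blackman = dft_bin_magnitude(windowed, 25)

print(f"rectangular leak: {leak_rect:.4f}, Blackman leak: {leak_blackman:.6f}")
```

The windowed frame leaks far less into the distant bin, which is why faint tones next to loud ones stay visible on the display.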

Choosing a colormap

The five built-in colormaps each have a job:

Matrix Green — the site default. Black → dark green → bright green → near-white. Looks cinematic; works well for high-contrast displays.
Viridis — matplotlib's default since 2017. Perceptually uniform across its range (equal brightness steps look equal-sized), colour-blind safe, accurate when printed in grayscale. The right default for scientific work.
Magma — dark, perceptually uniform, with deep purples in low-energy regions. Good for displaying spectrograms on light backgrounds.
Inferno — bright, perceptually uniform, like Magma but warmer. Good for highlighting strong peaks.
Grayscale — simplest possible. Useful for publication-quality figures or when you want users to bring their own colour mapping.
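
A colormap like these is usually implemented as a lookup table built by interpolating between a few colour stops. The sketch below follows the Matrix Green description (black, dark green, bright green, near-white), but the stop positions and RGB values are hypothetical and chosen only for illustration; they are not the tool's actual palette.

```python
# Hypothetical stops: (position in 0..1, (r, g, b)). Not the tool's real values.
MATRIX_GREEN_STOPS = [
    (0.0, (0, 0, 0)),        # black
    (0.4, (0, 90, 20)),      # dark green
    (0.8, (0, 255, 70)),     # bright green
    (1.0, (220, 255, 220)),  # near-white
]


def sample_colormap(stops, t: float):
    """Linearly interpolate an (r, g, b) colour at position t, clamped to 0..1."""
    t = min(1.0, max(0.0, t))
    for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
        if t <= p1:
            u = (t - p0) / (p1 - p0)  # fraction of the way through this segment
            return tuple(round(a + (b - a) * u) for a, b in zip(c0, c1))
    return stops[-1][1]


def build_lut(stops, size: int = 256):
    """Precompute one colour per intensity level so drawing is a table lookup."""
    return [sample_colormap(stops, i / (size - 1)) for i in range(size)]


lut = build_lut(MATRIX_GREEN_STOPS)
print(lut[0], lut[128], lut[255])
```

Note that simple RGB interpolation like this is not perceptually uniform; Viridis, Magma and Inferno were designed in a perceptual colour space and are normally shipped as precomputed tables.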

Floor and ceiling dB

The floor (minimum) is the dB level that maps to the colormap's darkest colour (black or dark purple); the ceiling (maximum) maps to the brightest colour. Lower the floor to see more detail in quiet passages; raise it to suppress noise. Tighten the range (e.g., −60 to −20) for high-contrast displays of strong-signal regions; widen it (−120 to 0) for everything-visible monitoring. The floor and ceiling correspond to the analyser's minDecibels and maxDecibels properties, which also set the range used when the analyser scales dB values into byte data.
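
The floor-and-ceiling mapping amounts to a clamp followed by a linear rescale. The sketch below turns a dB value into a 0..255 colour index, which to my understanding is the same kind of scaling the Web Audio analyser applies when it produces byte frequency data. The function names and the default range are illustrative assumptions.

```python
import math


def magnitude_to_db(magnitude: float) -> float:
    """Convert a linear magnitude to dB; zero maps to negative infinity."""
    return 20.0 * math.log10(magnitude) if magnitude > 0 else float("-inf")


def db_to_level(db: float, floor_db: float = -90.0, ceiling_db: float = -10.0) -> int:
    """Map a dB value to a 0..255 colour index, clamping outside the range."""
    if ceiling_db <= floor_db:
        raise ValueError("ceiling_db must be greater than floor_db")
    scaled = 255.0 * (db - floor_db) / (ceiling_db - floor_db)
    return int(min(255.0, max(0.0, scaled)))


# The same -40 dB signal under a wide range and a tight range.
print(db_to_level(-40.0, -120.0, 0.0))   # wide range: mid-brightness
print(db_to_level(-40.0, -60.0, -20.0))  # tight range: also mid, but steeper contrast around it
```

With the tight range, a 10 dB change moves the index by about 64 levels instead of about 21, which is where the extra contrast comes from.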

Why logarithmic frequency?

Musical pitch and much of speech perception are roughly logarithmic: each octave is a doubling of frequency. A linear-axis spectrogram squeezes the entire bass range into about 1% of the screen width and gives the sparsely used high frequencies the rest. A logarithmic axis gives each octave equal visual space, which is almost always what musicians, voice analysts and acousticians want. Use linear only when you specifically need uniform spacing of high-frequency tones. To test your understanding of frequency spacing across octaves, the Octave Band Calculator shows you exactly where the 1/3-octave centre frequencies fall. You can also play a reference sweep from the online tone generator into the spectrogram; watching the sweep trace across the screen is a great way to calibrate your eye to the log scale.
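
The difference between the two axes comes down to one formula each. The sketch below maps a frequency to a normalised 0..1 position along the axis; the 20 Hz to 20 kHz limits are assumed for illustration and the function names are hypothetical.

```python
import math


def freq_to_position_log(freq_hz: float, f_min: float = 20.0, f_max: float = 20_000.0) -> float:
    """Position in 0..1 on a logarithmic axis spanning f_min..f_max."""
    return math.log(freq_hz / f_min) / math.log(f_max / f_min)


def position_to_freq_log(pos: float, f_min: float = 20.0, f_max: float = 20_000.0) -> float:
    """Inverse mapping, e.g. for turning a cursor position back into a frequency."""
    return f_min * (f_max / f_min) ** pos


def freq_to_position_linear(freq_hz: float, f_max: float = 20_000.0) -> float:
    """Position in 0..1 on a linear axis spanning 0..f_max."""
    return freq_hz / f_max


# Share of the axis given to everything below 250 Hz.
print(f"linear: {freq_to_position_linear(250.0):.1%}")
print(f"log:    {freq_to_position_log(250.0):.1%}")

# Every octave takes the same width on the log axis.
octave_low = freq_to_position_log(200.0) - freq_to_position_log(100.0)
octave_high = freq_to_position_log(2000.0) - freq_to_position_log(1000.0)
print(f"octave widths: {octave_low:.4f} vs {octave_high:.4f}")
```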

Frequently Asked Questions

What's the difference between a spectrogram and a spectrum analyzer?

A spectrum analyzer shows the instantaneous spectrum (frequency content of one moment); a spectrogram shows the spectrum's evolution over time, with magnitude as colour. The Audio Spectrum Analyzer tool on this site is the instantaneous view; this Real-Time Spectrogram is the time-evolving view. They use the exact same FFT math under the hood — just different rendering.

How do I interpret the colors?

Brighter = stronger signal at that frequency at that moment. The dB scale next to the cursor info panel shows the mapping. With the Matrix colormap: bright white-green peaks are dominant tones, dim green is background noise, black is silence (below the dB floor). With Viridis: yellow = peak, green = mid, deep purple = floor. The exact dB → colour mapping is controlled by your Floor and Ceiling sliders.

Why does my voice look like horizontal bands?

Those are vocal harmonics. The human voice has a fundamental frequency (typically about 90–250 Hz) and a series of integer-multiple harmonics extending up to ~10 kHz. On a sustained note each harmonic holds a constant frequency, so it draws a line along the time axis: a horizontal band in a conventional left-to-right spectrogram, or a vertical streak in this tool's top-to-bottom waterfall. The fundamental is the lowest-frequency bright band; the second, third, fourth, ... harmonics appear at multiples above it. The vocal tract's shaping of those harmonics creates "formants": bands of emphasized harmonics (around 500 / 1500 / 2500 Hz for a neutral vowel) whose positions define vowel identity. Sing a sustained vowel and watch the harmonics line up; change the vowel without changing pitch to see the formants shift.

Why is the time resolution variable?

Each row of the spectrogram is the FFT of the most recent N samples (the FFT size). At a fixed sample rate, a larger N means a longer time slice behind each row: at N = 4096 and 48 kHz, each row covers ~85 ms. The drawing loop (requestAnimationFrame, typically ~60 fps) runs independently of N and reads the same most-recent N samples on each tick. New audio arrives in the audio API's internal blocks (usually 128 samples, ~2.7 ms at 48 kHz), also independent of N. So the spectrogram scrolls smoothly even when N is large; consecutive rows simply overlap heavily, each one spanning the same wider time slice.
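
How much consecutive rows overlap follows from the FFT size and the redraw rate. The sketch below assumes a steady 60 redraws per second, which real browsers only approximate; the function name is illustrative.

```python
def frame_overlap(fft_size: int, sample_rate: float, frames_per_second: float = 60.0):
    """Return (window length in ms, hop in samples, overlap fraction) for successive rows.

    frames_per_second is an assumed redraw rate; real displays and browsers vary.
    """
    window_ms = 1000.0 * fft_size / sample_rate
    hop_samples = sample_rate / frames_per_second    # new samples between two redraws
    overlap = max(0.0, 1.0 - hop_samples / fft_size)  # zero if the window is shorter than the hop
    return window_ms, hop_samples, overlap


for n in (512, 4096, 16384):
    window_ms, hop, overlap = frame_overlap(n, 48_000)
    print(f"N={n:>5}: window {window_ms:6.1f} ms, hop {hop:.0f} samples, overlap {overlap:.0%}")
```

At large N nearly all of each row's samples were already present in the previous row, so transients appear smeared over several rows rather than confined to one.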

Can I export a long recording's spectrogram?

The screenshot button captures the current canvas state — the visible time window. To capture a long passage, freeze at the moment you want to keep, then screenshot. For programmatic long-form analysis, the Waveform Visualizer's file-mode is a better starting point — it shows the entire waveform of an uploaded file at once, and a future enhancement could add spectrogram-of-file. For now, this tool is real-time-only.

Why do high frequencies look dimmer than low ones for music?

Music's natural spectrum typically follows a roughly pink-noise shape: power drops by about 3 dB per octave as frequency rises. So 10 kHz content has about 1/10 the power of 1 kHz content (roughly 10 dB less) even when both sound "balanced" to the ear. The spectrogram shows the raw dB, so high frequencies legitimately are dimmer. To make them easier to see, lower the floor so that quiet high-frequency content stays above the darkest colour, or apply a frequency weighting such as A-weighting, which attenuates the bass (use the Audio Spectrum Analyzer tool, which has weighting; spectrograms with weighted curves are uncommon).
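
The "1/10 the power" figure follows directly from the 3 dB per octave slope. A minimal sketch of the arithmetic, with illustrative function names:

```python
import math


def pink_drop_db(f_low: float, f_high: float, slope_db_per_octave: float = 3.0) -> float:
    """Level drop in dB between two frequencies for a constant dB-per-octave slope."""
    octaves = math.log2(f_high / f_low)
    return slope_db_per_octave * octaves


def db_drop_to_power_ratio(drop_db: float) -> float:
    """Convert a drop in dB to a power ratio (10 dB down = one tenth the power)."""
    return 10.0 ** (-drop_db / 10.0)


drop = pink_drop_db(1_000.0, 10_000.0)
ratio = db_drop_to_power_ratio(drop)
print(f"{drop:.2f} dB drop, power ratio {ratio:.3f}")
```

One decade is about 3.32 octaves, so a 3 dB per octave slope gives very nearly 10 dB per decade.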

Does the spectrogram capture stereo separately?

No — the AnalyserNode mixes all input channels to a single mono signal before FFT. To see L and R separately you'd need two independent analysers (one per channel). For phase relationships between channels, use the Phase Frequency Analyzer (when built) in this category. The dedicated Stereo Channel Tester (in the Microphone & Diagnostics category) is the right tool for L/R signal comparison.

Can I use this spectrogram to identify bird songs or nature sounds?

Yes — the real-time spectrogram is widely used in bioacoustics for exactly this purpose. Bird vocalisations produce highly distinctive visual patterns: short broadband clicks, tonal whistles (narrow horizontal lines), frequency sweeps (diagonal lines) and complex trills all look different. Hold your device near the sound source, set the FFT size to 4096 and the max frequency to 10 kHz, and use the Viridis or Grayscale colormap for publication-ready captures. The PNG export lets you compare traces against reference sonograms in field guides or databases like Xeno-canto.

What does a musical chord look like on a spectrogram?

A chord produces multiple simultaneous vertical-streak clusters, one per note, each consisting of a fundamental and its overtone series. For example, a C major triad (C, E, G) on a piano creates three harmonic stacks starting at approximately 262 Hz, 330 Hz and 392 Hz, with overtones at integer multiples of each. On the log-frequency axis, equal musical intervals occupy equal distances: each equal-temperament semitone is a frequency ratio of 2^(1/12), so it takes up the same width anywhere on the axis. Dissonant intervals produce overtones that are very close together, which sometimes creates visible beating (amplitude modulation) that you can see as periodic bright-dim alternation in a single bin.
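
The harmonic stacks of the triad can be listed with the standard equal-temperament formula. The sketch below assumes A4 = 440 Hz tuning and ideal integer-multiple overtones (real piano strings are slightly inharmonic); the names are illustrative.

```python
def midi_to_hz(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def harmonic_stack(f0_hz: float, count: int = 5):
    """Fundamental plus ideal integer-multiple overtones."""
    return [f0_hz * k for k in range(1, count + 1)]


C_MAJOR_TRIAD = {"C4": 60, "E4": 64, "G4": 67}  # MIDI note numbers

for name, midi_note in C_MAJOR_TRIAD.items():
    stack = harmonic_stack(midi_to_hz(midi_note))
    print(name, [round(f, 1) for f in stack])

# Near-coincident overtones beat at their difference frequency:
# the 3rd harmonic of C4 and the 2nd harmonic of G4 are under 1 Hz apart.
beat_hz = abs(3 * midi_to_hz(60) - 2 * midi_to_hz(67))
print(f"beat between 3xC4 and 2xG4: {beat_hz:.2f} Hz")
```

Even a consonant fifth shows slow beating in equal temperament, because its overtones almost, but not exactly, coincide; dissonant intervals produce faster and more numerous clashes of the same kind.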

How accurate is the frequency reading when I hover the cursor?

The cursor frequency readout is limited by the FFT bin width: Δf = sample_rate / N. At the default FFT size of 4096 and a 48 kHz sample rate that is about 11.7 Hz per bin; at 16384 it narrows to about 2.9 Hz. On the log-frequency canvas the cursor maps from pixel position to frequency using the same log interpolation, so the readout accuracy is roughly ±(Δf / 2). For higher-accuracy pitch measurement, the main Frequency Detector uses autocorrelation and YIN pitch estimation, which can resolve pitch to sub-Hz precision on sustained tones.
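
To see what bin-limited accuracy means for a musical-note readout, the sketch below snaps a frequency to its nearest FFT bin and names the nearest equal-temperament note with its offset in cents. It assumes A4 = 440 Hz; the function names are hypothetical and this is not the tool's actual implementation.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_bin(freq_hz: float, fft_size: int, sample_rate: float):
    """Return (bin index, bin centre in Hz, worst-case error in Hz)."""
    bin_width = sample_rate / fft_size
    index = round(freq_hz / bin_width)
    return index, index * bin_width, bin_width / 2.0


def note_name(freq_hz: float, a4_hz: float = 440.0):
    """Return (nearest note name, offset from that note in cents)."""
    midi = 69.0 + 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    cents = 100.0 * (midi - nearest)
    return f"{NOTE_NAMES[nearest % 12]}{nearest // 12 - 1}", cents


# A 440 Hz tone analysed with N = 4096 at 48 kHz.
index, centre_hz, max_error_hz = nearest_bin(440.0, 4096, 48_000)
name, cents = note_name(centre_hz)
print(f"bin {index}, centre {centre_hz:.1f} Hz (±{max_error_hz:.1f} Hz), reads as {name} {cents:+.0f} cents")
```

At this FFT size a perfectly tuned A4 lands in a bin whose centre is about 21 cents sharp, which is why bin-based readouts are fine for identifying a note but too coarse for tuning.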
