Sound Visualization 3D Tool

Turn live sound or an uploaded audio file into a real-time 3D visualization. Choose a perspective spectrogram waterfall (frequency × time surface scrolling back in depth), a reactive particle field that pulses with band energy, or a morphing spectrum sphere of radial bars. Rendered with WebGL where supported, with an automatic 2D-canvas fallback.

⚠ This is a visualization for insight and enjoyment, not a measurement. Heights, colours and motion are driven by relative, uncalibrated FFT magnitude (dBFS) and reflect the combined source + room + microphone chain — not a calibrated level. Microphone audio and any file you load are analysed locally in your browser and never recorded or uploaded. Auto-gain, noise suppression and echo cancellation are requested off for the mic.

100% browser-based — microphone audio and uploaded files never leave your device.

How to Use the Sound Visualization 3D Tool

  1. Pick a source. Use the Microphone tab to visualize live sound, or the Audio file tab to drop in a track. Files are decoded locally — nothing is uploaded.
  2. Start it. For the microphone, press Start Listening and allow access when your browser asks. For a file, drop or browse for it, then press Play.
  3. Choose a mode. Switch between the 3D waterfall, particle field and spectrum sphere at any time. The geometry resolution is fixed, so no mode ever draws more elements than its built-in limit.
  4. Tune the look. Adjust colour scheme, FFT detail, visual sensitivity and motion smoothing. Toggle auto-rotate to spin the camera, or leave it off for a steady view.
  5. Capture it. Press PNG to download a snapshot of the current frame. Everything stays on your device.

Understanding What You See

Every mode is driven by the same data: a Fast Fourier Transform (FFT) of the incoming audio, taken many times per second. The FFT splits each short slice of sound into frequency bins, and the magnitude of each bin — how much energy sits at that frequency — drives the visuals. Because the values come from a browser AnalyserNode, they are relative dBFS (decibels relative to full scale), not absolute sound-pressure level. Treat heights and brightness as a picture of the spectrum's shape, not a calibrated reading.
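As a rough illustration of the bin arithmetic described above, the sketch below converts a bin index to a frequency and maps a relative dB value to a 0..1 visual height. The function names are hypothetical, and the default range of -100 dB to -30 dB simply mirrors the AnalyserNode defaults for minDecibels and maxDecibels; the tool's actual mapping is not stated here.

```typescript
// Frequency in Hz at the start of FFT bin `index`.
// An AnalyserNode exposes fftSize / 2 bins, spaced sampleRate / fftSize apart.
function binFrequency(index: number, fftSize: number, sampleRate: number): number {
  return (index * sampleRate) / fftSize;
}

// Map a relative dB value (as returned by getFloatFrequencyData) to a 0..1 height.
// Assumption: minDb and maxDb mirror the AnalyserNode defaults, not the tool's own range.
function dbToHeight(db: number, minDb: number = -100, maxDb: number = -30): number {
  const t = (db - minDb) / (maxDb - minDb);
  return Math.min(1, Math.max(0, t)); // clamp so silence and clipping stay in range
}
```

With a 2048-point FFT at 48 kHz, each bin is 23.4375 Hz wide, and a bin at -65 dB lands halfway up the default range. The result is still a relative height, not a calibrated level.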

3D Spectrogram Waterfall

The waterfall is a perspective-projected surface where the left–right axis is frequency (low on the left, high on the right, on a logarithmic scale so each octave gets fair visual space), depth is time (the newest slice is at the front and older slices recede into the distance), and height plus colour is magnitude. Sustained tones appear as ridges running back into depth; transients appear as short bright crests. It is the same information a flat 2D spectrogram shows, lifted into a tilted 3D landscape.
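The two axes described above can be sketched in a few lines: a logarithmic frequency-to-position mapping, and a fixed-depth history where the newest slice sits at the front. This is only a sketch under assumptions: the names freqToX and pushSlice are hypothetical, and the 20 Hz lower edge is an assumed value (the text only states the upper edge near 20 kHz).

```typescript
// Map a frequency to a 0..1 horizontal position on a logarithmic axis,
// so every octave occupies the same width.
// Assumption: a 20 Hz lower edge; only the ~20 kHz top is given in the text.
function freqToX(freq: number, minHz: number = 20, maxHz: number = 20000): number {
  const clamped = Math.min(maxHz, Math.max(minHz, freq));
  return Math.log(clamped / minHz) / Math.log(maxHz / minHz);
}

// Keep a bounded history of spectrum slices: index 0 is the newest (front),
// higher indices recede into depth, and anything past `depth` is dropped.
function pushSlice(history: number[][], slice: number[], depth: number): void {
  history.unshift(slice);
  while (history.length > depth) {
    history.pop();
  }
}
```

With these defaults, 200 Hz sits one third of the way across the axis, and the octave from 200 to 400 Hz is exactly as wide as the octave from 4 to 8 kHz.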

Reactive Particle Field

A fixed cloud of particles is arranged in a 3D ring. Each particle is assigned a frequency band; when that band has energy, the particle pushes outward and brightens, so the field "breathes" with the music. The band-to-position layout is fixed in code, not ordered by pitch — particles are not sorted bass-inward and treble-outward. The particle count is fixed too — it never grows with the FFT size or any number you type — so the animation stays smooth on modest hardware.
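One way to picture a fixed particle set with a fixed band layout is sketched below. Everything specific in it is an assumption for illustration: the count of 512, the 32 bands, the stride-based band assignment and the push distance are hypothetical values, not the tool's actual constants.

```typescript
const PARTICLE_COUNT = 512; // hypothetical fixed constant; never derived from FFT size
const BAND_COUNT = 32;      // hypothetical number of frequency bands

interface Particle {
  angle: number; // position around the ring, in radians
  band: number;  // index of the frequency band that drives this particle
}

// Build the particle set once. The stride of 7 scatters bands around the ring
// so neighbours are not ordered by pitch (an illustrative layout only).
function buildParticles(): Particle[] {
  const particles: Particle[] = [];
  for (let i = 0; i < PARTICLE_COUNT; i++) {
    particles.push({
      angle: (i / PARTICLE_COUNT) * 2 * Math.PI,
      band: (i * 7) % BAND_COUNT,
    });
  }
  return particles;
}

// Radius for a particle given its band energy in 0..1: more energy pushes it outward.
function particleRadius(energy: number, baseRadius: number = 1, push: number = 0.5): number {
  const e = Math.min(1, Math.max(0, energy));
  return baseRadius + e * push;
}
```

Because the array is built once and only radii change per frame, the per-frame cost stays constant whatever FFT size is selected.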

Pulsing Spectrum Sphere

Radial bars are placed around a sphere; each bar's length tracks one frequency band, so the whole shape morphs and pulses with the spectrum. Loud, full-range sound inflates the sphere evenly; a thin or bass-heavy mix leaves it lopsided.
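The text does not say how the bars are distributed over the sphere. As one plausible sketch, a Fibonacci lattice (a common technique, swapped in here as an assumption) spreads a fixed number of directions roughly evenly, and each bar's tip then moves outward with its band energy. The names and the default radius and length are hypothetical.

```typescript
type Vec3 = [number, number, number];

// Spread `count` unit directions roughly evenly over a sphere (Fibonacci lattice).
// Assumption: this layout is illustrative; the tool's real bar placement is not documented.
function sphereDirections(count: number): Vec3[] {
  const goldenAngle = Math.PI * (3 - Math.sqrt(5));
  const dirs: Vec3[] = [];
  for (let i = 0; i < count; i++) {
    const y = 1 - (2 * (i + 0.5)) / count; // from near +1 down to near -1
    const r = Math.sqrt(1 - y * y);        // ring radius at this height
    const theta = i * goldenAngle;
    dirs.push([r * Math.cos(theta), y, r * Math.sin(theta)]);
  }
  return dirs;
}

// Tip position of a radial bar: the sphere radius plus a length scaled by band energy (0..1).
function barTip(dir: Vec3, energy: number, radius: number = 1, maxLength: number = 0.5): Vec3 {
  const e = Math.min(1, Math.max(0, energy));
  const d = radius + e * maxLength;
  return [dir[0] * d, dir[1] * d, dir[2] * d];
}
```

If every band carries similar energy, all tips sit at a similar distance and the shape looks evenly inflated; if only a few bands are active, only their bars extend.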

The live readouts

Below the canvas, the readouts report the active renderer (WebGL or the 2D fallback), the current frame rate, the peak frequency and its musical neighbourhood, the overall relative level, and a bass / mid / treble split. These are diagnostic conveniences computed from the same FFT — the peak frequency is accurate to the FFT bin width, but the levels remain relative and uncalibrated.
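A minimal sketch of how such readouts could be derived from one frame of dB values follows. The band edges (250 Hz and 4 kHz), the plain averaging of dB values and the function name are assumptions for illustration; the tool's real band split is not documented here.

```typescript
interface Readout {
  peakHz: number;   // frequency of the loudest bin, accurate to one bin width
  bassDb: number;   // mean relative dB below bassTopHz
  midDb: number;    // mean relative dB between bassTopHz and midTopHz
  trebleDb: number; // mean relative dB at or above midTopHz
}

// Compute a peak frequency and a three-band split from one frame of bin magnitudes in dB.
// Assumption: the 250 Hz and 4 kHz band edges are illustrative defaults.
function computeReadout(
  db: number[],
  sampleRate: number,
  fftSize: number,
  bassTopHz: number = 250,
  midTopHz: number = 4000,
): Readout {
  const hzPerBin = sampleRate / fftSize;
  const sums = [0, 0, 0];
  const counts = [0, 0, 0];
  let peakBin = 1;
  for (let i = 1; i < db.length; i++) { // start at 1 to skip the DC bin
    if (db[i] > db[peakBin]) peakBin = i;
    const hz = i * hzPerBin;
    const band = hz < bassTopHz ? 0 : hz < midTopHz ? 1 : 2;
    sums[band] += db[i];
    counts[band] += 1;
  }
  const mean = (b: number): number => (counts[b] > 0 ? sums[b] / counts[b] : -Infinity);
  return { peakHz: peakBin * hzPerBin, bassDb: mean(0), midDb: mean(1), trebleDb: mean(2) };
}
```

The peak frequency can only be as precise as the bin spacing, sampleRate / fftSize, and the band values stay relative because the input dB values are uncalibrated.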

How It Works

When you start the microphone, the tool requests raw audio with auto-gain, noise suppression and echo cancellation turned off (browsers and operating systems may still apply processing they don't expose). The stream is connected to a Web Audio AnalyserNode only — never to your speakers — so there is no feedback and nothing is played back. For file mode, the file is decoded with AudioContext.decodeAudioData entirely in memory; playback routes through a gain node to your speakers, while a parallel analyser feeds the visuals.
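The microphone request described above might look like the sketch below. The three constraint names are standard media track constraints; the helper name buildMicRequest and the local interfaces are hypothetical, and the browser wiring is left in comments because it only runs in a page with microphone permission.

```typescript
interface MicAudioConstraints {
  autoGainControl: boolean;
  noiseSuppression: boolean;
  echoCancellation: boolean;
}

interface MicRequest {
  audio: MicAudioConstraints;
  video: false;
}

// Ask for audio with browser-side processing switched off.
// Browsers treat these as requests and may still apply processing they do not expose.
function buildMicRequest(): MicRequest {
  return {
    audio: { autoGainControl: false, noiseSuppression: false, echoCancellation: false },
    video: false,
  };
}

// Browser wiring (illustrative, not executed here):
//   const stream = await navigator.mediaDevices.getUserMedia(buildMicRequest());
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   // The analyser is never connected to ctx.destination, so nothing reaches the speakers.
```

Leaving the analyser unconnected to the destination is what keeps the microphone path silent and feedback-free.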

Each animation frame, the analyser hands back the current frequency magnitudes. The tool detects WebGL support at load time: if a WebGL context is available it uploads vertex data to the GPU and renders the 3D geometry with a hand-written vertex/fragment shader pair (no Three.js, no external library — raw WebGL). If WebGL is unavailable or the context fails to create, it transparently falls back to a hand-rolled perspective projection drawn on a 2D canvas, and the renderer readout says exactly which one is active. Either way, the visible 3D-ness is genuine perspective math.
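The detection step could be sketched as below. The CanvasLike interface and the pickRenderer name are hypothetical; the only real API relied on is that a canvas's getContext("webgl") returns null when WebGL is unavailable. The sketch also treats a thrown error as a reason to fall back, which is an assumption about how defensive the tool is.

```typescript
type RendererKind = "webgl" | "canvas2d";

// Minimal shape of a canvas for detection purposes; an HTMLCanvasElement fits it.
interface CanvasLike {
  getContext(kind: string): unknown;
}

// Try WebGL first; fall back to the 2D canvas if the context is missing or creation throws.
function pickRenderer(canvas: CanvasLike): RendererKind {
  try {
    const gl = canvas.getContext("webgl");
    if (gl !== null && gl !== undefined) {
      return "webgl";
    }
  } catch {
    // Context creation failed; fall through to the 2D path.
  }
  return "canvas2d";
}
```

Returning the kind as a value makes it straightforward to show the same string in the renderer readout, so the label always matches the path that is actually drawing.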

Bounded by design. The waterfall grid, the particle count and the sphere's bar count are all fixed constants in the code. No slider or typed value can inflate them, so the tool cannot accidentally freeze your browser by drawing millions of elements. The FFT size you pick changes frequency detail, not geometry size; internally the spectrum is resampled onto the fixed grid. Long files are analysed one frame at a time as they play, at the audio's own sample rate, so file length adds no per-frame work. The visual frequency axis tops out around 20 kHz (the upper edge of human hearing), or at the Nyquist limit (half the sample rate) when that is lower; the peak-frequency readout scans the same range, up to that cap.
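Resampling onto a fixed grid can be sketched as follows. The column count of 64, the 20 Hz lower edge, the logarithmic spacing and the choice of taking the loudest bin per column are all assumptions for illustration; only the 20 kHz or Nyquist cap comes from the text.

```typescript
const GRID_COLUMNS = 64; // hypothetical fixed constant

// Top of the visual axis: 20 kHz, or the Nyquist limit when that is lower.
function axisTopHz(sampleRate: number): number {
  return Math.min(20000, sampleRate / 2);
}

// Reduce any number of FFT bins to a fixed number of columns.
// Each column covers a log-spaced frequency range and keeps the loudest bin inside it.
// Assumes `bins` is non-empty and spans 0 Hz up to the Nyquist limit.
function resampleToGrid(
  bins: number[],
  sampleRate: number,
  columns: number = GRID_COLUMNS,
  minHz: number = 20,
): number[] {
  const hzPerBin = sampleRate / 2 / bins.length;
  const ratio = axisTopHz(sampleRate) / minHz;
  const out: number[] = [];
  for (let c = 0; c < columns; c++) {
    const loHz = minHz * Math.pow(ratio, c / columns);
    const hiHz = minHz * Math.pow(ratio, (c + 1) / columns);
    const first = Math.min(bins.length - 1, Math.floor(loHz / hzPerBin));
    const last = Math.min(bins.length - 1, Math.max(first, Math.ceil(hiHz / hzPerBin) - 1));
    let peak = bins[first];
    for (let i = first + 1; i <= last; i++) {
      peak = Math.max(peak, bins[i]);
    }
    out.push(peak);
  }
  return out;
}
```

The output length depends only on the column count, so a 32768-point FFT produces the same amount of geometry as a 256-point one. At the low end several columns may read the same bin, which is a trade-off of this simple approach.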

On Stop, on closing the tab, and when the page is hidden, every microphone track is stopped and the audio context is closed; the WebGL buffers, textures and program are deleted; and the animation loop is cancelled. A generation token guards against a Stop racing a still-pending permission request, so a microphone stream that arrives late is immediately released and never used.
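The generation-token idea can be sketched with a simple counter. The class and function names below are hypothetical; the only real API assumed is that a media stream exposes getTracks() and each track has stop().

```typescript
interface StoppableTrack {
  stop(): void;
}

interface StoppableStream {
  getTracks(): StoppableTrack[];
}

// Each Start takes a token; each Stop bumps the counter so older tokens become stale.
class SessionGuard {
  private generation = 0;

  begin(): number {
    this.generation += 1;
    return this.generation;
  }

  invalidate(): void {
    this.generation += 1;
  }

  isCurrent(token: number): boolean {
    return token === this.generation;
  }
}

// Call this when a permission request resolves. A stream that belongs to a
// stale session is released immediately and reported as rejected.
function acceptStream(guard: SessionGuard, token: number, stream: StoppableStream): boolean {
  if (guard.isCurrent(token)) {
    return true;
  }
  for (const track of stream.getTracks()) {
    track.stop();
  }
  return false;
}
```

In use, Start would call begin() before requesting the microphone, Stop would call invalidate(), and the code that receives the stream would proceed only when acceptStream returns true.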

Frequently Asked Questions

Is this a measurement tool?
No. It is a visualization for insight and enjoyment. The heights, colours and motion are driven by relative, uncalibrated FFT magnitude (dBFS), and they reflect the combined sound source, room and microphone — not a single calibrated level. For real acoustic measurements (SPL, calibrated spectra), use a measurement microphone and a dedicated analyzer. The peak-frequency readout is accurate to the FFT bin width; the level readouts are relative only.
Does it use real WebGL or fake 3D?
It uses real WebGL when your browser supports it, with hand-written shaders and no external 3D library. If WebGL is unavailable or the context fails to create, it falls back to a genuine hand-coded perspective projection drawn on a 2D canvas. The renderer readout below the canvas always tells you which one is currently active, so the page never claims WebGL when it is not running.
Is my microphone or audio file uploaded anywhere?
No. Everything runs in your browser. Microphone audio is connected only to an analyser (never to your speakers and never to a server) and is discarded frame by frame. Uploaded files are decoded locally with the Web Audio API and never leave your device. Nothing is recorded or transmitted.
Can a big file or a large FFT size freeze my tab?
No. The waterfall grid, particle count and sphere bar count are fixed constants, independent of FFT size, file length or any value you choose. The FFT size only changes frequency detail; the spectrum is resampled onto the fixed geometry. So the renderer always draws the same bounded number of elements per frame, regardless of input.
What does the FFT size actually change?
It sets how finely the frequency axis is divided. A larger FFT gives narrower bins (sharper frequency resolution) but each analysis slice covers a longer time, so fast changes smear slightly. A smaller FFT reacts faster in time but blurs nearby frequencies. For music, 2048 is a comfortable middle; raise it for sustained tones, lower it for percussive material.
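The trade-off is simple arithmetic, sketched below with a hypothetical helper: the bin width is the sample rate divided by the FFT size, and the time covered by one analysis slice is the FFT size divided by the sample rate.

```typescript
interface FftTradeoff {
  binWidthHz: number; // smaller means sharper frequency resolution
  windowMs: number;   // larger means slower response to fast changes
}

// Frequency resolution and slice duration for a given FFT size and sample rate.
function fftTradeoff(fftSize: number, sampleRate: number): FftTradeoff {
  return {
    binWidthHz: sampleRate / fftSize,
    windowMs: (fftSize / sampleRate) * 1000,
  };
}
```

At 48 kHz, a size of 512 gives bins 93.75 Hz wide over roughly 10.7 ms, 2048 gives 23.4375 Hz over roughly 42.7 ms, and 8192 gives about 5.86 Hz over roughly 170.7 ms. An AnalyserNode accepts only powers of two from 32 to 32768 for this setting.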
Why does the highest part of the spectrum look empty?
The visual frequency axis is capped at about 20 kHz (the top of human hearing), so the band between 20 kHz and the Nyquist limit (half the sample rate: 22.05 kHz for 44.1 kHz audio, 24 kHz for 48 kHz audio) is intentionally not plotted. Within the shown band, most everyday sound also has far less energy at high frequencies, following a roughly pink (−3 dB/octave) tilt, so the treble end legitimately appears dimmer and sparser. Raising the sensitivity makes faint high content more visible.
The visuals barely move — what's wrong?
Usually the sound is too quiet for the microphone, or motion smoothing is set very high. Try raising the sensitivity slider, lowering the smoothing slider for snappier motion, moving closer to the sound source, or checking that your operating system isn't muting or heavily processing the input. In file mode, make sure playback is actually running (the Play button should read Pause).