Sound Visualization 3D Tool
Turn live sound or an uploaded audio file into a real-time 3D visualization. Choose a perspective spectrogram waterfall (frequency × time surface scrolling back in depth), a reactive particle field that pulses with band energy, or a morphing spectrum sphere of radial bars. Rendered with WebGL where supported, with an automatic 2D-canvas fallback.
⚠ This is a visualization for insight and enjoyment, not a measurement. Heights, colours and motion are driven by relative, uncalibrated FFT magnitude (dBFS) and reflect the combined source + room + microphone chain — not a calibrated level. Microphone audio and any file you load are analysed locally in your browser and never recorded or uploaded. Auto-gain, noise suppression and echo cancellation are requested off for the mic.
How to Use the Sound Visualization 3D Tool
- Pick a source. Use the Microphone tab to visualize live sound, or the Audio file tab to drop in a track. Files are decoded locally — nothing is uploaded.
- Start it. For the microphone, press Start Listening and allow access when your browser asks. For a file, drop or browse for it, then press Play.
- Choose a mode. Switch between the 3D waterfall, particle field and spectrum sphere at any time. The geometry resolution is fixed, so rendering cost stays bounded even in the heavier modes.
- Tune the look. Adjust colour scheme, FFT detail, visual sensitivity and motion smoothing. Toggle auto-rotate to spin the camera, or leave it off for a steady view.
- Capture it. Press PNG to download a snapshot of the current frame. Everything stays on your device.
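The motion smoothing control mentioned above can be pictured as a simple blend between the previous frame's value and the new one. The sketch below is an assumption about how such a control might behave, not the tool's actual code; the Web Audio AnalyserNode exposes a comparable smoothingTimeConstant setting, and the function name smoothValue is hypothetical.

```typescript
// Hypothetical helper: exponential smoothing between frames.
// smoothing = 0 follows the new value immediately;
// smoothing close to 1 changes slowly and looks calmer.
function smoothValue(previous: number, current: number, smoothing: number): number {
  // Clamp the control to the 0..1 range so the blend stays stable.
  const s = Math.min(Math.max(smoothing, 0), 1);
  return s * previous + (1 - s) * current;
}
```

Applied once per band per frame, a higher setting trades responsiveness for steadier motion.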
Understanding What You See
Every mode is driven by the same data: a Fast Fourier Transform (FFT) of the incoming audio, taken many times per second. The FFT splits each short slice of sound into frequency bins, and the magnitude of each bin — how much energy sits at that frequency — drives the visuals. Because the values come from a browser AnalyserNode, they are relative dBFS (decibels relative to full scale), not absolute sound-pressure level. Treat heights and brightness as a picture of the spectrum's shape, not a calibrated reading.
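To make the bin arithmetic concrete, here is a minimal sketch that converts a bin index to its frequency and a linear magnitude to dBFS. The function names and the 48 kHz / 2048-point figures in the usage note are illustrative assumptions, not values taken from the tool.

```typescript
// Hypothetical helpers; names are illustrative only.

/** Width of one FFT bin in Hz: the sample rate divided by the FFT size. */
function binWidthHz(sampleRate: number, fftSize: number): number {
  return sampleRate / fftSize;
}

/** Centre frequency in Hz of bin `index` (bin 0 is DC). */
function binToFrequency(index: number, sampleRate: number, fftSize: number): number {
  return index * binWidthHz(sampleRate, fftSize);
}

/** Linear magnitude (1.0 = full scale) to dBFS. Zero maps to -Infinity. */
function magnitudeToDbfs(magnitude: number): number {
  return 20 * Math.log10(magnitude);
}
```

With a 48 kHz sample rate and a 2048-point FFT, each bin is about 23.4 Hz wide, and a magnitude of half full scale sits roughly 6 dB below 0 dBFS.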
3D Spectrogram Waterfall
The waterfall is a perspective-projected surface where the left–right axis is frequency (low on the left, high on the right, on a logarithmic scale so each octave gets fair visual space), depth is time (the newest slice is at the front and older slices recede into the distance), and height plus colour is magnitude. Sustained tones appear as ridges running back into depth; transients appear as short bright crests. It is the same information a flat 2D spectrogram shows, lifted into a tilted 3D landscape.
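The logarithmic frequency axis can be sketched as a mapping from frequency to a 0..1 position across the surface. The 20 Hz lower edge and the function name are assumptions made for illustration; only the roughly 20 kHz upper edge is stated on this page.

```typescript
// Hypothetical helper: position of a frequency on a logarithmic axis.
// Returns 0 at minHz, 1 at maxHz; every octave spans the same distance.
function logAxisPosition(freqHz: number, minHz: number = 20, maxHz: number = 20000): number {
  // Clamp so that out-of-range input lands on the nearest edge.
  const clamped = Math.min(Math.max(freqHz, minHz), maxHz);
  return Math.log(clamped / minHz) / Math.log(maxHz / minHz);
}
```

On this scale the step from 100 Hz to 200 Hz is exactly as wide as the step from 1 kHz to 2 kHz, which is what gives each octave equal visual space.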
Reactive Particle Field
A fixed cloud of particles is arranged in a 3D ring. Each particle is assigned a frequency band; when that band has energy, the particle pushes outward and brightens, so the field "breathes" with the music. The band-to-position layout is fixed in code, not ordered by pitch — particles are not sorted bass-inward and treble-outward. The particle count is fixed too — it never grows with the FFT size or any number you type — so the animation stays smooth on modest hardware.
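One way to picture the fixed particle count is a loop that always runs the same number of times, whatever the size of the spectrum passed in. This is a minimal sketch under assumptions: the count of 512, the modulo band assignment and the names are all hypothetical, and energies are taken to be normalised to 0..1.

```typescript
// Hypothetical fixed constant; the tool's real particle count is not stated.
const PARTICLE_COUNT = 512;

/**
 * Radius of every particle given per-band energy in 0..1.
 * The output length is always PARTICLE_COUNT, whatever the input length.
 */
function particleRadii(
  bandEnergy: ArrayLike<number>,
  baseRadius: number = 1,
  maxPush: number = 0.5
): number[] {
  const radii = new Array<number>(PARTICLE_COUNT).fill(baseRadius);
  if (bandEnergy.length === 0) return radii; // silence: ring stays at rest
  for (let i = 0; i < PARTICLE_COUNT; i++) {
    // Assumed fixed assignment of particle to band (not sorted by pitch).
    const band = i % bandEnergy.length;
    const energy = Math.min(Math.max(bandEnergy[band], 0), 1);
    radii[i] = baseRadius + energy * maxPush; // more energy pushes outward
  }
  return radii;
}
```

Because the loop bound is a constant, a larger FFT size changes which band each particle reads, not how many particles are drawn.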
Pulsing Spectrum Sphere
Radial bars are placed around a sphere; each bar's length tracks one frequency band, so the whole shape morphs and pulses with the spectrum. Loud, full-range sound inflates the sphere evenly; a thin or bass-heavy mix leaves it lopsided.
The live readouts
Below the canvas, the readouts report the active renderer (WebGL or the 2D fallback), the current frame rate, the peak frequency and its musical neighbourhood, the overall relative level, and a bass / mid / treble split. These are diagnostic conveniences computed from the same FFT. The peak frequency is accurate only to within one FFT bin width (the sample rate divided by the FFT size), while the levels remain relative and uncalibrated.
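A peak-frequency readout of this kind can be sketched as a search for the loudest bin followed by a conversion to the nearest equal-tempered note. The function names, the A4 = 440 Hz reference and the sharp-only note spelling are assumptions for illustration; the tool's own readout logic may differ.

```typescript
// Note names within one octave, spelled with sharps (an arbitrary choice).
const NOTE_NAMES: string[] = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];

/** Frequency in Hz of the loudest bin; resolution is one bin width. */
function peakFrequency(magnitudes: ArrayLike<number>, sampleRate: number, fftSize: number): number {
  let peak = 0;
  for (let i = 1; i < magnitudes.length; i++) {
    if (magnitudes[i] > magnitudes[peak]) peak = i;
  }
  return (peak * sampleRate) / fftSize;
}

/** Nearest note name with octave, assuming A4 = 440 Hz. Empty for non-positive input. */
function nearestNote(freqHz: number): string {
  if (!(freqHz > 0)) return "";
  const midi = Math.round(69 + 12 * Math.log2(freqHz / 440));
  const octave = Math.floor(midi / 12) - 1;
  return NOTE_NAMES[((midi % 12) + 12) % 12] + String(octave);
}
```

Since the result can only land on a bin centre, two tones closer together than one bin width report the same peak.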
How It Works
When you start the microphone, the tool requests raw audio with auto-gain, noise suppression and echo cancellation turned off (browsers and operating systems may still apply processing they don't expose). The stream is connected to a Web Audio AnalyserNode only — never to your speakers — so there is no feedback and nothing is played back. For file mode, the file is decoded with AudioContext.decodeAudioData entirely in memory; playback routes through a gain node to your speakers, while a parallel analyser feeds the visuals.
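The microphone request described above corresponds to a set of standard audio constraints. The sketch below only builds that constraints object so it can run outside a browser; the interface and function names are hypothetical, while autoGainControl, noiseSuppression and echoCancellation are real MediaTrackConstraints properties.

```typescript
// Local type so the sketch does not depend on browser typings.
interface MicAudioConstraints {
  autoGainControl: boolean;
  noiseSuppression: boolean;
  echoCancellation: boolean;
}

/** Constraints asking the browser for unprocessed microphone audio. */
function rawMicConstraints(): { audio: MicAudioConstraints; video: false } {
  return {
    audio: {
      autoGainControl: false,
      noiseSuppression: false,
      echoCancellation: false,
    },
    video: false,
  };
}

// Browser-only usage, left as comments so the block runs elsewhere too:
//   const stream = await navigator.mediaDevices.getUserMedia(rawMicConstraints());
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   // The analyser is deliberately not connected to ctx.destination,
//   // so the microphone is analysed but never played back.
```

These values are requests, not guarantees: as noted above, the browser or operating system may still apply processing it does not expose.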
Each animation frame, the analyser hands back the current frequency magnitudes. The tool detects WebGL support at load time: if a WebGL context is available, it uploads vertex data to the GPU and renders the 3D geometry with a hand-written vertex/fragment shader pair (raw WebGL, with no Three.js or other external library). If WebGL is unavailable or the context cannot be created, it falls back transparently to a hand-rolled perspective projection drawn on a 2D canvas, and the renderer readout states which one is active. Either way, the sense of depth comes from genuine perspective projection.
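A hand-rolled perspective projection for a 2D canvas can be as small as one divide per point. The following is a sketch under assumptions: the camera model (a camera placed focalLength units in front of z = 0, looking along +z), the default canvas size and all names are illustrative and not taken from the tool.

```typescript
interface ScreenPoint {
  x: number;     // canvas x in pixels
  y: number;     // canvas y in pixels (grows downward)
  scale: number; // perspective scale; smaller means farther away
}

/**
 * Project a 3D point onto the canvas. Larger z is farther from the viewer.
 * Returns null when the point is at or behind the camera.
 */
function projectPoint(
  x: number,
  y: number,
  z: number,
  focalLength: number = 2,
  width: number = 800,
  height: number = 600
): ScreenPoint | null {
  const depth = z + focalLength; // distance from the assumed camera position
  if (depth <= 0) return null;
  const scale = focalLength / depth; // the perspective divide
  const unit = Math.min(width, height) / 2; // same unit on both axes keeps the aspect ratio
  return {
    x: width / 2 + x * scale * unit,
    y: height / 2 - y * scale * unit, // flip y: world up is canvas up
    scale,
  };
}
```

Points with larger z get a smaller scale and drift toward the canvas centre, which is the effect that makes older waterfall slices appear to recede.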
Bounded by design. The waterfall grid, the particle count and the sphere's bar count are all fixed constants in the code. No slider or typed value can inflate them, so the tool cannot accidentally freeze your browser by drawing millions of elements. The FFT size you pick changes frequency detail, not geometry size: internally, the spectrum is resampled onto the fixed grid. Long files are visualized frame by frame as they play, at the audio's own sample rate. The visual frequency axis tops out around 20 kHz (the upper edge of human hearing), or at the Nyquist limit (half the sample rate) when that is lower; the underlying peak-frequency readout still scans up to that 20 kHz cap.
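Resampling onto a fixed grid can be sketched as reducing however many bins the FFT produces to a constant number of columns. The column count of 64, the linear bin grouping and the names below are assumptions for illustration; a real implementation would likely group bins on the logarithmic axis instead.

```typescript
// Hypothetical fixed constant; the tool's real grid width is not stated.
const GRID_COLUMNS = 64;

/** Upper edge of the visual axis: 20 kHz or Nyquist, whichever is lower. */
function visualMaxHz(sampleRate: number): number {
  return Math.min(20000, sampleRate / 2);
}

/**
 * Reduce any number of bins to a fixed number of columns by taking
 * the maximum of each group. Output length never depends on input length.
 */
function resampleToGrid(bins: ArrayLike<number>, columns: number = GRID_COLUMNS): number[] {
  const out = new Array<number>(columns).fill(0);
  if (bins.length === 0) return out;
  for (let c = 0; c < columns; c++) {
    const start = Math.floor((c * bins.length) / columns);
    // Guarantee at least one bin per column when there are fewer bins than columns.
    const end = Math.max(start + 1, Math.floor(((c + 1) * bins.length) / columns));
    let peak = bins[start];
    for (let i = start + 1; i < end && i < bins.length; i++) {
      peak = Math.max(peak, bins[i]);
    }
    out[c] = peak;
  }
  return out;
}
```

Taking the maximum per group keeps narrow peaks visible after downsampling; averaging would be a reasonable alternative when a smoother surface matters more.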
On Stop, on closing the tab, and when the page is hidden, every microphone track is stopped and the audio context is closed; the WebGL buffers, textures and program are deleted; and the animation loop is cancelled. A generation token guards against a Stop racing a still-pending permission request, so a microphone stream that arrives late is immediately released and never used.
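The generation-token guard can be illustrated with a counter that is bumped on every Start and Stop. This is a sketch of the general pattern, not the tool's code: the class, interface and function names are hypothetical, and Stoppable stands in for a browser MediaStreamTrack, whose stop() method is real.

```typescript
/** Counter that lets late async results detect that they are stale. */
class GenerationGuard {
  private generation = 0;

  /** Call on Start; returns the token for this attempt. */
  begin(): number {
    this.generation += 1;
    return this.generation;
  }

  /** Call on Stop; every outstanding token becomes stale. */
  invalidate(): void {
    this.generation += 1;
  }

  isCurrent(token: number): boolean {
    return token === this.generation;
  }
}

/** Minimal stand-in for a MediaStreamTrack. */
interface Stoppable {
  stop(): void;
}

/**
 * Decide what to do with a stream that has just arrived.
 * Returns true if it may be used; otherwise stops every track and returns false.
 */
function acceptLateStream(guard: GenerationGuard, token: number, tracks: Stoppable[]): boolean {
  if (guard.isCurrent(token)) return true;
  for (const track of tracks) track.stop(); // release the microphone immediately
  return false;
}
```

In use, the token would be taken just before the permission request and checked as soon as the request resolves, so a Stop pressed in between leaves nothing running.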