3D Audio Visualizer — Sound Visualization Tool

This 3D audio visualizer turns live sound or an uploaded audio file into a real-time 3D visualization. Whether you want a music visualizer for a track or a closer look at a live microphone feed, you can choose from a perspective spectrogram waterfall (a frequency × time surface scrolling back in depth), a reactive particle field that pulses with band energy, or a morphing spectrum sphere of radial bars. It renders with WebGL where supported and falls back automatically to a 2D canvas.

⚠ This is a visualization for insight and enjoyment, not a measurement. Heights, colours and motion are driven by relative, uncalibrated FFT magnitude (dBFS) and reflect the combined source + room + microphone chain — not a calibrated level. Microphone audio and any file you load are analysed locally in your browser and never recorded or uploaded. Auto-gain, noise suppression and echo cancellation are requested off for the mic.

100% browser-based — microphone audio and uploaded files never leave your device.


How to Use the Sound Visualization 3D Tool

Pick a source. Use the Microphone tab to visualize live sound, or the Audio file tab to drop in a track. Files are decoded locally — nothing is uploaded.
Start it. For the microphone, press Start Listening and allow access when your browser asks. For a file, drop or browse for it, then press Play.
Choose a mode. Switch between the 3D waterfall, particle field and spectrum sphere at any time. Geometry resolution is fixed, so no mode can inflate the number of elements drawn per frame, whatever the input.
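Tune the look. Adjust the colour scheme, FFT detail, visual sensitivity and motion smoothing. Sensitivity is a visual gain only: it scales heights and motion, not the audio level. Higher smoothing gives calmer motion; lower smoothing is snappier and more reactive. Toggle auto-rotate to spin the camera, or leave it off for a steady view.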
Capture it. Press PNG to download a snapshot of the current frame. Everything stays on your device.

Understanding What You See

Every mode is driven by the same data: a Short-Time Fourier Transform (STFT) of the incoming audio, computed many times per second as a sliding-window FFT. The FFT splits each short slice of sound into frequency bins, and the magnitude of each bin (how much energy sits at that frequency) drives the visuals. Because the values come from a browser AnalyserNode, they are relative dBFS (decibels relative to full scale), not absolute sound-pressure level. The spectral centroid, the magnitude-weighted average frequency of the spectrum, is a widely used correlate of a sound's perceived "brightness", and in every mode you can see the balance of high vs. low content shift as it moves. Treat heights and brightness as a picture of the spectrum's shape, not a calibrated reading. For a flat 2D view of the same data, the real-time spectrogram plots frequency vs. time on a scrolling heatmap.
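To make the pipeline concrete, here is a minimal Python sketch of one analysis frame. It follows the steps the Web Audio specification lists for AnalyserNode (a Blackman window with α = 0.16, a transform scaled by 1/N, per-bin exponential smoothing, then 20·log₁₀). The function names, the naive DFT and the assumption that this tool's smoothing slider maps onto the node's smoothingTimeConstant are illustrative, not taken from the tool's code. Running it on the newest N samples every animation frame gives the sliding-window STFT.

```python
import cmath
import math
from typing import List, Optional, Sequence


def blackman_window(size: int, alpha: float = 0.16) -> List[float]:
    """Blackman window in the form the Web Audio spec gives for AnalyserNode."""
    a0, a1, a2 = (1 - alpha) / 2, 0.5, alpha / 2
    return [
        a0 - a1 * math.cos(2 * math.pi * n / size) + a2 * math.cos(4 * math.pi * n / size)
        for n in range(size)
    ]


def frame_magnitudes(samples: Sequence[float],
                     previous: Optional[Sequence[float]] = None,
                     smoothing: float = 0.8) -> List[float]:
    """One STFT frame: window, 1/N-scaled DFT, per-bin exponential smoothing.

    Returns linear magnitudes for bins 0 .. N/2 - 1, where a full-scale
    input sample is 1.0.
    """
    size = len(samples)
    windowed = [s * w for s, w in zip(samples, blackman_window(size))]
    half = size // 2
    raw = []
    for k in range(half):
        # Naive O(N^2) DFT for readability; real code uses an FFT.
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / size) for n, x in enumerate(windowed))
        raw.append(abs(acc) / size)
    if previous is None:
        previous = [0.0] * half
    # Higher smoothing keeps more of the previous frame (calmer motion).
    return [smoothing * p + (1 - smoothing) * m for p, m in zip(previous, raw)]


def to_dbfs(mags: Sequence[float], floor: float = 1e-12) -> List[float]:
    """Convert linear magnitudes to dB relative to full scale (never -inf)."""
    return [20 * math.log10(max(m, floor)) for m in mags]


def spectral_centroid(mags: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency in Hz."""
    size = 2 * len(mags)
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / size * m for k, m in enumerate(mags)) / total
```

One consequence worth knowing: under this convention a full-scale sine centred on a bin reads about −13.6 dBFS (20·log₁₀ 0.21), not 0 dBFS, because the window's average value is 0.42 and a real sine splits its energy between positive and negative frequencies. That is one more reason to read the numbers as relative.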

3D Spectrogram Waterfall

The waterfall is a perspective-projected surface where the left–right axis is frequency (low on the left, high on the right, on a logarithmic scale so each octave gets equal visual width), depth is time (the newest slice is at the front and older slices recede into the distance), and height plus colour is magnitude. Sustained tones appear as ridges running back into depth; transients appear as short bright crests. Harmonic overtones sit at integer multiples of the fundamental, so on the log axis they form a family of parallel ridges that crowd closer together toward the treble. Because that pattern keeps the same shape whatever the note, only shifted sideways, pitched instruments are easy to pick out during real-time audio analysis. It is the same information a flat 2D spectrogram shows, lifted into a tilted 3D landscape.
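A short Python sketch of a logarithmic frequency axis shows why octaves get equal width while harmonics crowd together toward the treble. The 20 Hz to 20 kHz range is an assumption based on the cap described further down; the tool's exact edges may differ, and the function names are illustrative.

```python
import math

F_MIN_HZ = 20.0      # assumed left edge of the axis
F_MAX_HZ = 20000.0   # assumed right edge (the ~20 kHz cap)


def log_axis_position(freq_hz, f_min=F_MIN_HZ, f_max=F_MAX_HZ):
    """Map a frequency to 0.0 (left) .. 1.0 (right) on a logarithmic axis."""
    freq_hz = min(max(freq_hz, f_min), f_max)  # clamp to the plotted range
    return math.log(freq_hz / f_min) / math.log(f_max / f_min)


def harmonic_positions(fundamental_hz, count):
    """Axis positions of the first `count` harmonics (1x, 2x, 3x, ...)."""
    return [log_axis_position(n * fundamental_hz) for n in range(1, count + 1)]
```

With a 20 Hz to 20 kHz axis, each octave takes about 1/9.97 of the width, while the gap between the 5th and 6th harmonics is only about 26% of the gap between the 1st and 2nd.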

Reactive Particle Field

A fixed cloud of particles is arranged in a 3D ring. Each particle is assigned a frequency band; when that band has energy, the particle pushes outward and brightens, so the field "breathes" with the music. The band-to-position layout is fixed in code, not ordered by pitch — particles are not sorted bass-inward and treble-outward. The particle count is fixed too — it never grows with the FFT size or any number you type — so the animation stays smooth on modest hardware.
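The particle behaviour can be sketched as follows. The particle count, band count, ring shape, push distance, visual gain and the seeded random band assignment are all hypothetical stand-ins; the point of the sketch is that the count is a constant and each particle's band is chosen once, not sorted by pitch.

```python
import math
import random

PARTICLE_COUNT = 1200  # hypothetical fixed constant, never derived from input
BAND_COUNT = 32        # hypothetical number of frequency bands


def build_particles(seed=7):
    """Create the fixed cloud once: (angle, base radius, height, band) per particle."""
    rng = random.Random(seed)
    particles = []
    for _ in range(PARTICLE_COUNT):
        angle = rng.uniform(0.0, 2 * math.pi)
        base_radius = rng.uniform(0.8, 1.2)
        height = rng.uniform(-0.25, 0.25)
        band = rng.randrange(BAND_COUNT)  # fixed assignment, not ordered by pitch
        particles.append((angle, base_radius, height, band))
    return particles


def particle_positions(particles, band_energy, gain=1.4, push=0.6):
    """Return (x, y, z, brightness) per particle.

    band_energy holds BAND_COUNT values already normalised to 0..1;
    gain is an assumed visual gain and push the assumed maximum outward move.
    """
    out = []
    for angle, base_radius, height, band in particles:
        energy = min(max(band_energy[band] * gain, 0.0), 1.0)
        radius = base_radius + push * energy  # louder band pushes outward
        out.append((radius * math.cos(angle), height, radius * math.sin(angle), energy))
    return out
```

Because band_energy is indexed by each particle's stored band, a bass hit moves particles scattered all around the ring rather than an inner shell.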

Pulsing Spectrum Sphere

Radial bars are placed around a sphere; each bar's length tracks one frequency band, so the whole shape morphs and pulses with the spectrum. Loud, full-range sound inflates the sphere evenly; a thin or bass-heavy mix leaves it lopsided.
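One way to place a fixed number of bars evenly on a sphere is a Fibonacci (golden-angle) lattice. The tool's actual layout is not documented here, so treat this Python sketch, its bar count and its band-to-bar mapping as assumptions.

```python
import math

BAR_COUNT = 256  # hypothetical fixed constant


def fibonacci_sphere(count):
    """Unit direction vectors spread roughly evenly over a sphere."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    directions = []
    for i in range(count):
        y = 1 - 2 * (i + 0.5) / count          # from near +1 down to near -1
        ring = math.sqrt(1 - y * y)            # radius of the horizontal circle at y
        theta = golden_angle * i
        directions.append((ring * math.cos(theta), y, ring * math.sin(theta)))
    return directions


def bar_segments(directions, band_levels, inner=1.0, max_len=0.8):
    """One (start, end) segment per bar; bar i follows band i modulo the band count."""
    segments = []
    for i, (x, y, z) in enumerate(directions):
        level = min(max(band_levels[i % len(band_levels)], 0.0), 1.0)
        outer = inner + max_len * level        # bar length tracks its band's level
        segments.append(((x * inner, y * inner, z * inner),
                         (x * outer, y * outer, z * outer)))
    return segments
```

Feeding full levels to only the first few bands (a bass-heavy mix) leaves most bars at zero length, which produces the lopsided shape described above.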

The live readouts

Below the canvas, the readouts report the active renderer (WebGL or the 2D fallback), the current frame rate, the peak frequency and its musical neighbourhood, the overall relative level, and a bass / mid / treble split. These are diagnostic conveniences computed from the same FFT. The peak frequency is resolved only to the FFT bin width (sample rate ÷ FFT size, about 23 Hz at 48 kHz with a 2048-point FFT), and the levels remain relative and uncalibrated.
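The readouts can be approximated from a magnitude array like this. The 250 Hz and 4 kHz band edges, the A4 = 440 Hz reference, labelling the peak with the nearest equal-tempered note, and using energy (magnitude squared) for the split are all assumptions, not the tool's documented values.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def peak_frequency(mags, sample_rate, max_hz=20000.0):
    """Return (peak frequency in Hz, bin width in Hz); skips the DC bin."""
    fft_size = 2 * len(mags)
    bin_hz = sample_rate / fft_size
    top = min(len(mags) - 1, int(max_hz / bin_hz))
    k = max(range(1, top + 1), key=lambda i: mags[i])
    return k * bin_hz, bin_hz


def nearest_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name and the offset from it in cents."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    cents = 1200 * math.log2(freq_hz / (a4_hz * 2 ** ((midi - 69) / 12)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}", cents


def band_split(mags, sample_rate, bass_max=250.0, mid_max=4000.0):
    """Fractions of total energy in bass / mid / treble (assumed edges)."""
    fft_size = 2 * len(mags)
    sums = [0.0, 0.0, 0.0]
    for k, m in enumerate(mags):
        f = k * sample_rate / fft_size
        idx = 0 if f < bass_max else (1 if f < mid_max else 2)
        sums[idx] += m * m  # energy rather than raw magnitude
    total = sum(sums) or 1.0
    return [s / total for s in sums]
```

peak_frequency also returns the bin width as a reminder of the resolution: a reported peak of 234.375 Hz at 48 kHz with a 2048-point FFT really means "within about ±12 Hz of that".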

How It Works

When you start the microphone, the tool requests raw audio with auto-gain, noise suppression and echo cancellation turned off (browsers and operating systems may still apply processing they don't expose). The stream is connected to a Web Audio AnalyserNode only — never to your speakers — so there is no feedback and nothing is played back. For file mode, the file is decoded with AudioContext.decodeAudioData entirely in memory; playback routes through a gain node to your speakers, while a parallel analyser feeds the visuals. If you need to inspect the raw time-domain waveform rather than the frequency spectrum, the waveform visualizer shows the oscilloscope view of the same audio stream alongside a triggering control.

Each animation frame, the analyser hands back the current frequency magnitudes. The tool detects WebGL support at load time: if a WebGL context is available it uploads vertex data to the GPU and renders the 3D geometry with a hand-written vertex/fragment shader pair (no Three.js, no external library — raw WebGL). If WebGL is unavailable or the context fails to create, it transparently falls back to a hand-rolled perspective projection drawn on a 2D canvas, and the renderer readout says exactly which one is active. Either way, the visible 3D-ness is genuine perspective math. All of this runs entirely in your browser — browser audio visualization with no plugins, no server, and no data leaving your device.
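The 2D fallback's perspective math reduces to a rotation (for auto-rotate) followed by a divide by depth. This Python sketch assumes a camera on the z axis at a hypothetical distance of 4 units with a 60° vertical field of view; a WebGL vertex shader would typically express the same divide through a projection matrix and gl_Position.w, but that is an assumption about this tool's shader.

```python
import math


def rotate_y(point, angle):
    """Rotate a 3D point about the vertical (y) axis, as auto-rotate would."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, width, height, camera_distance=4.0, fov_deg=60.0):
    """Perspective-project a world point to canvas pixels; None if behind the camera."""
    x, y, z = point
    depth = z + camera_distance          # camera at z = -camera_distance, looking toward +z
    if depth <= 1e-6:
        return None
    focal = (height / 2) / math.tan(math.radians(fov_deg) / 2)
    sx = width / 2 + focal * x / depth   # farther points move toward the centre
    sy = height / 2 - focal * y / depth  # canvas y grows downward
    return sx, sy
```

Each frame the fallback would rotate every vertex by the current camera angle, project it, and draw lines or dots between the resulting 2D points.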

Bounded by design. The waterfall grid, the particle count and the sphere's bar count are all fixed constants in the code. No slider or typed value can inflate them, so the tool cannot accidentally freeze your browser by drawing millions of elements. The FFT size you pick changes frequency detail, not geometry size: internally the spectrum is resampled onto the fixed grid. Long files are analysed frame by frame as they play; decodeAudioData resamples a file to the audio context's sample rate, so that is the rate the analyser sees. The visual frequency axis tops out around 20 kHz (the upper edge of human hearing), or at the Nyquist limit (half the sample rate) when that is lower, and the peak-frequency readout searches the same range.
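Resampling onto a fixed grid can be sketched as follows: however many bins the FFT produces, each of a constant number of log-spaced columns takes the loudest bin inside it. The column count, the 20 Hz lower edge and the max-per-column rule are assumptions (averaging or interpolation would also work); the 20 kHz or Nyquist cap mirrors the text above.

```python
import math

GRID_COLUMNS = 96   # hypothetical fixed column count
F_MIN_HZ = 20.0     # assumed lower edge of the axis
F_CAP_HZ = 20000.0  # upper cap described in the text


def resample_to_grid(mags, sample_rate, columns=GRID_COLUMNS):
    """Collapse any number of FFT bins onto a fixed, log-spaced column grid.

    Geometry size depends only on `columns`, never on the FFT size.
    """
    fft_size = 2 * len(mags)
    bin_hz = sample_rate / fft_size
    f_max = min(F_CAP_HZ, sample_rate / 2)  # 20 kHz cap or Nyquist, whichever is lower
    ratio = f_max / F_MIN_HZ
    grid = [0.0] * columns
    for c in range(columns):
        lo = F_MIN_HZ * ratio ** (c / columns)
        hi = F_MIN_HZ * ratio ** ((c + 1) / columns)
        k_lo = max(1, math.ceil(lo / bin_hz))
        k_hi = min(len(mags) - 1, math.floor(hi / bin_hz))
        if k_hi >= k_lo:
            grid[c] = max(mags[k_lo:k_hi + 1])  # loudest bin inside the column
        else:
            # Column narrower than one bin (low end, small FFT): use the nearest bin.
            k = min(len(mags) - 1, max(1, round(math.sqrt(lo * hi) / bin_hz)))
            grid[c] = mags[k]
    return grid
```

Whether the FFT has 128 bins or 16,384, the renderer receives the same GRID_COLUMNS values per slice.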

On Stop, on closing the tab, and when the page is hidden, every microphone track is stopped and the audio context is closed; the WebGL buffers, textures and program are deleted; and the animation loop is cancelled. A generation token guards against a Stop racing a still-pending permission request, so a microphone stream that arrives late is immediately released and never used.
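The generation-token guard can be illustrated with a small asyncio sketch. MicSession, FakeStream and the injected request_permission callable are hypothetical stand-ins; in a browser the permission call would be navigator.mediaDevices.getUserMedia, and releasing a stream would mean calling stop() on each track returned by getTracks().

```python
import asyncio


class FakeStream:
    """Stand-in for a microphone stream; records whether it was released."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class MicSession:
    """Hypothetical controller showing the generation-token pattern."""

    def __init__(self, request_permission):
        self._request_permission = request_permission  # async callable returning a stream
        self._generation = 0
        self.stream = None

    async def start(self):
        self._generation += 1
        my_generation = self._generation
        stream = await self._request_permission()  # may take as long as the user likes
        if my_generation != self._generation:
            # stop() ran while the prompt was pending: release and never use it.
            stream.stop()
            return False
        self.stream = stream
        return True

    def stop(self):
        self._generation += 1  # invalidates any start() still awaiting
        if self.stream is not None:
            self.stream.stop()
            self.stream = None
```

Bumping the counter in stop() is enough: the pending start() compares its captured value when the stream finally arrives and discards the stream instead of wiring it up.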

Frequently Asked Questions

Is this a measurement tool?

No. It is a visualization for insight and enjoyment. The heights, colours and motion are driven by relative, uncalibrated FFT magnitude (dBFS), and they reflect the combined sound source, room and microphone — not a single calibrated level. For real acoustic measurements (SPL, calibrated spectra), use a measurement microphone and a dedicated analyzer. The peak-frequency readout is accurate to the FFT bin width; the level readouts are relative only.

Does it use real WebGL or fake 3D?

It uses real WebGL when your browser supports it, with hand-written shaders and no external 3D library. If WebGL is unavailable or the context fails to create, it falls back to a genuine hand-coded perspective projection drawn on a 2D canvas. The renderer readout below the canvas always shows which one is active, so the tool never reports WebGL when WebGL is not actually running.

Is my microphone or audio file uploaded anywhere?

No. Everything runs in your browser. Microphone audio is connected only to an analyser (never to your speakers and never to a server) and is discarded frame by frame. Uploaded files are decoded locally with the Web Audio API and never leave your device. Nothing is recorded or transmitted.

Can a big file or a large FFT size freeze my tab?

No. The waterfall grid, particle count and sphere bar count are fixed constants, independent of FFT size, file length or any value you choose. The FFT size only changes frequency detail; the spectrum is resampled onto the fixed geometry. So the renderer always draws the same bounded number of elements per frame, regardless of input.

What does the FFT size actually change?

It sets how finely the frequency axis is divided. A larger FFT gives narrower bins (sharper frequency resolution) but each analysis slice covers a longer time, so fast changes smear slightly. A smaller FFT reacts faster in time but blurs nearby frequencies. For music, 2048 is a comfortable middle; raise it for sustained tones, lower it for percussive material.
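The trade-off is simple arithmetic: bin width is sample rate ÷ FFT size, and each slice spans FFT size ÷ sample rate. This Python snippet tabulates both for a 48 kHz sample rate, which is an assumed value; substitute your own device's rate.

```python
def fft_tradeoff(sample_rate, fft_size):
    """Return (bin width in Hz, analysis window length in ms)."""
    return sample_rate / fft_size, 1000.0 * fft_size / sample_rate


for size in (512, 1024, 2048, 4096, 8192):
    bin_hz, window_ms = fft_tradeoff(48000, size)
    print(f"{size:>5}: {bin_hz:7.2f} Hz per bin, {window_ms:6.1f} ms per slice")
```

At 48 kHz, 2048 points give 23.44 Hz bins and a 42.7 ms slice, while 8192 narrows the bins to 5.86 Hz but stretches each slice to about 171 ms.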

Why does the highest part of the spectrum look empty?

The visual frequency axis is capped at about 20 kHz (the top of human hearing), so the band between 20 kHz and the Nyquist limit (half the sample rate: 22.05 kHz for 44.1 kHz audio, 24 kHz for 48 kHz) is intentionally not plotted. Within the shown band, most everyday sounds also carry far less energy at high frequencies, often following a roughly pink tilt of about −3 dB per octave, so the treble end legitimately appears dimmer and sparser. Raising the sensitivity makes faint high content more visible.
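Both effects are easy to quantify. This Python sketch computes the per-bin tilt of a pink spectrum (10·log₁₀ ½ ≈ −3.01 dB per octave on equal-width FFT bins) and the top of the plotted band; the 1 kHz reference is an arbitrary choice, the 20 kHz cap follows the text, and the function names are illustrative.

```python
import math

PINK_DB_PER_OCTAVE = 10 * math.log10(0.5)  # about -3.01 dB per doubling of frequency


def pink_tilt_db(freq_hz, ref_hz=1000.0):
    """Level of a pink-spectrum FFT bin relative to the bin at ref_hz."""
    return PINK_DB_PER_OCTAVE * math.log2(freq_hz / ref_hz)


def shown_top_hz(sample_rate, cap_hz=20000.0):
    """Highest plotted frequency: the 20 kHz cap or Nyquist, whichever is lower."""
    return min(cap_hz, sample_rate / 2)
```

Four octaves up from 1 kHz, a 16 kHz bin of pink-spectrum content sits about 12 dB lower, which is why the treble looks sparse until you raise the sensitivity.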

The visuals barely move — what's wrong?

Usually the sound is too quiet for the microphone, or motion smoothing is set very high. Try raising the sensitivity slider, lowering the smoothing slider for snappier motion, moving closer to the sound source, or checking that your operating system isn't muting or heavily processing the input. In file mode, make sure playback is actually running (the Play button should read Pause).

Which visualization mode is best for music vs. speech vs. noise?

The 3D Spectrogram Waterfall suits music analysis best: sustained notes form ridges you can follow through time, rhythmic transients appear as sharp crests, and a harmonic series shows up as parallel ridges at integer multiples of the fundamental. The Reactive Particle Field is the most visually engaging for general playback and live performance; it conveys feel more than detail. The Pulsing Spectrum Sphere gives an at-a-glance sense of spectral balance, so it works well for comparing the bass/mid/treble weight of different mixes. For speech, the waterfall makes formants and vowel transitions visible. Broadband noise such as white or pink noise fills the whole band at once, giving a dense, nearly featureless picture in all three modes (pink noise dims gently toward the treble).

Can I use this with a DAW or desktop audio rather than a microphone?

Yes, indirectly. Many operating systems let you route desktop audio to a virtual input device (for example, Stereo Mix on Windows, or BlackHole or Loopback on macOS) that then appears as a microphone input in the browser. Select that virtual device from the Microphone dropdown before pressing Start Listening. This lets you visualize DAW output, streaming audio, or system sounds without physically recording the room. The tool itself only sees whatever audio the browser reports as a microphone stream.

Does the tool work on a phone or tablet?

It runs on modern mobile browsers (Chrome for Android, Safari on iOS 14.5+), but performance depends heavily on the device. WebGL is broadly supported on current phones, though mobile GPUs are less powerful than desktop ones, and a large FFT size can strain slower processors. If the frame rate readout drops well below 30 fps, try a smaller FFT size (1024) or a simpler visualization mode. On devices where WebGL is blocked or unavailable, the tool runs in the 2D fallback.
