Audio Fingerprint Tool
Explore how Shazam-style audio fingerprinting works. Record short audio clips, visualize their spectral peaks as constellation maps, generate fingerprint hashes, and compare two recordings for similarity — all processed locally in your browser.
Algorithm Walkthrough
STFT Spectrogram
The audio is divided into overlapping frames. Each frame is windowed (Hann) and transformed via FFT to produce a time-frequency representation.
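A minimal STFT sketch in pure Python (standard library only) is shown below. The frame size of 1024 and hop of 512 are illustrative defaults, not necessarily the tool's settings. The recursive radix-2 FFT stands in for whatever optimized FFT a browser implementation would use, so frame_size must be a power of two.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_size=1024, hop=512):
    """Split the signal into overlapping Hann-windowed frames and FFT each one.

    Returns a list of frames; each frame holds frame_size // 2 + 1 magnitudes
    (bin 0 up to the Nyquist bin). Trailing samples that do not fill a frame
    are dropped.
    """
    window = hann(frame_size)
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(signal[start:start + frame_size], window)]
        spectrum = fft(chunk)
        frames.append([abs(c) for c in spectrum[: frame_size // 2 + 1]])
    return frames
```

Bin k of a frame corresponds to k × sample_rate / frame_size Hz. For example, a 440 Hz sine sampled at 8000 Hz with frame_size=256 (31.25 Hz per bin) peaks in bin 14.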
Peak Detection
Local maxima in the spectrogram are identified as “constellation points”. These are the most prominent time-frequency features.
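One simple way to find constellation points is to keep every spectrogram cell that is at least as large as all cells in a rectangular neighborhood and above a minimum magnitude. The sketch below assumes the spectrogram is a list of frames of magnitudes; the function name and the time_radius, freq_radius, and min_magnitude parameters are hypothetical, and this is only one of several possible peak-picking rules.

```python
def find_peaks(spec, time_radius=2, freq_radius=2, min_magnitude=0.0):
    """Return (frame, bin) cells that no neighbor exceeds within the given radii.

    Cells at or below min_magnitude are ignored. Ties with a neighbor still
    count as peaks, so a flat plateau above the threshold yields several points.
    """
    peaks = []
    n_frames = len(spec)
    n_bins = len(spec[0]) if spec else 0
    for t in range(n_frames):
        for f in range(n_bins):
            value = spec[t][f]
            if value <= min_magnitude:
                continue
            is_max = True
            for dt in range(-time_radius, time_radius + 1):
                for df in range(-freq_radius, freq_radius + 1):
                    tt, ff = t + dt, f + df
                    # Skip the cell itself and anything outside the spectrogram.
                    if (dt or df) and 0 <= tt < n_frames and 0 <= ff < n_bins:
                        if spec[tt][ff] > value:
                            is_max = False
                            break
                if not is_max:
                    break
            if is_max:
                peaks.append((t, f))
    return peaks
```

Widening the neighborhood or raising min_magnitude both thin out the constellation: a weaker peak survives only if nothing stronger lies within its radius.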
Constellation Map
The peaks are plotted on a time-frequency map, creating a distinctive visual “fingerprint” of the audio.
Hash Pairs
Pairs of nearby peaks are combined into compact hashes. Two recordings of the same audio will share a large fraction of their hashes even with background noise, because most of the strongest peaks survive the noise.
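The pairing step can be sketched as follows, assuming peaks arrive as (frame, bin) tuples. The target zone used here (between min_dt and max_dt frames after the anchor, within max_df bins of it) and the fan_out limit are hypothetical parameters chosen for illustration.

```python
def pair_peaks(peaks, fan_out=3, min_dt=1, max_dt=20, max_df=50):
    """Pair each anchor peak with up to fan_out later peaks in its target zone.

    peaks: list of (frame, bin). Returns (f1, f2, dt, t1) tuples, where t1 is
    the anchor's frame and dt the frame distance to the paired peak.
    """
    ordered = sorted(peaks)  # sort by frame, then bin
    pairs = []
    for i, (t1, f1) in enumerate(ordered):
        count = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # sorted by time, so every later peak is out of range too
            if dt >= min_dt and abs(f2 - f1) <= max_df:
                pairs.append((f1, f2, dt, t1))
                count += 1
                if count == fan_out:
                    break
    return pairs
```

For peaks [(0, 10), (2, 12), (3, 100), (5, 11), (30, 10)] with fan_out=2, this yields (10, 12, 2, 0), (10, 11, 5, 0), and (12, 11, 3, 2): the peak at bin 100 is too far away in frequency, and the peak at frame 30 is too late.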
How to Use the Audio Fingerprint Tool
- Record or Upload Audio Sample A
  Click "Record A" and let the tool capture 3–15 seconds of audio from your microphone. Alternatively, upload an audio file (MP3, WAV, OGG). This becomes the primary fingerprint.
- Observe the Algorithm Steps
  Watch the algorithm breakdown panel as each step completes: the spectrogram is computed, peaks are detected, the constellation map is plotted, and hash pairs are generated. Each step lights up in real time.
- Explore the Visualizations
  The spectrogram shows detected peaks as bright dots overlaid on the frequency-time heatmap. The constellation map below connects nearby peaks with lines, showing the pairs used for hashing.
- Record Sample B to Compare
  Optionally record or upload a second clip. The tool computes a similarity score by comparing the fingerprint hashes of both recordings. The comparison overlay shows matching and differing peaks.
- Export and Experiment
  Use the Export JSON button to save fingerprint data including peaks, hashes, and metadata. Adjust the Peak Density slider to see how threshold changes affect the fingerprint.
Understanding Your Results
Fingerprint Hash
The hash display shows the computed fingerprint as a series of hexadecimal values. Each hash encodes a pair of spectral peaks: their frequencies and the time delta between them. Identical or very similar audio produces matching hashes, which is the foundation of audio recognition.
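To illustrate how one peak pair could become a hexadecimal value, the sketch below packs freq1, freq2, and the time delta into separate bit fields of a 32-bit integer. The 12/12/8-bit layout and the function names are assumptions for this example, not the tool's documented format.

```python
F_BITS = 12   # hypothetical: room for frequency bins 0..4095
DT_BITS = 8   # hypothetical: room for frame deltas 0..255


def pack_hash(f1, f2, dt):
    """Pack two frequency bins and a frame delta into one 32-bit integer."""
    if not (0 <= f1 < (1 << F_BITS) and 0 <= f2 < (1 << F_BITS)
            and 0 <= dt < (1 << DT_BITS)):
        raise ValueError("field out of range for the chosen bit layout")
    return (f1 << (F_BITS + DT_BITS)) | (f2 << DT_BITS) | dt


def unpack_hash(h):
    """Recover (f1, f2, dt) from a packed hash."""
    dt = h & ((1 << DT_BITS) - 1)
    f2 = (h >> DT_BITS) & ((1 << F_BITS) - 1)
    f1 = h >> (F_BITS + DT_BITS)
    return f1, f2, dt


def to_hex(h):
    """Fixed-width, 8-digit hexadecimal display of a 32-bit hash."""
    return f"{h:08x}"
```

For example, to_hex(pack_hash(10, 12, 2)) returns "00a00c02". Because each value occupies its own bit field, swapping freq1 and freq2 gives a different hash, and unpack_hash recovers the original pair exactly.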
Constellation Map
The constellation map plots detected spectral peaks as dots on a time-vs-frequency chart, with lines connecting nearby peak pairs. This sparse representation is what makes fingerprinting robust: it keeps the most distinctive features of the audio and is far less sensitive to noise and volume differences than the full spectrogram.
Similarity Score
When two recordings are fingerprinted, the tool computes a similarity percentage based on how many hash values match between them. A score of 90–100% indicates essentially identical audio, 50–89% suggests similar but not identical audio, and below 50% means the recordings are substantially different.
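The exact formula behind the percentage is not specified here. One plausible sketch, assuming the score is the Jaccard overlap of the two hash sets (shared hashes divided by all distinct hashes), is shown below together with the score ranges listed above. Both function names are hypothetical.

```python
def similarity_percent(hashes_a, hashes_b):
    """Jaccard overlap of two hash collections, as a percentage (0-100).

    Duplicate hashes are collapsed; two empty fingerprints score 0 because
    there is nothing to compare.
    """
    a, b = set(hashes_a), set(hashes_b)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def interpret(score):
    """Bucket a score into the three ranges described in the text."""
    if score >= 90:
        return "essentially identical"
    if score >= 50:
        return "similar but not identical"
    return "substantially different"
```

With hashes [1, 2, 3, 4] and [2, 3, 4, 5], three of five distinct hashes are shared, giving 60.0% ("similar but not identical").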
Peak Density
The peak density slider controls how many spectral peaks are detected. A lower density produces fewer, more distinctive peaks — faster matching but less detail. A higher density captures more peaks for better accuracy at the cost of more data.
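Peak density can be controlled in several ways. The sketch below assumes a hypothetical scheme that keeps only the strongest peaks_per_block peaks in each block of block_frames frames, so lowering peaks_per_block yields a sparser constellation map. The names and defaults are illustrative, not the tool's slider logic.

```python
from collections import defaultdict


def limit_density(peaks, peaks_per_block, block_frames=10):
    """Keep only the strongest peaks in each block of frames.

    peaks: list of (frame, bin, magnitude).
    Returns the kept (frame, bin) pairs, sorted by frame then bin.
    """
    blocks = defaultdict(list)
    for t, f, mag in peaks:
        blocks[t // block_frames].append((mag, t, f))
    kept = []
    for block in blocks.values():
        block.sort(reverse=True)  # strongest magnitude first
        kept.extend((t, f) for _, t, f in block[:peaks_per_block])
    return sorted(kept)
```

With one peak per 10-frame block, only the loudest point in each block survives; raising the limit restores weaker peaks and therefore produces more hash pairs.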
How Audio Fingerprinting Works
Audio fingerprinting is the technology behind services like Shazam and SoundHound that can identify songs from short recordings, even in noisy environments. The core insight is that a song's identity can be captured by its most prominent spectral features — the specific frequencies that peak at specific moments in time.
Spectral Peak Extraction
The process begins with a short-time Fourier transform (STFT), which divides the audio into overlapping windows (typically 1024–4096 samples) and computes the frequency spectrum of each window. This produces a spectrogram — a 2D map of frequency magnitude over time. From this spectrogram, the algorithm identifies local maxima: points that are louder than their neighbors in both time and frequency. These peaks represent the most energy-dense moments in the audio and are remarkably stable across different recording conditions, volumes, and background noise levels.
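The window size sets a trade-off between frequency and time resolution. For a window of N samples at sample rate f_s, the bin spacing is f_s / N and each window spans N / f_s seconds. Assuming a 44.1 kHz sample rate (a common rate, used here only for the arithmetic), the two ends of the quoted range work out as follows:

```latex
\begin{aligned}
\Delta f &= \frac{f_s}{N}, & T_{\mathrm{win}} &= \frac{N}{f_s} \\
N = 1024:\quad \Delta f &= \tfrac{44100}{1024} \approx 43.07\ \mathrm{Hz}, & T_{\mathrm{win}} &= \tfrac{1024}{44100} \approx 23.2\ \mathrm{ms} \\
N = 4096:\quad \Delta f &= \tfrac{44100}{4096} \approx 10.77\ \mathrm{Hz}, & T_{\mathrm{win}} &= \tfrac{4096}{44100} \approx 92.9\ \mathrm{ms}
\end{aligned}
```

Larger windows separate nearby frequencies more finely but blur exactly when a peak occurs; smaller windows do the opposite.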
Constellation Maps and Combinatorial Hashing
The detected peaks form a constellation map: a sparse scatter plot of (time, frequency) coordinates. To create searchable fingerprints, the algorithm pairs each peak (the anchor) with several nearby peaks within a target zone. Each pair is encoded as a compact hash built from freq1, freq2, and time_delta, typically by packing the three values into separate bit fields of one integer rather than adding them (a plain sum would give the pairs 100 Hz → 200 Hz and 200 Hz → 100 Hz the same hash). This combinatorial hashing approach can generate hundreds of hashes per second of audio, creating a robust fingerprint that tolerates minor variations. The hash values do not depend on absolute time position, because the anchor time is stored alongside each hash rather than inside it, so a recording can be matched against a database regardless of where in the song it started.
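Because the hashes do not encode absolute time, a plain inverted index (hash → list of tracks and anchor times) is enough to find candidate songs. The sketch below assumes each track's fingerprint is a list of (hash, anchor_frame) pairs; build_index and candidate_tracks are hypothetical names, and a production database would likely use a persistent key-value store rather than an in-memory dict.

```python
from collections import defaultdict


def build_index(tracks):
    """Map each hash to every (track_id, anchor_frame) where it occurs.

    tracks: {track_id: [(hash, anchor_frame), ...]}
    """
    index = defaultdict(list)
    for track_id, entries in tracks.items():
        for h, t in entries:
            index[h].append((track_id, t))
    return index


def candidate_tracks(index, query_hashes):
    """Count how many query hashes hit each track, highest count first."""
    counts = defaultdict(int)
    for h in query_hashes:
        for track_id, _ in index.get(h, ()):
            counts[track_id] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

A query excerpt taken from anywhere in a song produces the same hash values as that part of the reference, so the lookup works without knowing where the recording started. Hit counts alone can still be fooled by chance collisions, which is why time coherency is checked next.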
Matching and Hamming Distance
To compare two fingerprints, the algorithm counts how many hashes they share. For systems that produce binary fingerprints, the Hamming distance (the number of differing bits between two hash values) measures how close two individual hashes are, and a low Hamming distance between corresponding hashes indicates high similarity; peak-pair hashes like those described above are typically matched by exact lookup instead. In production systems like Shazam, matching also considers the time coherency of matching hashes: not just that two recordings share the same hashes, but that those hashes occur at a consistent time offset relative to each other. This dramatically reduces false positives and enables identification from recordings as short as 3–5 seconds.
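Time coherency can be checked with an offset histogram: for every hash shared by the reference and the query, compute reference_time minus query_time and count how often each offset occurs. A true match piles many votes onto a single offset, while chance collisions scatter across many. The sketch below assumes (hash, anchor_frame) inputs; the hamming helper illustrates the bit-level distance used for binary hashes. Both function names are hypothetical.

```python
from collections import Counter


def best_offset(ref, query):
    """Vote on reference-minus-query time offsets for shared hashes.

    ref, query: lists of (hash, anchor_frame).
    Returns (offset, votes) for the most common offset, or (None, 0).
    """
    ref_times = {}
    for h, t in ref:
        ref_times.setdefault(h, []).append(t)
    votes = Counter()
    for h, tq in query:
        for tr in ref_times.get(h, []):
            votes[tr - tq] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


def hamming(a, b):
    """Number of differing bits between two non-negative integer hashes."""
    return bin(a ^ b).count("1")
```

In the test data, three shared hashes agree on an offset of 10 frames and one stray match lands at 19, so the query is aligned 10 frames into the reference with 3 votes.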
Frequently Asked Questions
What is audio fingerprinting?
Audio fingerprinting is a technique that creates a compact digital summary of an audio signal based on its spectral characteristics. Like a human fingerprint, each audio recording produces a unique pattern of spectral peaks that can be used for identification, even from short or noisy samples.
How does Shazam recognize songs so quickly?
Shazam records a few seconds of audio, extracts spectral peaks to build a constellation map, generates hash values from peak pairs, and searches a database of pre-computed hashes. Because the hashes are compact and time-independent, matching is extremely fast — typically under a second against millions of songs.
How long does the recording need to be?
This tool records 3 to 15 seconds of audio, which is typically enough to generate a meaningful fingerprint with dozens of spectral peaks and hundreds of hash pairs. Shazam-like systems can identify songs from as little as 3 seconds of clear audio. Longer recordings produce more peaks and hashes for higher matching confidence.
What does the similarity score mean?
The similarity score compares the fingerprint hashes of two recordings. A score of 90–100% means the audio is essentially identical. 50–89% suggests similar content (e.g., the same song recorded at different times). Below 50% indicates substantially different audio. The score is based on hash overlap and Hamming distance.
Why do different recordings of the same sound produce different hashes?
Background noise, microphone differences, and slight timing variations cause small changes in which peaks are detected. However, the majority of peaks remain consistent, which is why the similarity score between two recordings of the same source will still be high (typically 60–90%) rather than 100%.
Is my audio data stored or uploaded anywhere?
No. All fingerprint computation happens entirely in your browser using the Web Audio API and JavaScript. No audio data, fingerprints, or hashes are transmitted to any server. The exported JSON file is generated locally on your device.