Audio Fingerprint Tool

Explore how Shazam-style audio fingerprinting works. Record short audio clips, visualize their spectral peaks as constellation maps, generate fingerprint hashes, and compare two recordings for similarity — all processed locally in your browser.

🔒 Your audio never leaves your device: 100% local processing, zero uploads.

Supported browsers: Chrome, Firefox, Safari, Edge.

Keyboard shortcuts: 1 = Record A, 2 = Record B, R = Reset, E = Export JSON, F = Freeze

Algorithm Breakdown

1. Spectrogram: Short-time FFT converts audio into a time-frequency representation.
2. Peak Detection: Local maxima are identified in the time-frequency space.
3. Constellation Map: Peaks are plotted as dots on a time vs. frequency chart.
4. Hash Pairs: Nearby peak pairs are combined into compact hash values.

[Visualization panels: Spectrogram + Peaks (A), Constellation Map (A)]

Algorithm Walkthrough

1. STFT Spectrogram: The audio is divided into overlapping frames. Each frame is windowed (Hann) and transformed via FFT to produce a time-frequency representation.

2. Peak Detection: Local maxima in the spectrogram are identified as “constellation points”. These are the most prominent time-frequency features.

3. Constellation Map: The peaks are plotted on a time-frequency map, creating a unique visual “fingerprint” of the audio.

4. Hash Pairs: Pairs of nearby peaks are combined into compact hashes. Two recordings of the same audio will produce many matching hashes even with background noise.
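
As a rough illustration of step 1, the sketch below builds a magnitude spectrogram in plain Python. It is not the tool's own code (which runs in the browser); the function names, the 256-sample frame, and the 50% overlap are assumptions chosen to keep the example small.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def stft_magnitudes(samples, frame_size=256, hop=128):
    """Return a list of frames, each holding frame_size // 2 + 1 magnitudes."""
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = fft(frame)
        # Keep only the non-negative frequencies of the real-valued input.
        frames.append([abs(c) for c in spectrum[: frame_size // 2 + 1]])
    return frames

# A 1 kHz sine sampled at 8 kHz falls in bin 1000 / (8000 / 256) = 32.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(2048)]
spec = stft_magnitudes(tone)
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(loudest_bin)  # 32
```

Each row of `spec` is one time frame and each column one frequency bin, which is the grid that the later steps search for peaks.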

How to Use the Audio Fingerprint Tool

Record or Upload Audio Sample A

Click "Record A" and let the tool capture 3–15 seconds of audio from your microphone. Alternatively, upload an audio file (MP3, WAV, OGG). This becomes the primary fingerprint.

Observe the Algorithm Steps

Watch the algorithm breakdown panel as each step completes: the spectrogram is computed, peaks are detected, the constellation map is plotted, and hash pairs are generated. Each step lights up in real time.

Explore the Visualizations

The spectrogram shows detected peaks as bright dots overlaid on the frequency-time heatmap. The constellation map below connects nearby peaks with lines, showing the pairs used for hashing.

Record Sample B to Compare

Optionally record or upload a second clip. The tool computes a similarity score by comparing the fingerprint hashes of both recordings. The comparison overlay shows matching and differing peaks.

Export and Experiment

Use the Export JSON button to save fingerprint data including peaks, hashes, and metadata. Adjust the Peak Density slider to see how threshold changes affect the fingerprint.

Understanding Your Results

Fingerprint Hash

The hash display shows the computed fingerprint as a series of hexadecimal values. Each hash encodes a pair of spectral peaks: their frequencies and the time delta between them. Identical or very similar audio produces matching hashes, which is the foundation of audio recognition.
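
To make the idea of a hexadecimal hash concrete, here is a small sketch of one possible encoding. The field widths (10 bits per frequency bin, 6 bits for the time delta) and the names `encode_hash`, `decode_hash`, and `to_hex` are assumptions for illustration; the tool's actual bit layout is not documented here.

```python
F_BITS, DT_BITS = 10, 6  # assumed field widths, not the tool's actual layout

def encode_hash(f1, f2, dt):
    """Pack two frequency-bin indices and a frame delta into one integer."""
    if not (0 <= f1 < 1 << F_BITS and 0 <= f2 < 1 << F_BITS and 0 <= dt < 1 << DT_BITS):
        raise ValueError("field out of range")
    return (f1 << (F_BITS + DT_BITS)) | (f2 << DT_BITS) | dt

def decode_hash(value):
    """Inverse of encode_hash: recover (f1, f2, dt)."""
    dt = value & ((1 << DT_BITS) - 1)
    f2 = (value >> DT_BITS) & ((1 << F_BITS) - 1)
    f1 = value >> (F_BITS + DT_BITS)
    return f1, f2, dt

def to_hex(value):
    """Fixed-width hexadecimal form; 26 bits fit in 7 hex digits."""
    return format(value, "07x")

h = encode_hash(93, 140, 12)
print(to_hex(h))                      # 05d230c
print(decode_hash(int("05d230c", 16)))  # (93, 140, 12)
```

Because the three fields occupy separate bit ranges, two different peak pairs can only produce the same value if all three fields agree.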

Constellation Map

The constellation map plots detected spectral peaks as dots on a time-vs-frequency chart, with lines connecting nearby peak pairs. This sparse representation is what makes fingerprinting robust — it captures the most distinctive features of the audio while ignoring noise and volume differences.

Similarity Score

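When two recordings are fingerprinted, the tool computes a similarity percentage based on how many hash values match between them. 90–100% indicates essentially identical audio, such as the same file or a very clean re-recording; 50–89% suggests similar but not identical audio, which is also where two separate microphone recordings of the same source often land; and below 50% means the recordings are substantially different.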
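
One simple way to turn hash overlap into a percentage is sketched below. The exact formula the tool uses is not stated, so dividing the shared count by the size of the smaller hash set is an assumption, as are the function names; the score bands in `describe` are the ones quoted above.

```python
def similarity_percent(hashes_a, hashes_b):
    """Share of distinct hashes in the smaller fingerprint that also occur in the other."""
    set_a, set_b = set(hashes_a), set(hashes_b)
    if not set_a or not set_b:
        return 0.0
    shared = len(set_a & set_b)
    return 100.0 * shared / min(len(set_a), len(set_b))

def describe(score):
    """Map a score to the bands used in this guide."""
    if score >= 90:
        return "same audio content"
    if score >= 50:
        return "similar but not identical"
    return "substantially different"

score = similarity_percent([1, 2, 3, 4], [3, 4, 5, 6, 7])
print(score, describe(score))  # 50.0 similar but not identical
```

Normalizing by the smaller set keeps a short clip from being penalized when it is compared against a longer one; other normalizations (for example, by the union of both sets) give lower scores for the same overlap.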

Peak Density

The peak density slider controls how many spectral peaks are detected. A lower density produces fewer, more distinctive peaks — faster matching but less detail. A higher density captures more peaks for better accuracy at the cost of more data.
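
The effect of the slider can be pictured as keeping only the strongest fraction of candidate peaks. The sketch below is a guess at that behavior rather than the tool's implementation; `select_by_density` and the `(frame, bin, magnitude)` tuple layout are hypothetical.

```python
import math

def select_by_density(candidates, density_percent):
    """Keep the strongest density_percent of candidate peaks.

    candidates: list of (frame_index, bin_index, magnitude).
    Returns the kept (frame_index, bin_index) points in time order.
    """
    if not 0 <= density_percent <= 100:
        raise ValueError("density_percent must be between 0 and 100")
    if not candidates:
        return []
    # Always keep at least one peak so the fingerprint is never empty.
    keep = max(1, math.ceil(len(candidates) * density_percent / 100))
    strongest = sorted(candidates, key=lambda p: p[2], reverse=True)[:keep]
    return sorted((t, f) for t, f, _ in strongest)

cands = [(0, 5, 0.9), (1, 7, 0.2), (2, 3, 0.6), (3, 9, 0.4)]
print(select_by_density(cands, 50))   # [(0, 5), (2, 3)]
print(select_by_density(cands, 100))  # [(0, 5), (1, 7), (2, 3), (3, 9)]
```

At 50% only the two loudest candidates survive; at 100% the weaker ones are kept as well, which adds detail and data volume.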

How Audio Fingerprinting Works

Audio fingerprinting is the technology behind services like Shazam and SoundHound that can identify songs from short recordings, even in noisy environments. The Shazam algorithm, and others like it, work by capturing a song's identity through its most prominent spectral features — the specific frequencies that peak at specific moments in time. Related approaches power music recognition, broadcast monitoring for royalty tracking, and content-ID systems on video platforms.

Spectral Peak Extraction

The process begins with a short-time Fourier transform (STFT), which divides the audio into overlapping windows (typically 1024–4096 samples) and computes the frequency spectrum of each window. This produces a spectrogram — a 2D map of frequency magnitude over time. From this spectrogram, the algorithm identifies local maxima: points that are louder than their neighbors in both time and frequency. These peaks represent the most energy-dense moments in the audio and are remarkably stable across different recording conditions, volumes, and background noise levels. You can explore the underlying frequency spectrum using the audio spectrum analyzer to see how a signal's energy is distributed before the peak-detection step runs.
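
The local-maximum test described above can be sketched as follows. This is a minimal illustration, assuming the spectrogram is a list of frames (time-major) and using a hypothetical `find_peaks` helper with a square neighborhood and a fixed magnitude floor.

```python
def find_peaks(spec, neighborhood=1, min_magnitude=0.0):
    """Return (frame_index, bin_index) points that are strict local maxima.

    spec: list of frames, each a list of magnitudes.
    neighborhood: half-width of the square region each point is compared against.
    """
    peaks = []
    n_frames = len(spec)
    n_bins = len(spec[0]) if spec else 0
    for t in range(n_frames):
        for f in range(n_bins):
            value = spec[t][f]
            if value <= min_magnitude:
                continue  # too quiet to count as a peak
            is_peak = True
            for dt in range(-neighborhood, neighborhood + 1):
                for df in range(-neighborhood, neighborhood + 1):
                    if dt == 0 and df == 0:
                        continue
                    tt, ff = t + dt, f + df
                    in_range = 0 <= tt < n_frames and 0 <= ff < n_bins
                    if in_range and spec[tt][ff] >= value:
                        is_peak = False
            if is_peak:
                peaks.append((t, f))
    return peaks

toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.1, 0.1, 0.2],
]
print(find_peaks(toy, neighborhood=1, min_magnitude=0.5))  # [(1, 1), (2, 3)]
```

Scaling every value in `toy` by the same factor (a volume change) leaves the peak positions unchanged as long as the threshold is scaled with it, which is one reason peak positions are stable across recording levels.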

Constellation Maps and Combinatorial Hashing

The detected peaks form a constellation map: a sparse scatter plot of (time, frequency) coordinates. To create searchable fingerprints, the algorithm pairs each peak (the anchor) with several nearby peaks within a target zone that follows it in time. Each pair is encoded as a compact hash that packs three fields into one value: hash = (freq1, freq2, time_delta). The fields are concatenated bit by bit rather than added arithmetically, so different pairs do not collapse onto the same sum. This combinatorial hashing approach, the core of audio hashing, generates hundreds of hashes per second of audio, creating a robust fingerprint that can tolerate minor variations. Because each hash stores only the time difference between its two peaks, the hash values are independent of absolute time position, making them suitable for audio matching against a database regardless of where in the song the recording started. The hashes do depend on absolute frequency and on the spacing between peaks, so this basic scheme tolerates noise and volume changes but not substantial pitch-shifting or tempo changes, which move the peaks and alter the hashes.
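
The pairing step can be sketched like this. The fan-out of 3, the target zone of 1 to 63 frames, and the 10/10/6-bit packing are illustrative assumptions, and `hash_pairs` is a hypothetical name rather than part of any published implementation.

```python
def hash_pairs(peaks, fan_out=3, min_dt=1, max_dt=63):
    """Pair each anchor peak with up to fan_out later peaks in its target zone.

    peaks: list of (frame_index, bin_index) with bin_index < 1024.
    Returns a list of (hash_value, anchor_frame).
    """
    ordered = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt < min_dt:
                continue  # same frame as the anchor: not in the target zone
            if dt > max_dt:
                break  # peaks are time-sorted, so the rest are too far away
            # 10 bits for f1, 10 bits for f2, 6 bits for dt.
            hashes.append(((f1 << 16) | (f2 << 6) | dt, t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes

peaks = [(0, 10), (2, 20), (5, 30)]
shifted = [(t + 100, f) for t, f in peaks]      # same audio, later start
transposed = [(t, f + 1) for t, f in peaks]     # every peak one bin higher

base_values = [h for h, _ in hash_pairs(peaks)]
print(base_values)  # [656642, 657285, 1312643]
print(base_values == [h for h, _ in hash_pairs(shifted)])  # True
```

Shifting all peaks in time changes only the stored anchor frames, not the hash values. Moving every peak up by one frequency bin, as in `transposed`, changes every hash in this sketch.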

Matching and Hamming Distance

To compare two fingerprints, the algorithm counts how many hashes they share: a fingerprint comparison based on hash overlap. For binary hash representations, the Hamming distance measures the number of differing bits between two hash values of equal length, which allows near-miss hashes to be scored as close rather than simply unequal. A low Hamming distance between corresponding hashes indicates high similarity. In production systems like Shazam, matching also considers the time coherency of matching hashes: not just that two recordings share the same hashes, but that those hashes occur at a consistent time offset between the recording and the reference. This dramatically reduces false positives and enables identification from recordings as short as 3–5 seconds. If you want to inspect the raw dominant peaks in a signal before fingerprinting, the peak frequency detector isolates individual spectral maxima interactively.
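
Both ideas in this section fit in a few lines. The sketch below assumes each fingerprint is a list of `(hash_value, frame)` pairs; `best_offset` and `hamming` are hypothetical helper names, and a real system would add thresholds on the vote count.

```python
from collections import Counter, defaultdict

def hamming(a, b):
    """Number of differing bits between two integer hash values."""
    return bin(a ^ b).count("1")

def best_offset(hashes_a, hashes_b):
    """Find the time offset (frame_b - frame_a) supported by the most shared hashes.

    Returns (offset, vote_count), or (None, 0) when no hash is shared.
    """
    frames_b = defaultdict(list)
    for h, t in hashes_b:
        frames_b[h].append(t)
    votes = Counter()
    for h, t_a in hashes_a:
        for t_b in frames_b.get(h, ()):
            votes[t_b - t_a] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]

clip = [(11, 0), (22, 3), (33, 7), (44, 9)]
reference = [(11, 40), (22, 43), (33, 47), (44, 2), (99, 5)]
print(best_offset(clip, reference))  # (40, 3)
print(hamming(0b1011, 0b1110))       # 2
```

Four hashes are shared here, but only three agree on the same offset of 40 frames; the fourth lands at an unrelated offset and is treated as a coincidence. A tall single bin in this offset histogram is the signature of a true match.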

Frequently Asked Questions

What is audio fingerprinting?

Audio fingerprinting is a technique that creates a compact digital summary of an audio signal based on its spectral characteristics. Like a human fingerprint, each audio recording produces a unique pattern of spectral peaks that can be used for identification, even from short or noisy samples.

How does Shazam recognize songs so quickly?

Shazam records a few seconds of audio, extracts spectral peaks to build a constellation map, generates hash values from peak pairs, and looks those hashes up in a database of pre-computed hashes. Because the hashes are compact and do not depend on where in the song the clip starts, each one can be found with a direct index lookup instead of a scan through the audio, which keeps matching fast even against millions of songs.
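
The database lookup can be pictured as an inverted index from hash value to the tracks that contain it. The sketch below is a toy model under that assumption, not a description of Shazam's actual storage; `build_index`, `identify`, and the track names are made up for the example.

```python
from collections import Counter, defaultdict

def build_index(tracks):
    """tracks: dict of track_id -> list of (hash_value, frame)."""
    index = defaultdict(list)
    for track_id, hashes in tracks.items():
        for h, t in hashes:
            index[h].append((track_id, t))
    return index

def identify(index, query):
    """Return (track_id, offset, votes) for the best-supported track, or None."""
    votes = Counter()
    for h, t_query in query:
        # One dictionary lookup per query hash, regardless of catalog size.
        for track_id, t_db in index.get(h, ()):
            votes[(track_id, t_db - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count

catalog = {
    "song_a": [(1, 0), (2, 5), (3, 9), (4, 14)],
    "song_b": [(5, 0), (2, 7), (6, 11)],
}
index = build_index(catalog)
print(identify(index, [(2, 0), (3, 4), (4, 9)]))  # ('song_a', 5, 3)
```

The query shares hash 2 with both tracks, but only `song_a` collects several votes at one consistent offset, so it wins.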

How long does the recording need to be?

This tool records 3 to 15 seconds of audio, which is typically enough to generate a meaningful fingerprint with dozens of spectral peaks and hundreds of hash pairs. Shazam-like systems can identify songs from as little as 3 seconds of clear audio. Longer recordings produce more peaks and hashes for higher matching confidence.

What does the similarity score mean?

The similarity score compares the fingerprint hashes of two recordings. A score of 90–100% means the audio is essentially identical. 50–89% suggests similar content (e.g., the same song recorded at different times). Below 50% indicates substantially different audio. The score is based on hash overlap and Hamming distance.

Why do different recordings of the same sound produce different hashes?

Background noise, microphone differences, and slight timing variations cause small changes in which peaks are detected. However, the majority of peaks remain consistent, which is why the similarity score between two recordings of the same source will still be high (typically 60–90%) rather than 100%.

Is my audio data stored or uploaded anywhere?

No. All fingerprint computation happens entirely in your browser using the Web Audio API and JavaScript. No audio data, fingerprints, or hashes are transmitted to any server. The exported JSON file is generated locally on your device.

Can this tool identify a song from a recording?

No — this is an educational demo, not a song-identification service. It has no database of pre-computed fingerprints to search against. It shows you how audio fingerprinting algorithms work (spectrogram, peak extraction, constellation map, hash pairing) and lets you compare two clips you supply yourself. To identify a song, use a service like Shazam, which maintains a database of millions of pre-indexed fingerprints.

What exactly is a constellation map in audio fingerprinting?

A constellation map is a sparse scatter plot of spectral peak coordinates — each dot represents a (time, frequency) point where the audio had unusually high energy. The name comes from the visual resemblance to stars on a star chart. Because only the most prominent peaks are kept, the map is compact and noise-resistant. The lines you see connecting dots in this tool's visualization show the peak pairs used to form hash values.

How does background noise affect the audio fingerprint?

Moderate background noise typically reduces the similarity score between two recordings of the same source but does not destroy recognition. The key peaks — the loudest spectral features of the target audio — usually remain dominant even with ambient noise present. Very loud or spectrally dense noise (e.g., crowd noise, machinery) can mask quieter peaks and lower the score significantly. Quieter environments and closer microphone placement improve fingerprint accuracy.
