A phone captures audio and runs a Fast Fourier Transform (FFT) on short windows. It builds a spectrogram and extracts peaks. Nearby peak pairs form compact hashes (two frequencies + time delta). An inverted index maps those hashes to songs, and timing validates matches.
Most services run lookups on servers against vast databases. On-device systems trade coverage for lower latency, better privacy, and curated models.










