Local Whisper is a small web app that turns audio files into text using OpenAI's Whisper speech-recognition model. Everything runs locally in your browser tab: your recording or file is never sent to our servers for transcription.
The app loads an English Whisper checkpoint (Tiny, Base, or Small) through Transformers.js. ONNX weights are fetched from Hugging Face the first time you pick a model, then stored in the browser's cache. Inference runs in a background worker, either on the CPU via WebAssembly or on the GPU via WebGPU when your browser supports it. Long files are handled in overlapping time windows so Whisper can stream partial text while it works through the clip.
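As an illustration of what that pipeline looks like with Transformers.js (the checkpoint id, chunk and stride sizes, and the decodeToPcm helper below are assumptions for the sketch, not necessarily what the app uses):

```ts
import { pipeline } from "@huggingface/transformers";

// The first call downloads the ONNX weights; later calls reuse the browser cache.
// "Xenova/whisper-tiny.en" is an illustrative checkpoint id.
const transcriber = await pipeline(
  "automatic-speech-recognition",
  "Xenova/whisper-tiny.en",
  { device: "wasm" } // or "webgpu" where the browser supports it
);

// Hypothetical helper: decodes the user's file into 16 kHz mono
// Float32Array samples (e.g. via the Web Audio API).
declare function decodeToPcm(file: File): Promise<Float32Array>;
const audio = await decodeToPcm(file);

// Long clips are split into overlapping windows; Whisper transcribes each
// window and the overlap lets the chunks be stitched back together.
// Partial-text streaming hooks also exist, but exact option names vary by version.
const result = await transcriber(audio, {
  chunk_length_s: 30, // window length in seconds
  stride_length_s: 5, // overlap on each side of a window
});
console.log(result.text);
```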
Does my audio get uploaded?
Transcription happens entirely in your browser. Audio you select stays on your device for processing. Model files are downloaded from Hugging Face's CDN into your browser cache (much like a heavy static asset on any website); your audio is never uploaded for cloud transcription.
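You can inspect those cached files yourself via the browser's Cache Storage API. The cache name below is assumed to be the Transformers.js default; the app could configure a different one:

```ts
// Run in the DevTools console after a model has been downloaded.
// "transformers-cache" is an assumed cache name, not confirmed by the app.
const cache = await caches.open("transformers-cache");
for (const req of await cache.keys()) {
  console.log(req.url); // ONNX weights, tokenizer, and config files from the CDN
}
```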
Why is the first run slow?
The first time you use a given model size, the app downloads ONNX weights (hundreds of MB for larger
checkpoints). Later visits reuse the cached files, so startup is much quicker.
What does WebAssembly do here?
WebAssembly (Wasm) is portable bytecode that runs in a browser sandbox. The ONNX runtime uses it so inference can execute on the CPU without a plug-in when you choose the WASM option.
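A quick feature check, for completeness; every modern browser ships WebAssembly, so the CPU path is effectively always available:

```ts
// WebAssembly has been supported by all major browsers since 2017,
// so this check is mostly a formality.
const hasWasm =
  typeof WebAssembly === "object" &&
  typeof WebAssembly.instantiate === "function";
console.log(hasWasm ? "CPU (WASM) inference available" : "No WebAssembly support");
```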
What is WebGPU?
WebGPU is a browser API that gives pages access to the GPU through a unified interface. Chromium-based browsers can use it to accelerate ONNX inference on the GPU where supported.
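Pages can probe for it before opting in: `navigator.gpu` is only defined where WebGPU exists, and `requestAdapter()` can still resolve to null on unsupported hardware:

```ts
// Requires @webgpu/types for the navigator.gpu declarations in TypeScript.
const adapter = "gpu" in navigator ? await navigator.gpu.requestAdapter() : null;
console.log(adapter ? "WebGPU usable" : "WebGPU unavailable; use the WASM path");
```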
WebGPU vs CPU — what should I choose?
Try GPU (WebGPU) in Chromium-based browsers if it's stable on your machine; it often reduces wall-clock time once the model is cached. Fall back to CPU (WebAssembly) if WebGPU fails, isn't supported, or is competing for GPU memory with other applications.
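A minimal sketch of that fallback order, assuming the Transformers.js device option from the earlier example:

```ts
import { pipeline } from "@huggingface/transformers";

// Illustrative checkpoint id; substitute the size you picked in the app.
const MODEL = "Xenova/whisper-base.en";

async function loadTranscriber() {
  // Prefer WebGPU when the API exists and an adapter is actually granted.
  if ("gpu" in navigator && (await navigator.gpu.requestAdapter())) {
    try {
      return await pipeline("automatic-speech-recognition", MODEL, { device: "webgpu" });
    } catch {
      // WebGPU can be present yet still fail (driver quirks, GPU memory pressure).
    }
  }
  // WebAssembly on the CPU is the universal fallback.
  return pipeline("automatic-speech-recognition", MODEL, { device: "wasm" });
}
```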
Which languages are supported?
This build uses English-tuned Whisper checkpoints (the .en variants), so transcription is English-only.