Building Ramble #1: Taming the Microphone
The surprising complexity behind asking users for microphone permission on the Web
- Building Ramble #1: Taming the Microphone
- Building Ramble #2: Capturing Audio in Real-Time (soon)
- Building Ramble #3: Visualizing the Waveform (soon)
While building Ramble, we needed to handle microphone access for voice input. The logic started as a few lines directly inside the component: call getUserMedia to request permission and get an audio stream, done. But as we kept running into browser quirks and edge cases, it grew complex enough that we extracted it into a dedicated hook called useMicrophone.
This write-up covers the interesting challenges involved: handling permissions across different browsers, enumerating and selecting microphone devices, persisting user preferences across sessions, and managing media streams without leaking resources.
Getting Permissions Right
The browser permission model looks straightforward until you actually use it. We ended up with five distinct states:
type MicrophonePermissionState =
  | undefined   // Still loading, haven't checked yet
  | 'prompt'    // Ready to ask, user hasn't been prompted
  | 'granted'   // User allowed microphone access
  | 'denied'    // User blocked microphone access
  | 'not-found' // No microphone detected on the device
Why so many? Because “denied” and “no microphone” require completely different UI and messaging. And undefined lets us distinguish “still loading” from 'prompt', which means “ready to ask the user”. That way we know when to show a loading state and when to show the actual permission request button.
The trickiest part was Chromium’s permission prompt. When a user dismisses the prompt (clicks outside it or presses Escape), getUserMedia throws a NotAllowedError. The same error you get when they explicitly click “Block.” 🤨
The only way to tell them apart is to query navigator.permissions after the error. If it returns 'prompt', they dismissed it. If it returns 'denied', they blocked it. Firefox doesn’t have this quirk, so we only apply this logic on Chromium browsers.
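Here's a rough sketch of that disambiguation, not the hook's actual code. The names classifyRequestFailure, FailureKind, and the injected queryState callback are hypothetical, and how isChromium gets detected is left out. In a browser, queryState would wrap navigator.permissions.query.

```typescript
// Hypothetical helper types; none of these names come from useMicrophone.
type QueriedState = 'prompt' | 'granted' | 'denied'
type FailureKind = 'dismissed' | 'denied' | 'not-found' | 'unknown'

// Reads `name` off whatever getUserMedia rejected with (normally a DOMException).
function errorName(error: unknown): string {
  if (typeof error === 'object' && error !== null && 'name' in error) {
    return String((error as { name: unknown }).name)
  }
  return ''
}

// Classifies a getUserMedia rejection. `queryState` is injected so the logic
// can run outside a browser; in a browser it could be:
//   () => navigator.permissions
//     .query({ name: 'microphone' as PermissionName })
//     .then((status) => status.state)
async function classifyRequestFailure(
  error: unknown,
  isChromium: boolean,
  queryState: () => Promise<QueriedState>,
): Promise<FailureKind> {
  const name = errorName(error)
  if (name === 'NotFoundError') return 'not-found'
  if (name !== 'NotAllowedError') return 'unknown'
  // Outside Chromium, NotAllowedError is taken at face value.
  if (!isChromium) return 'denied'
  // Chromium: a dismissed prompt leaves the permission state at 'prompt',
  // while an explicit block moves it to 'denied'.
  const state = await queryState()
  return state === 'prompt' ? 'dismissed' : 'denied'
}
```

A 'dismissed' result can map back to the 'prompt' state in the UI, so the user gets another chance to click the button.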
Firefox has its own quirk. When a user temporarily blocks access, the permission API might still return 'prompt' instead of 'denied'. So we can’t always tell if permission was never requested or just temporarily denied. We accept this ambiguity rather than trigger an unsolicited prompt to find out.
One firm requirement: never trigger a permission prompt on page load. Users should explicitly initiate the request. So on mount, we silently check the current permission state through navigator.permissions.query() without prompting. Only when the user clicks a button that requires microphone access do we call getUserMedia, which triggers the browser’s permission dialog.
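The silent check on mount might look something like the sketch below. The function name and the injected queryState callback are hypothetical. Falling back to 'prompt' when the Permissions API is missing or throws is an assumption on our part, chosen because it never triggers a dialog.

```typescript
type QueriedState = 'prompt' | 'granted' | 'denied'

// Reads the current permission state without ever calling getUserMedia,
// so no browser dialog can appear. `queryState` is injected; in a browser:
//   () => navigator.permissions
//     .query({ name: 'microphone' as PermissionName })
//     .then((status) => status.state)
async function readInitialPermissionState(
  queryState?: () => Promise<QueriedState>,
): Promise<QueriedState> {
  // No Permissions API available: assume the user has not been asked yet.
  if (!queryState) return 'prompt'
  try {
    return await queryState()
  } catch {
    // Some browsers reject unsupported permission names; treat that as 'prompt'.
    return 'prompt'
  }
}
```

Until this resolves, the hook's state stays undefined, which is what drives the loading UI.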
Making Device Selection Work
Once we had permissions sorted, we needed device enumeration. This is where Windows threw us a curveball 🫠
Calling navigator.mediaDevices.enumerateDevices() on Windows returns devices with IDs like 'default' and 'communications'. These aren’t real microphones. They’re system aliases that duplicate actual devices in the list. We filter them out to avoid confusing users with duplicate entries.
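A minimal sketch of that filter, assuming the alias IDs are exactly 'default' and 'communications'. DeviceLike and listRealMicrophones are hypothetical names; DeviceLike mirrors the fields of MediaDeviceInfo that matter here, so the result of enumerateDevices() can be passed in directly.

```typescript
// Structural stand-in for MediaDeviceInfo (hypothetical name).
interface DeviceLike {
  deviceId: string
  kind: string
  label: string
}

// System aliases that duplicate a real device elsewhere in the list.
const ALIAS_DEVICE_IDS = new Set(['default', 'communications'])

// Keeps audio inputs only and drops the alias entries.
function listRealMicrophones<T extends DeviceLike>(devices: readonly T[]): T[] {
  return devices.filter(
    (device) => device.kind === 'audioinput' && !ALIAS_DEVICE_IDS.has(device.deviceId),
  )
}
```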
But here’s another gotcha: the first device in the enumerated list isn’t necessarily the system default. The browser returns devices in an arbitrary order. To find the actual default, we create a temporary stream with getUserMedia({ audio: true }), check which device it picked, then immediately stop the stream. A bit wasteful, but it’s the only reliable way.
const tempStream = await navigator.mediaDevices.getUserMedia({ audio: true })
const defaultDeviceId = tempStream.getAudioTracks()[0]?.getSettings().deviceId
tempStream.getTracks().forEach((track) => track.stop())
We then reorder our device list to put the real default first. Users expect the top option to be their system default, and now it actually is.
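The reordering itself is small. Here's one way it could be written, with moveDefaultFirst as a hypothetical name. It leaves the list untouched when the default ID is unknown or not in the list.

```typescript
// Returns a new list with the device matching `defaultDeviceId` moved to the
// front; every other device keeps its relative order.
function moveDefaultFirst<T extends { deviceId: string }>(
  devices: readonly T[],
  defaultDeviceId: string | undefined,
): T[] {
  const preferred = devices.filter((device) => device.deviceId === defaultDeviceId)
  const rest = devices.filter((device) => device.deviceId !== defaultDeviceId)
  return [...preferred, ...rest]
}
```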
Remembering Preferences Across Sessions
Users who pick a microphone other than the default don’t want to re-select it every time they open Ramble. Persistence seemed simple: store the device ID in localStorage, restore it on load.
Except device IDs aren’t stable. Browsers can regenerate them. Plugging a device into a different USB port might change it. Even browser updates can shuffle them around.
Labels are more reliable. A “Blue Yeti” is still called “Blue Yeti” regardless of which port it’s in. So we store the label instead of the ID.
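Storing the label could be as simple as the sketch below. The storage key and function names are made up for illustration. The storage object is passed in (window.localStorage in a browser), which also makes it easy to swallow the errors some browsers throw when storage is blocked.

```typescript
// Minimal subset of the Web Storage interface used here.
interface StorageLike {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

// Hypothetical key; the real hook may use something else entirely.
const PREFERRED_MIC_KEY = 'preferred-microphone-label'

function savePreferredLabel(storage: StorageLike, label: string): void {
  try {
    storage.setItem(PREFERRED_MIC_KEY, label)
  } catch {
    // Storage can be unavailable or full; losing the preference is acceptable.
  }
}

function loadPreferredLabel(storage: StorageLike): string | null {
  try {
    return storage.getItem(PREFERRED_MIC_KEY)
  } catch {
    return null
  }
}
```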
But labels have their own quirks. Chrome loves appending technical suffixes:
- MacBook Pro Microphone (1234abcd-5678-90ef)
- External Mic (Default)
We clean these before storing and comparing. A simple regex strips hex IDs, (Default) markers, and numeric suffixes:
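// Matches a trailing "(…)" holding a hex ID with one or more -/: separated groups, "Default", or "N-N"
const suffixPattern = /\s*\((?:[0-9a-f]+(?:[-:][0-9a-f]+)+|Default|\d+-\d+)\)$/i
const cleanLabel = label.replace(suffixPattern, '').trim()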
Even with clean labels, exact matching isn’t enough. A browser update might slightly change how a device is named. So we implemented fuzzy matching: exact match first, then case-insensitive, then word-based similarity scoring. If at least 50% of words match, we consider it the same device.
function findBestLabelMatch(savedLabel, availableDevices) {
  // Try exact match first, then case-insensitive...
  // Finally, fuzzy match based on shared words
  // filter(Boolean) drops empty strings, which would otherwise match every word
  const savedWords = savedLabel.toLowerCase().split(/\s+/).filter(Boolean)
  let bestMatch = null
  let bestScore = 0
  for (const device of availableDevices) {
    const deviceWords = device.label.toLowerCase().split(/\s+/).filter(Boolean)
    const sharedCount = savedWords.filter(word =>
      deviceWords.some(dw => dw.includes(word) || word.includes(dw))
    ).length
    const score = sharedCount / savedWords.length
    // Require at least 50% word similarity and keep the highest-scoring device
    if (score >= 0.5 && score > bestScore) {
      bestMatch = device
      bestScore = score
    }
  }
  return bestMatch
}
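The snippet above skips over the first two tiers. They could look roughly like this, where findDirectLabelMatch is a hypothetical name and the actual hook may structure it differently. The word-based scoring only needs to run when this returns undefined.

```typescript
interface LabeledDevice {
  deviceId: string
  label: string
}

// Tier 1: exact label match. Tier 2: same label ignoring case.
function findDirectLabelMatch<T extends LabeledDevice>(
  savedLabel: string,
  devices: readonly T[],
): T | undefined {
  const exact = devices.find((device) => device.label === savedLabel)
  if (exact) return exact
  const wanted = savedLabel.toLowerCase()
  return devices.find((device) => device.label.toLowerCase() === wanted)
}
```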
Is this overkill? Maybe 🫣 But it means users almost never lose their preference, even when the underlying system changes slightly.
Managing Streams Without Leaks
Media streams are resources. Leave them running and you get the recording indicator stuck on, battery drain, and potentially blocked access for other apps. We needed careful lifecycle management.
The main challenge is race conditions. Users can switch devices quickly. Each switch triggers a getUserMedia call. If a user switches three times rapidly, we might have three in-flight requests. The last one should win, but we can’t just ignore the others. Those streams need to be stopped.
We use an AbortController pattern. Each device switch creates a new controller. If the effect re-runs before getUserMedia resolves, we abort the previous controller. When the promise resolves, we check whether that request was aborted. If so, we immediately stop the new stream’s tracks and bail out.
if (abortController.signal.aborted) {
  newMicrophoneStream.getTracks().forEach((track) => track.stop())
  return
}
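To show the whole flow in one place, here's a framework-agnostic sketch of the "last switch wins" logic. It isn't the hook's real code: createStreamSwitcher, StreamLike, and the injected openStream callback are hypothetical. In a browser, openStream could be (deviceId) => navigator.mediaDevices.getUserMedia({ audio: { deviceId: { exact: deviceId } } }).

```typescript
// Structural stand-ins for MediaStreamTrack and MediaStream.
interface TrackLike {
  stop(): void
}
interface StreamLike {
  getTracks(): TrackLike[]
}

function createStreamSwitcher<S extends StreamLike>(
  openStream: (deviceId: string) => Promise<S>,
) {
  let controller: AbortController | undefined
  let current: S | undefined

  const stopCurrent = (): void => {
    current?.getTracks().forEach((track) => track.stop())
    current = undefined
  }

  return {
    async switchTo(deviceId: string): Promise<void> {
      // A newer switch supersedes whatever request is still in flight.
      controller?.abort()
      const ownController = new AbortController()
      controller = ownController

      const stream = await openStream(deviceId)
      if (ownController.signal.aborted) {
        // This request lost the race: release the stream right away.
        stream.getTracks().forEach((track) => track.stop())
        return
      }
      stopCurrent()
      current = stream
    },
    // Equivalent of the unmount cleanup: cancel pending work, stop tracks.
    dispose(): void {
      controller?.abort()
      stopCurrent()
    },
    getCurrent: (): S | undefined => current,
  }
}
```

Each call to switchTo keeps its own controller in a local variable, so a late-resolving request can still tell that it was superseded even after the shared controller variable has been replaced.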
On unmount, we stop all tracks in the current stream. No orphaned streams, no stuck indicators.
Wrapping Up
None of this was planned. Each section of this write-up represents a problem we discovered only after hitting it. The permission quirks, the Windows device aliases, the label instability—all surprises.
The Chromium dismissed-vs-denied issue was particularly frustrating. The error looked the same, the behavior was different, and the only way to tell them apart was querying navigator.permissions after the fact. Not intuitive, but it seems to work.
The hook now handles all of this so components don’t have to think about it. Ramble uses it today, but it’s generic enough that we can use it for any feature that needs a microphone.