A private ambient listener for homes and offices / Evan King


3 Jan 2025

An ePaper display shows the most common words overheard from nearby conversations.

I wrapped up my PhD recently. It’s in “Electrical and Computer Engineering”, though it’s more descriptive to say that I spent my four years of graduate school working in “Ubiquitous Computing” or ubicomp. Ubicomp researchers aim to make technology that “fit[s] the human environment” rather than forcing humans to adapt themselves to environments dominated by obtrusive technology. Established by the late Mark Weiser at Xerox PARC in the early nineties, ubicomp has philosophical roots: Weiser was strongly influenced by the idea of “entanglement”, which suggests that humans are inextricably linked to (and influenced by) their surroundings.

One of the most compelling ideas to come out of Weiser’s time at PARC was that of calm technology. Calm technology exists at the periphery of human attention, where it provides a subtle augmentation of our knowledge about the environment without dominating the center of our focus. The canonical example is the “Live Wire” installed by artist and engineer Natalie Jeremijenko at Xerox PARC. The concept was simple: a wire dangled from the ceiling, where it was connected to a motor that spun at a speed relative to the rate of PARC’s internet traffic at each moment. The subtle movement of the wire didn’t demand your full attention—rather, its movement and sound in the periphery of your attention came to augment your awareness of unseen aspects of the environment.

Calm technology can also help with things like finding free spaces in a parking garage. The naive design might be to make a map of the garage, which can be displayed on an app and populated with data from the sensors in the garage. In this case, the center of focus becomes the app—the map, not the territory—and pulls the driver out of the environment. The “calmer” solution is to simply place a small light above each space—red if occupied, green if not—thus augmenting people’s awareness of the environment as they navigate it.

Image credit: Reddit

I’ve always found calm technology compelling, but was short on time during my PhD to explore it. One paper that impacted me was about a lamp that passively listens to conversations throughout the day and serves up daily retrospection (i.e., a “Message Ritual”) about the people around it. In a very elegant way, the lamp can augment people’s awareness of the state of their relationships, their moods, and their interactions. A downside, however, is that it generates these retrospections by sending all of your conversations to remote, cloud-hosted services.

My focus at Useful Sensors lately has been on our effort to build rapid speech-to-text models for edge devices. These models are fast and, importantly, small enough to run on the CPU of an air-gapped Raspberry Pi. With the Message Ritual in mind, I set out to integrate Useful’s models into a personal project: my own take on calm technology. The result is an ambient listener that displays the most commonly overheard words from nearby conversations throughout the day. Its constituent parts are straightforward: a Raspberry Pi, a microphone, an ePaper display, a voice activity detection (VAD) model, and a lightweight speech-to-text model.

“I’m gonna run this little thing and see what happens. Did I do it right?”

Most importantly, the Raspberry Pi has no internet connection. It uses Silero VAD to detect speech activity and Useful’s Moonshine ONNX models to transcribe it. Dogfooding the Moonshine models like this was a good exercise since it helped me identify (and remedy) some issues with our Python package for the ONNX models.
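The detect-then-transcribe loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `is_speech` stands in for a VAD model such as Silero VAD, `transcribe` stands in for a speech-to-text model such as Moonshine, and both are passed in as plain callables so that no real library API is assumed. The names `segment_speech`, `listen`, and `max_silence_frames` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one short chunk of audio samples


def segment_speech(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],  # hypothetical stand-in for a VAD model
    max_silence_frames: int = 3,
) -> Iterator[Frame]:
    """Yield one utterance each time speech is followed by enough silence."""
    utterance: Frame = []
    silence = 0
    for frame in frames:
        if is_speech(frame):
            utterance.extend(frame)
            silence = 0
        elif utterance:
            # Silence only matters once an utterance has started.
            silence += 1
            if silence >= max_silence_frames:
                yield utterance
                utterance = []
                silence = 0
    if utterance:
        # Flush whatever speech was still buffered when the stream ended.
        yield utterance


def listen(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    transcribe: Callable[[Frame], str],  # hypothetical stand-in for speech-to-text
    on_text: Callable[[str], None],
    max_silence_frames: int = 3,
) -> None:
    """Transcribe each detected utterance and hand the text to a callback."""
    for utterance in segment_speech(frames, is_speech, max_silence_frames):
        text = transcribe(utterance)
        if text.strip():
            on_text(text)
        # Neither the audio nor the text is stored here; only the callback sees it.
```

Everything in this sketch runs locally in one process, which is what makes the approach compatible with a device that has no network connection.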

As people have conversations near the summarizer, the word distribution is updated from real-time transcriptions of their speech. The transcripts themselves, however, are discarded. To highlight only the most salient words, I use NLTK to remove stop words, along with a hard-coded list of filler words. The Pi is attached to a (very calm looking) ePaper display and placed in a shadow box frame with a mat cut to size. The display updates every minute if the word distribution has changed, and starts over fresh each day.
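One way the word tally could work is sketched below, again as an illustration rather than the project's code. The `WordTally` class and its methods are hypothetical names. The small `STOP_WORDS` set is a stand-in so the sketch runs without downloads; with NLTK installed and its stopwords corpus downloaded, `nltk.corpus.stopwords.words("english")` could supply the list instead. The `FILLER_WORDS` set is likewise an invented example of a hard-coded filler list.

```python
import re
from collections import Counter
from datetime import date
from typing import FrozenSet, List, Optional, Tuple

# Assumption: a tiny stand-in list; NLTK's English stop-word list is much longer.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "it", "of", "the", "to", "what"}
# Assumption: an illustrative hard-coded filler list, not the one used in the project.
FILLER_WORDS = {"um", "uh", "like", "gonna", "yeah", "okay"}


class WordTally:
    """Counts salient words per day without keeping the transcripts."""

    def __init__(self, ignored: FrozenSet[str] = frozenset(STOP_WORDS | FILLER_WORDS)):
        self.ignored = ignored
        self.counts: Counter = Counter()
        self.day: Optional[date] = None
        self._last_shown: List[Tuple[str, int]] = []

    def add_transcript(self, text: str, today: Optional[date] = None) -> None:
        today = today or date.today()
        if today != self.day:
            # Start over fresh each day.
            self.counts.clear()
            self.day = today
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        self.counts.update(w for w in words if w not in self.ignored)
        # `text` is not stored anywhere; only the counts survive this call.

    def top_words(self, n: int = 10) -> List[Tuple[str, int]]:
        return self.counts.most_common(n)

    def refresh(self, n: int = 10) -> Optional[List[Tuple[str, int]]]:
        """Return the top words if they changed since the last refresh, else None."""
        top = self.top_words(n)
        if top == self._last_shown:
            return None
        self._last_shown = top
        return top
```

A timer that calls `refresh()` once a minute and redraws the display only when it returns a list would match the update behavior described above, and would avoid needless ePaper redraws.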

Running the summarizer in my home for a few months has been a fun experiment. My partner and I have noticed that the mood of the words on the display often matches the mood of our conversations, and its presence invites productive discussions about our discussions. When guests see the summarizer, their first response is often to read off the words they see on the display and watch with amusement as it is influenced by their speech. It serves as a calm reminder that words have meaning, and can encourage us to be mindful of the content of our interactions.