Software on Evan King

Mishearings: Turning ASR models into poets

Tue, 11 Feb 2025 00:00:00 +0000

TL;DR: I built a tool for assembling Dadaist poems from ASR transcriptions of misheard speech, and you can play with it in your web browser. I built it with my MoonshineJS library, Tone.js, and p5js.

Introduction

My goal lately has been to build a JS library for simple on-device speech recognition in web applications. While there are many practical uses for this, I felt the urge recently to explore its creative potential. Speech recognition models are a bit more utilitarian than purely generative models – you put speech in and get text out – so I had my work cut out for me.

Moonshine: Industry-leading edge ASR

Mon, 21 Oct 2024 00:00:00 +0000

Moonshine outperforms speech-to-text models from OpenAI, Google, NVIDIA, and Meta on the OpenASR Leaderboard while running 5x faster¹ on edge devices.

I built the data collection and preprocessing pipelines that we used to train Moonshine, delivering over 200K hours of labeled data. We needed a LOT of good data, and we had to move fast. The pipeline I constructed was massively distributed, allowing us to intake hundreds of terabytes of raw audio data, label it, and clean it within the span of several weeks.

Teaching things to think

Mon, 06 May 2024 00:00:00 +0000

What if smart devices could reason about their state, like a thermostat that explains its schedule, or a light bulb that chooses a color to suit a mood?

That’s the subject of my IEEE PerCom ’25 paper, “Teaching Things To Think: Bootstrapping Local Reasoning for Smart(er) Devices”. We proposed a method for synthesizing training data to distill small language models for the task, leveraging a combination of formal methods and generative models. We ultimately trained models for two “thoughtful things” – a lamp and a thermostat – then evaluated their performance at explaining and mutating their state in response to unconstrained user commands.

Sasha: Introducing LLMs for smart spaces

Wed, 06 Mar 2024 00:00:00 +0000

Building on my weekend project to control some smart lights with ChatGPT, this wide-ranging paper fully introduces LLM-based reasoning to multi-device smart home environments. We introduce methods and benchmarks for measuring model performance at reasoning in smart homes, propose methods for engineering immediate and scheduled responses to user goals, propose a multi-step reasoning system for improving system performance, and conduct the first user study of a real LLM-controlled smart home.

Something that often gets lost in a research paper is the truly fun and challenging experiences you can have with the work. From touring a trailer on the UT Austin JJ Pickle Campus to see if I could turn it into a smart home (I could not – it was outfitted with extremely sensitive equipment for conducting ventilation studies) to eventually hauling pegboard and furniture from my illegally-parked RAV4 up to an unused lab on the 7th floor of the EER building, this project was truly the highlight of my PhD.

CANDor: Energy efficient neighbor discovery

Mon, 25 Sep 2023 00:00:00 +0000

Bluetooth Low Energy (BLE) can be useful for forming ad-hoc networks between smart devices, and for exchanging information in sensor networks. It’s difficult, however, to choose the right settings for the BLE protocols that devices use to discover one another. The optimal protocol effectively balances discovery latency and packet reception rates with the energy consumed by sending packets and scanning for neighbors’ packets.

In this work, I introduce a novel neighbor discovery protocol to improve the packet reception rate and energy consumption of BLE neighbor discovery. Unlike prior work, which uses extra sensors (e.g., GPS) to recalibrate the protocol performance, this work uses the performance of neighbor discovery itself as a signal to determine when to recalibrate the protocol’s settings.

Using LLMs to control a smart home

Sat, 18 Mar 2023 00:00:00 +0000

2024 UPDATE

I originally authored this post in early 2023 — little did I know it would be the spark of a broader research project, ultimately becoming a significant part of my PhD dissertation. If you are the researchy type, you can read the short 2023 preprint paper and/or the in-depth 2024 ACM IMWUT paper I published and presented at UbiComp 2024 about this topic. We also made a demo video that shows an LLM-based smart home in action. Otherwise, enjoy the original post!

From sea levels to music with CSV to MIDI

Wed, 14 Feb 2018 00:00:00 +0000

Data sonification describes the process by which data is translated into sound. There are a variety of reasons this might be worthwhile, from purely artistic or aesthetic purposes to its potential as an assistive technology for the visually impaired. I like to think of it as synesthesia: it allows us to experience information in a different modality than it was originally expressed.

There’s no one-size-fits-all way to design a data sonification algorithm—it’s highly dependent on the structure of the data you’re working with and the type of sound you’re trying to create. Since comma-separated values (CSV) are a common way of storing numerical datasets and MIDI is the standard for representing melodic information, it struck me that a tool that could convert between the two formats would come in handy as a generative composition tool. So back in 2018, I made one.
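To give a rough idea of what such a conversion involves, here is a minimal sketch of one possible mapping: linearly rescaling a numeric CSV column onto a range of MIDI note numbers. This is not the tool's actual implementation. The function names, the default note range (48 to 84), the `level_mm` column name, and the sample rows are all assumptions made up for illustration, and the sample values are not real sea level data.

```python
import csv
import io


def values_to_midi_notes(values, low_note=48, high_note=84):
    """Linearly rescale numbers onto an inclusive range of MIDI note numbers.

    low_note and high_note are illustrative defaults (C3 to C6), not values
    taken from the original tool.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat data has no contour, so place every point mid-range.
        return [(low_note + high_note) // 2] * len(values)
    span = high_note - low_note
    return [low_note + round((v - lo) / (hi - lo) * span) for v in values]


def csv_column_to_notes(csv_text, column):
    """Read one numeric column from CSV text and map it to MIDI notes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return values_to_midi_notes(values)


# Made-up sample rows, purely for demonstration.
sample = "year,level_mm\n1993,0.0\n2000,20.0\n2010,40.0\n2018,80.0\n"
notes = csv_column_to_notes(sample, "level_mm")
print(notes)  # [48, 57, 66, 84]
```

The resulting note numbers could then be handed to any MIDI library to write a file. A linear pitch mapping is only one choice; depending on the data, mapping to a musical scale, or to duration or velocity instead of pitch, may sound better.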