Using LLMs to control a smart home

March 18, 2023

2024 UPDATE

I originally authored this post in early 2023 — little did I know it would be the spark of a broader research project, ultimately becoming a significant part of my PhD dissertation. If you are the researchy type, you can read the short 2023 preprint paper and/or the in-depth 2024 ACM IMWUT paper I published and presented at UbiComp 2024 about this topic. We also made a demo video that shows an LLM-based smart home in action. Otherwise, enjoy the original post!

Introduction

While smart home assistants like Google Home have improved significantly over the years, they can still feel pretty rigid when responding to more ambiguous commands. Often you’ll find yourself adapting your language to the query structure the assistant expects, rather than the assistant adapting itself to your prompts. ChatGPT has recently shown the surprising ability of large language models to grasp the semantics and “meaning” behind text, which inspired me to apply it to the smart home assistant problem and see whether it can service ambiguous or vague smart home commands where existing assistants tend to choke. I’ve been surprised to find that it’s pretty good not only at inferring the user intent behind vague commands like “do something to cheer me up”, but also at finer details like creating properly formatted JSON for interfacing with the Philips Hue API.

Engineering a solution is a matter of designing the right interface: wrapping smart home context into a prompt, parsing responses from the model, and passing them off to the appropriate API. I’ll briefly write about my experiences implementing a proof-of-concept using Philips Hue lights here.

Prompt Engineering

Since ChatGPT is a general-purpose model, we need to engineer our prompts to 1.) scope its responses to the smart home use case and 2.) receive output in a machine-parseable format that we can pass to an API (in this case, Philips Hue). Toward 1.), we simply open the prompt with some language to put the model in the right “frame of mind”:

You are an AI tasked with controlling a smart home.

We also need to give it some information about the current context of the space—what is the state of devices there, what are they called? This not only helps the model determine what it has control over, but also provides useful contextual information that might inform how it chooses to change the state of devices in response to the prompt. We start with some preamble:

Here is the state of the devices in the home, in JSON format:

Then follow up with JSON pulled directly from the Philips Hue API:

{ 
    "action": {
        "on": true, 
        "bri": 200, 
        "hue": 42354, 
        "sat": 66, 
        "effect": "none", 
        "xy": [0.3223, 0.3287], 
        "ct": 167, 
        "alert": "lselect", 
        "colormode": "xy"
    }
}

Note that I’m only using information from one room in my home, and only providing parameters that (based on the API definition) it should be able to control. Passing too much context could be problematic, both in terms of inference time and in terms of overwhelming the model with mostly irrelevant information. Having experimented with different amounts of information passed to the model, I suspect that a large part of engineering the full solution will be figuring out how to slim down the context so the model can service the query quickly and accurately.
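For illustration, here is a minimal Python sketch of pulling one room’s state from the bridge and trimming it before it goes into the prompt. It assumes the Hue bridge’s v1 REST API, where `GET /api/<username>/groups/<id>` returns a group whose `action` object holds fields like the ones above. The bridge IP, username, group ID, the helper names, and the `CONTEXT_KEYS` allowlist are all placeholders for this example, not the exact setup used in this post.

```python
import json
import urllib.request

# Hypothetical allowlist: which fields of the group's "action" object the
# model gets to see. Adjust to whatever your lights actually support.
CONTEXT_KEYS = ("on", "bri", "hue", "sat", "effect", "xy", "ct", "alert", "colormode")


def fetch_group(bridge_ip: str, username: str, group_id: str) -> dict:
    """GET one group (room) from the Hue bridge's v1 REST API."""
    url = f"http://{bridge_ip}/api/{username}/groups/{group_id}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def build_context(group: dict, keys=CONTEXT_KEYS) -> str:
    """Keep only the allowlisted fields of the group's "action" object."""
    action = group.get("action", {})
    trimmed = {k: action[k] for k in keys if k in action}
    # Drop the group's name, light list, etc.; only the action state is kept.
    return json.dumps({"action": trimmed}, indent=4)
```

Something like `build_context(fetch_group("192.168.1.2", "<username>", "1"))` would then produce the context string. Keeping the allowlist explicit makes it easy to experiment with how much state the model sees.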

Next, I need to tell the model what the user’s query is, so it has something to respond to:

The user issues the command: {query}. Change the device state as appropriate.

And finally, provide some guidance for formatting the output, so that we can parse it and hand it off to an API:

Provide your response in JSON format.

Summed up, this is what a prompt looks like:

You are an AI tasked with controlling a smart home. Here is the state of the devices in the home, in JSON format: {context} The user issues the command: {query}. Change the device state as appropriate. Provide your response in JSON format.
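Filling in this template is plain string formatting. A minimal sketch in Python, assuming the context arrives as a dict like the Hue state shown earlier; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names for this example:

```python
import json

# The prompt from above, split into its parts for readability.
PROMPT_TEMPLATE = (
    "You are an AI tasked with controlling a smart home. "
    "Here is the state of the devices in the home, in JSON format: {context} "
    "The user issues the command: {query}. "
    "Change the device state as appropriate. "
    "Provide your response in JSON format."
)


def build_prompt(state: dict, query: str) -> str:
    """Serialize the device state and drop it, with the query, into the template."""
    # str.format only interprets braces in the template itself, so the
    # braces inside the serialized JSON are inserted verbatim.
    return PROMPT_TEMPLATE.format(context=json.dumps(state), query=query)
```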

Complex Prompts

Now, with a fairly complex query of “Turn the lights white and blink them continuously”, we get the following response, which contains properly formatted JSON that can be handed off directly to the Philips Hue API.

Here’s the updated JSON format with the new device state as per the user’s command:

 {
   "on": true,
   "bri": 254,
   "hue": 0,
   "sat": 0,
   "effect": "blink",
   "xy": [0.3227, 0.329],
   "ct": 399,
   "alert": "lselect",
   "colormode": "xy"
 }

Explanation:

"on": true - This means the lights are turned on.

"bri": 254 - This sets the brightness of the lights to the maximum value of 254.

"hue": 0 - This sets the hue to 0, which is white light.

"sat": 0 - This sets the saturation to 0, which means the color is fully desaturated and the light is completely white.

"effect": "blink" - This sets the effect of the lights to blink continuously.

"xy": [0.3227, 0.329] - This sets the color of the light to a specific xy coordinate that represents white light.

"ct": 399 - This sets the color temperature to the original value of 399.

"alert": "lselect" - This makes the lights blink quickly for a short period of time to indicate that they are in alert mode.

"colormode": "xy" - This confirms that the lights are in XY color mode.
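The model wraps its JSON in conversational text and an explanation, so the JSON has to be pulled out before it can go to the bridge. One way to do that, sketched here as an assumption rather than the exact method used for this post, is to try Python’s `json.JSONDecoder.raw_decode` at each `{` until something parses:

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object embedded in free-form model output."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # ignores whatever text follows it (e.g. the "Explanation:").
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model response")
```

If the response contains several JSON objects, this keeps only the first one that parses, which is a simplification.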

When we pull the JSON portion of the response and push it directly to the Hue API, the lights turn white and blink continuously, as requested.

Pretty cool.
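Pushing the extracted state is a single HTTP request. A minimal sketch, again assuming the v1 REST API, where a `PUT` to `/api/<username>/groups/<id>/action` applies a state change to every light in the group. `build_action_request` and `push_state` are hypothetical helper names, and the bridge IP and username are placeholders:

```python
import json
import urllib.request


def build_action_request(bridge_ip: str, username: str, group_id: str,
                         state: dict) -> urllib.request.Request:
    """Build the PUT request that applies `state` to a whole group."""
    url = f"http://{bridge_ip}/api/{username}/groups/{group_id}/action"
    body = json.dumps(state).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def push_state(bridge_ip: str, username: str, group_id: str, state: dict):
    """Send the state to the bridge and return its parsed JSON reply."""
    req = build_action_request(bridge_ip, username, group_id, state)
    with urllib.request.urlopen(req, timeout=5) as resp:
        # The bridge answers with a list of success/error entries.
        return json.load(resp)
```

The bridge replies with per-attribute `success` or `error` entries, which is a handy place to spot attributes it rejected.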

Ambiguous Prompts

It might be more interesting to try ambiguous commands related to abstract concepts like “mood” rather than specific instructions. What if we use the prompt “make a party vibe in here”? Using the same approach as above, this is the response from ChatGPT:

Here’s the updated device state in JSON format based on the user’s command:

{
  "on": true,
  "bri": 254,
  "hue": 46920,
  "sat": 254,
  "effect": "colorloop",
  "xy": [0.1533, 0.0531],
  "ct": 424,
  "alert": "none",
  "colormode": "hs",
  "scene": "party"
}

I added a new field scene with a value of "party" to reflect the requested party vibe. Depending on the specific capabilities of the devices in the home, this command could involve changes to the brightness, color, and/or effects of lights, as well as changes to music playback, temperature, and other factors to create an overall party atmosphere.

Pushing this state to the bridge makes the lights loop through a festive set of bright colors.

Now that’s promising! Existing smart assistants probably could not deal with this level of ambiguity. But it doesn’t always work (sometimes the responses are not, despite ChatGPT’s insistence, party vibes), and you’ll note that ChatGPT hallucinated information in its response (a party scene) that doesn’t actually correspond to the Hue API’s capabilities. This is probably where another real challenge of tying LLMs into smart home control lies: filtering the often unreliable responses from the model down into actionable sets of system commands that correspond to real (and desirable) behaviors.
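One way to start on this filtering is a strict allowlist: keep only attributes the bridge is known to accept, clamp numbers into their documented ranges, and drop everything else. The sketch below does that; the attribute set and ranges reflect my reading of the Hue v1 light state documentation and should be treated as assumptions to check against your own bridge. It catches malformed or out-of-range values, but not responses that are valid yet simply not a “party vibe”.

```python
# Assumed writable attributes and ranges (Hue v1 light/group state);
# verify against your own bridge before relying on them.
NUMERIC_RANGES = {
    "bri": (1, 254),
    "hue": (0, 65535),
    "sat": (0, 254),
    "ct": (153, 500),  # color temperature in mireds
}
ALLOWED_STRINGS = {
    "effect": {"none", "colorloop"},
    "alert": {"none", "select", "lselect"},
}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sanitize(state: dict) -> dict:
    """Keep only allowlisted attributes, clamping numeric values into range."""
    clean = {}
    for key, value in state.items():
        if key == "on" and isinstance(value, bool):
            clean[key] = value
        elif key in NUMERIC_RANGES and _is_number(value):
            low, high = NUMERIC_RANGES[key]
            clean[key] = int(min(max(value, low), high))
        elif (key in ALLOWED_STRINGS and isinstance(value, str)
              and value in ALLOWED_STRINGS[key]):
            clean[key] = value
        elif (key == "xy" and isinstance(value, list) and len(value) == 2
              and all(_is_number(v) for v in value)):
            clean[key] = [min(max(float(v), 0.0), 1.0) for v in value]
        # Anything else (keys outside the allowlist such as "scene", the
        # read-only "colormode", or values of the wrong type) is dropped.
    return clean
```

Applied to the party response above, this keeps the brightness, color, and `colorloop` effect but drops `scene` and `colormode`.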

Future Work

This covers a pretty small and self-contained example of how general-purpose LLMs might be leveraged to handle more complex user queries in smart environments. A lot of work remains to be done to produce a full solution. Off the top of my head:

Moving beyond lights. This is a simple proof of concept, but one could imagine a much more interesting system that integrates smart TVs, speakers, etc.
Handling a full home’s worth of state and context. The naive solution will result in big (expensive) prompts, looong inference times, and major hallucinations. Some preprocessing needs to be done to hand off only the most relevant information.
From commands to automation. Right now the bulk of home automation tasks are handled by preprogramming schedules. What if you could simply ask your assistant to build these automation routines for you? For example, “create a cozy atmosphere whenever it’s raining”.
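On handling a full home’s worth of state, a very naive form of that preprocessing is to send only the rooms a query mentions by name. The sketch below is a hypothetical illustration of the idea, not a solution: `select_relevant_rooms` and its `default` fallback are made up for this example, and plain substring matching will miss many phrasings.

```python
def select_relevant_rooms(rooms: dict, query: str, default=None) -> dict:
    """Naively pick which rooms' state to include in the prompt.

    rooms maps a room name to that room's state dict.
    """
    q = query.lower()
    # Keep rooms whose name appears verbatim (case-insensitively) in the query.
    matched = {name: state for name, state in rooms.items() if name.lower() in q}
    if matched:
        return matched
    # No room named: fall back to a default room if one is configured...
    if default is not None and default in rooms:
        return {default: rooms[default]}
    # ...otherwise send everything, which is exactly the expensive case.
    return rooms
```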

Conclusion

This brief post introduces a proof-of-concept application that leverages ChatGPT to control Philips Hue lights in response to prompts that are too complex for existing smart home assistants. A lot of work remains to be done to engineer a full solution here: namely, forming concise prompts to reduce inference time and room for error, and taming outputs to deal with hallucinations. Thanks for reading!