Making LLMs better at presenting context

This isn't just a roast of Rabbit, I promise.

I recently watched the demo of the Rabbit R1, a new handheld LLM-powered device made in partnership with Teenage Engineering. Teenage Engineering produces beautiful, simp-able hardware ranging from a delicious music synth to a modern (post-modern?) tape recorder. TE is opinionated, with a penchant for the analog, and it's for this reason I think they have something fresh to say in the AI world.

When my dad was in the hospital last month, and we at home had little understanding of what was going on, I sat my family around the dinner table and asked ChatGPT questions by voice. It was simple. Raw. Universal. I could give it as much context as I wanted, ramble and blather on with my fears and uncertainty, and ask it for clarity. ChatGPT responded with high-level summaries of his condition, contingencies and treatment plans. The raw quantity of info, and its ability to sift through context, was reassuring.

It was the most human interaction I've had with a machine, in both interface and content. No buttons or other UI needed. Just push-to-talk, surrounded by family.

I'm thinking today about how LLMs present context, after watching Rabbit's demo. At 12:34 the guy is playing a Kraftwerk album. He skips around, showing off how to navigate the album on the touchscreen and by voice. Then, he pushes the push-to-talk button and asks, "who wrote the lyrics for this song?"

The music cuts out and the bot responds, "Ralf Hütter and Emil Schult wrote the lyrics for the song, Computer Love by Kraftwerk."

It's too long. If you asked your buddy in the passenger seat to queue up some Kraftwerk in the car, and you said "Man this is good, who wrote it?" it would be an immediate vibe killer if they turned the audio off and looked you dead in the eyes and said "Ralf Hütter and Emil Schult wrote the lyrics for the song, Computer Love by Kraftwerk."


This response is too long and obstructive for the standard of familiarity this kind of device is aiming for. It hits the uncanny valley. In some ways, the 500ms response time they're touting might not matter if a 2-second question gets a 15-second response.

For delightful and useful experiences, LLMs need to become more human-like at judging how much context to share. There's much discussion about getting LLMs to "remember" more context, but it's an equally important UX problem to get them to return the right amount of it in their answers. I see Rabbit's songwriter moment a lot in my own chats with ChatGPT. In pair-programming chats, getting the right amount of context is even more critical: is the solution ChatGPT produced the only solution to my problem? Did it make any assumptions? Is there a more computationally expensive solution that would work better in some way?

How does that lengthy songwriter moment happen? Maybe during prompting or training, LLMs are incentivized to preserve and restate context so that it isn't lost in future responses.

The amount of context I want presented to me depends on several things - the task at hand, my personal communication style, even my culture's context preferences.

How can I explore these questions in an MVP way? I'm trying to ship more. Maybe I could fire up an open-source model like Mistral on Ollama and test it on various contextual tasks. A benchmark. What's an example of a task that would require presenting lots of context? The beginning of an LLM chat, perhaps, or when I ask the LLM to respond to a multi-threaded question. Or I could try to prompt and tune it against various personal preferences.
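If I built that benchmark, one cheap starting metric might be an "echo ratio": how much of an answer merely restates words the user already had in the question or on screen. A minimal sketch — the metric, variable names, and example strings are all my own assumptions, not anything Rabbit or Ollama ships:

```python
import re

def _words(text: str) -> list[str]:
    # Lowercased word tokens; the character class keeps umlauts like "Hütter".
    return re.findall(r"[a-zäöüß']+", text.lower())

def echo_ratio(question: str, context: str, answer: str) -> float:
    """Fraction of answer words the user already had in front of them."""
    known = set(_words(question)) | set(_words(context))
    answer_words = _words(answer)
    if not answer_words:
        return 0.0
    return sum(w in known for w in answer_words) / len(answer_words)

question = "Who wrote the lyrics for this song?"
context = "Now playing: Computer Love by Kraftwerk"

verbose = "Ralf Hütter and Emil Schult wrote the lyrics for the song, Computer Love by Kraftwerk."
terse = "Ralf Hütter and Emil Schult."

print(echo_ratio(question, context, verbose))  # ≈ 0.67: two-thirds restates known context
print(echo_ratio(question, context, terse))    # 0.0: nothing but new information
```

Rabbit's songwriter reply scores high because most of it restates the question and the now-playing context. A word-overlap heuristic like this is crude (it ignores synonyms and useful repetition), but it would be enough to compare models or system prompts side by side.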

How can we make LLMs smarter at presenting the right amount of context to users?

No easy answers here - this is hard for even humans to determine. Tom Critchlow writes that handling context is the primary work of high-value strategy consulting. The best consultants are those that place their work in just the right context, and surface that context effectively to their clients.

But working towards this would go a long way to making interactions with LLMs feel a lot more like speaking naturally with an attentive, tactful human.


A version of this post originally appeared in one of my weekly handwritten letters to strangers. If you'd like to get one in the mail, you can sign up (for free).