Most People Think Speech Is About Words. We Built Our Entire Engine on the Opposite Idea.
Bradley Smith · CTO & Co-Founder · February 25, 2026
Most people think speech is about words.
In NLP, the standard pipeline is: audio in, transcript out, process the text. The audio is a means to an end. Once you have the words, you throw away the signal.
We did the opposite.
At Threadline Studio we're building an AI assistant editor for professional video. And the core insight behind our narrative engine is that the most important information in an interview isn't in the words. It's in the voice.
Here's what I mean.
Say a subject is telling a story about their childhood. The transcript reads the same whether they're reciting it for the tenth time or reliving it for the first. But the audio is completely different. Pitch drops when someone moves from rehearsed to genuine. Pacing slows when they're reaching for a real memory instead of a prepared answer. Breath patterns change when emotion enters.
A skilled editor hears these shifts instinctively. They mark the in-point not at the start of a sentence, but at the moment the delivery changes. That's the cut point. That's where the story lives.
So we built our engine around prosodic analysis. We extract features from the audio signal itself: pitch contour, speech rate, pause duration, energy dynamics, breath placement. Then we use those features to identify what we call "narrative moments," the points in a conversation where something real is happening.
The transcript tells us what the story is about. The prosody tells us where the story is.
Once you have those moments mapped, assembling a rough cut becomes a sequencing problem, not a search problem. You're not looking for keywords. You're arranging emotional beats into a structure that flows.
This is why our output feels different from transcript-based tools. It's not smarter about finding words. It's listening to something those tools ignore entirely.
The voice carries more information than the transcript. Always has. We just built a system that finally pays attention to it.
What's a signal in your domain that most systems ignore but humans rely on instinctively?
