If you’re fond of ’80s movies, chances are you’ve seen how they envisioned the 2010s: full of hovercrafts, futuristic cities, and murderous robots. For better or worse, none of those things has become commonplace yet, but we are making strides in the technology department. Self-driving cars will soon be commercially available, and our lives are very much devoted to smartphones and the Internet.
But why are we so far behind on speech recognition? This is a hotly debated topic in the transcription and translation industries, as many worry their jobs will become obsolete. While that worry is real, a machine that can accurately transcribe or translate would provide cheap access to language services for the many people who need them, which would be a huge step forward.
The problem is that language, especially English, is difficult in so many ways. It may seem like speech recognition should be easy to develop, but just think about an elderly person having a conversation with a pre-teen. Even if they’re both speaking English, it can sometimes feel like a conversation between two different worlds because of how quickly language evolves and how much context and speaking styles vary.
Machines have this same problem. While we are extremely close to machines being able to understand clear, monotone English spoken by a single speaker, problems arise with recordings of groups of people (which is what we specialize in).
Where Machines Fail
Here’s an example: a market research company is holding a 10-person focus group with folks who speak English but aren’t native speakers. The group is made up of teenagers from the rural South. So that’s 10 people, many talking over each other at certain points, who aren’t fully fluent in English, use regional dialect, and pepper their speech with newly coined words. That’s hard enough for a human to transcribe. For machines, at this time, anywhere near 100% accuracy simply isn’t possible.
In the future, this will likely be solved in some technical way that my non-scientific brain cannot fathom. When it is, Atomic Scribe will evolve, just as language does. For now, it’s best to use human-powered services if accuracy is what you’re after. The rise of the machines will have to wait just a little bit longer.