Voice Analytics vs. Speech Analytics: What's the Difference?

Many call center managers are familiar with speech analytics software. This software transcribes interactions between phone agents and customers into text, which can then be used to assess an agent's performance based on how closely he or she sticks to the script.

Recently, however, new types of software have been developed to analyze these interactions. One example is voice analytics.

What is voice analytics, and how does it differ from speech analytics? Here's a quick comparison of voice analytics vs. speech analytics:

How Speech Analytics Works

Most speech analytics programs work by analyzing phonetics. These software programs pick apart syllables and pick out specific keywords for analysis.

Some systems have a vocabulary of just a few words, while others have a much larger dictionary with tens of thousands of words. The larger the system's vocabulary, the more complete the transcript of the call will be.
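To make the effect of vocabulary size concrete, here is a minimal sketch in Python. It is not how any particular product works: real systems match phonetics in audio, while this toy version assumes the words have already been recognized and simply drops anything outside the vocabulary. The function names and the sample utterance are hypothetical.

```python
import re

def transcribe_with_vocabulary(spoken_words, vocabulary):
    """Keep words the system knows; mark everything else as unrecognized."""
    return [word if word in vocabulary else "<unk>" for word in spoken_words]

def spot_keywords(transcript, keywords):
    """Return the positions at which each keyword of interest occurs."""
    hits = {}
    for position, word in enumerate(transcript):
        if word in keywords:
            hits.setdefault(word, []).append(position)
    return hits

# Hypothetical customer utterance, split into lowercase words.
utterance = "I would like to cancel my account because the fee is too high"
spoken = re.findall(r"[a-z']+", utterance.lower())

# A tiny keyword-only vocabulary versus a fuller dictionary (both made up).
small_vocab = {"cancel", "account", "fee"}
large_vocab = small_vocab | {"i", "would", "like", "to", "my",
                             "because", "the", "is", "too", "high"}

small_transcript = transcribe_with_vocabulary(spoken, small_vocab)
large_transcript = transcribe_with_vocabulary(spoken, large_vocab)
hits = spot_keywords(small_transcript, {"cancel", "fee"})

print(" ".join(small_transcript))
print(" ".join(large_transcript))
print(hits)
```

With the small vocabulary, the keywords are still found, but most of the transcript is lost. With the larger one, the whole sentence survives.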

Basically, these systems focus on what people say in a conversation.

How Voice Analytics Works

Voice analytics programs have a similar function to speech analytics programs in that they analyze recorded conversations.

However, rather than focusing on phonetic pronunciations to pick out individual words, voice analytics programs study vocal elements such as syllable emphasis, tone, pitch, and tempo to analyze speaker behavior.

After the raw vocal data is collected, it is run against an emotional voice database. Features generated by the voice analytics system are compared to known features associated with emotions such as happiness, sadness, anger, and fear in order to identify and classify the emotional state of the speaker.
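The comparison step described above can be sketched as a nearest-match lookup. This is only an illustration under stated assumptions: the three features, the reference values in EMOTION_PROFILES, and the scaling factors are invented for the example and do not come from any real emotional voice database. Production systems typically use trained models rather than a hand-written table.

```python
import math

# Hypothetical reference profiles:
# (mean pitch in Hz, speaking rate in syllables per second, relative energy 0..1).
# Illustrative numbers only, not taken from a real database.
EMOTION_PROFILES = {
    "happiness": (220.0, 5.0, 0.70),
    "sadness": (150.0, 3.0, 0.30),
    "anger": (240.0, 5.5, 0.90),
    "fear": (260.0, 6.0, 0.50),
}

# Rough spread of each feature, so that no single feature dominates the distance.
FEATURE_SCALES = (50.0, 1.5, 0.25)

def classify_emotion(features):
    """Return the emotion whose reference profile is closest to the features."""
    def distance(profile):
        return math.sqrt(sum(
            ((value - reference) / scale) ** 2
            for value, reference, scale in zip(features, profile, FEATURE_SCALES)
        ))
    return min(EMOTION_PROFILES, key=lambda label: distance(EMOTION_PROFILES[label]))

# A loud, fast, fairly high-pitched sample (made-up measurements).
label = classify_emotion((235.0, 5.4, 0.85))
print(label)
```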

In short, voice analytics focuses on the emotional aspect of speech, or how people speak in a conversation.

Issues with Speech Analytics

The issue with speech analysis systems that convert speech to text is that while they can be very quick to produce a report, analysis of the report is still left entirely to human experts. This makes the process of analyzing call results slow and cumbersome, as your team must manually review each call based on words alone, without context about the speaker's emotional state.

Such a lack of context can frustrate proper analysis because certain phrases can change in meaning depending on how they're said.

Also, the speed at which reports are generated depends on how in-depth the speech analytics program's dictionary is. Deeper, more complicated dictionaries mean slower report generation.

Worse yet, there is the limitation of a speech analysis program's word error rate (WER) in transcribing speech to text. The WER of a program can vary based on the specific languages and dialects you deal with, but common issues that may confound speech analysis include English homophones (there, their, and they're), slang, and heavy accents.
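WER is conventionally calculated as the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the correct one, divided by the number of words in the correct transcript. The sketch below computes it with a standard word-level edit distance. The function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# Homophones confuse the hypothetical transcript: two of six words are wrong.
wer = word_error_rate(
    "they're moving their account over there",
    "there moving there account over there",
)
print(f"WER: {wer:.2f}")
```

Here the transcript sounds identical to the original when read aloud, yet a third of its words are wrong on paper.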

Because of these issues, many call centers limit their use of speech analytics to compliance purposes, analyzing phone agent speech to make sure agents are sticking to the script and using approved language.
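A compliance check of this kind can be as simple as searching the agent's transcript for required and prohibited phrases. The following sketch assumes plain substring matching on an already-transcribed call. The function name, the phrases, and the sample transcript are all hypothetical.

```python
def check_compliance(agent_transcript, required_phrases, banned_phrases):
    """Report required phrases that are missing and banned phrases that appear."""
    # Normalize case and whitespace so matching is not thrown off by formatting.
    text = " ".join(agent_transcript.lower().split())
    missing = [p for p in required_phrases if p.lower() not in text]
    violations = [p for p in banned_phrases if p.lower() in text]
    return {
        "missing": missing,
        "violations": violations,
        "compliant": not missing and not violations,
    }

# Made-up script requirements and call transcript.
required = ["this call may be recorded", "is there anything else I can help you with"]
banned = ["guarantee"]
transcript = "Hello, this call may be recorded. I guarantee we can fix that today."

report = check_compliance(transcript, required, banned)
print(report)
```

Because the check looks only at words, any transcription error in a required phrase would show up as a false compliance failure.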

The Importance of How We Speak

According to psychological studies by experts such as Professor Albert Mehrabian, paraverbal messages (messages implied by tone and emphasis) can account for approximately 38% of what is communicated to someone, while nonverbal messages account for about 55% of what is perceived and understood by others.

This effectively means that word choice only has about a 7% impact on face-to-face conversations. In essence, how you speak is more than 5 times as important as what you say. With conversations over the phone, where body language cannot be assessed, the importance of tone and emphasis in speech is heightened even further.

For example, consider the word "fine." The meaning of this one word can change wildly based on how it's said. A short, terse utterance indicates that the speaker is anything but fine. On the other hand, a cheerful tone can indicate that the speaker is happy or feeling good about something.

Advantages of Voice Analytics

Because voice analytics makes a careful analysis of speaker pitch, energy, amplitude, frequency, mel-frequency cepstral coefficients (MFCCs), and other key elements of how people speak, it can assess the emotional state of a speaker.
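As a rough illustration of what measuring these elements involves, the sketch below computes three of them (energy, amplitude, and pitch) for a short frame of audio using only the Python standard library. It assumes a clean, synthetic 200 Hz tone in place of real speech and uses a basic autocorrelation pitch estimate. MFCCs are left out because they need considerably more signal processing. All names and values here are illustrative.

```python
import math

def rms_energy(frame):
    """Root-mean-square level of the frame, a common measure of loudness."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def peak_amplitude(frame):
    """Largest absolute sample value in the frame."""
    return max(abs(s) for s in frame)

def estimate_pitch_hz(frame, sample_rate, min_hz=75.0, max_hz=400.0):
    """Estimate pitch from the lag at which the frame best matches itself."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # No positive correlation found (for example, silence): pitch is undefined.
    return sample_rate / best_lag if best_lag else None

# Synthetic stand-in for a voiced frame: 50 ms of a 200 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]

pitch = estimate_pitch_hz(frame, SAMPLE_RATE)
rms = rms_energy(frame)
peak = peak_amplitude(frame)
print(f"pitch={pitch:.1f} Hz, rms={rms:.3f}, peak={peak:.3f}")
```

Real speech is far noisier than a pure tone, so practical systems add windowing, voicing detection, and smoothing across frames before these numbers are used.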

This has the advantage of avoiding confusion over the meaning of a given word or phrase used. Instead, you get the context of the response without having to guess at meaning. This reduces the need for expert analysis of collected data, saving time and labor.

Additionally, certain voice analytics programs can extrapolate from a speaker's emotional state and responses during a call to predict future behaviors. This predictive voice analytics variant is used by collections agencies to improve second-call targeting by focusing on the debtors that are most likely to pay.

This predictive behavioral model is only possible because the machine learning algorithm has access to the true meaning and context of a conversation. Without the data on how words are said, such predictive analysis would be impossible.

Voice analytics isn't limited to predicting call results. There are QA applications for this technology in call centers for financial institutions, schools, service companies, and government organizations as well.

For example, say that a phone agent is struggling to perform, failing to close deals or experiencing an abnormally high number of unresolved hang-ups with customers.

With speech analytics, you might find that the phone agent is saying all the right things, following the script to a T. If this were your only data point, you might be led to assume the agent is doing everything right.

On the other hand, voice analytics would be able to uncover that the tone of voice used by the agent is aggressive and rude, an issue that speech analytics alone would not identify. Armed with this knowledge, you can take more targeted measures to address the performance issue by correcting the agent's tone.

Although they're both used by call centers to gather intelligence about call center performance, voice analytics and speech analytics are very different tools. Because one analyzes what is said and the other analyzes how it is said, they give you different kinds of information, which can be used to improve operations in different ways.
