Difference Between Natural Language Processing (NLP) And Speech Recognition
- Ajay Sharma
- Dec 2, 2020
- 4 min read
Updated: Dec 8, 2020
Today I will give a brief introduction to two of the most commonly confused terms: speech recognition and Natural Language Processing. Many people who are new to this field mix them up, so let’s try to clear that confusion. Let’s start!

What is Natural Language Processing?
NLP stands for Natural Language Processing, a branch of Artificial Intelligence. NLP is used for communicating with an intelligent system in a natural language such as English.
Enormous amounts of text data are generated every day, and processing that data becomes necessary when you want an intelligent system, such as a robot, to perform as per your instructions, or when you want to hear a decision from a dialogue-based clinical expert system.
At its core, Natural Language Processing combines linguistics with machine learning so that computers can read, interpret, and generate human language. A key challenge is that human language is ambiguous, so an NLP system must work out the meaning of words and sentences from their context.
In simple terms, NLP is nothing but making computers perform useful tasks using the natural language that humans speak and write.
The input and output of an NLP system can be −
Speech
Written Text
What are the Components of NLP?
There are mainly two components of NLP −
1) Natural Language Understanding (NLU)
Natural Language Understanding involves the following tasks −
First, mapping the given input in natural language into some useful internal representation.
Then, analyzing different aspects of the language.
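To make the first task concrete, here is a toy illustration of the NLU mapping step: turning a natural-language request into a structured internal representation. This is only a rule-based sketch; the pattern, action names, and slot names are illustrative assumptions, and real systems use trained models rather than a single regular expression.

```python
import re

def to_representation(utterance: str) -> dict:
    # Map a natural-language request to a structured representation
    # (an action plus its slots). The schema here is made up.
    match = re.search(r"flight from (\w+) to (\w+)", utterance.lower())
    if match:
        return {"action": "book_flight",
                "origin": match.group(1),
                "destination": match.group(2)}
    return {"action": "unknown"}

print(to_representation("Please book a flight from Pune to Delhi"))
# {'action': 'book_flight', 'origin': 'pune', 'destination': 'delhi'}
```

Once the input is in this structured form, the rest of the system can act on it without caring how the request was originally worded.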
2) Natural Language Generation (NLG)
Natural Language Generation (NLG) is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation.
Mainly it involves −
Text planning – In text planning, we retrieve the relevant content from the knowledge base.
Sentence planning – In sentence planning, we choose the required words, form meaningful phrases, and set the tone of the sentence.
Text realization – In text realization, we map the sentence plan into the final sentence structure.
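The three stages above can be sketched with a minimal template-based example. The knowledge base, tone value, and templates here are made-up assumptions purely to show how content flows from planning to realization; production NLG systems are far more sophisticated.

```python
# Toy knowledge base for the example.
KNOWLEDGE_BASE = {"city": "Pune", "condition": "sunny", "temp_c": 31}

def text_planning(query: str) -> dict:
    # Text planning: retrieve the relevant content from the knowledge base.
    return {k: KNOWLEDGE_BASE[k] for k in ("city", "condition", "temp_c")}

def sentence_planning(content: dict) -> dict:
    # Sentence planning: choose words and set the tone of the sentence.
    content["verb"] = "looks"
    return content

def text_realization(plan: dict) -> str:
    # Text realization: map the sentence plan into a surface sentence.
    return (f"The weather in {plan['city']} {plan['verb']} "
            f"{plan['condition']} at {plan['temp_c']} degrees Celsius.")

sentence = text_realization(sentence_planning(text_planning("weather")))
print(sentence)
# The weather in Pune looks sunny at 31 degrees Celsius.
```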
Note: Of the two, Natural Language Understanding (NLU) is harder than Natural Language Generation (NLG), because understanding must cope with the ambiguity of free-form input.
What is Speech recognition?
In simple terms, speech recognition is the ability of software to recognize spoken words. Anything that a person says, in a language of their choice, should be recognized by the software.
Speech recognition is used to convert the input data (speech) into a form appropriate for natural language processing (text).
Let’s take an example.
How Does Siri Work?
Siri is built mainly on two technologies: Speech Recognition and Natural Language Processing.
Here, Speech Recognition is used to convert human speech into its corresponding textual form. For instance, when you trigger Siri by saying “Hey Siri! How is the weather today?”, in the back-end, a powerful speech recognition system by Apple kicks off and converts your audio into its corresponding textual form – “Hey Siri! How is the weather today?” This is an extremely challenging task, simply because humans have a highly diverse set of tones and accents. Accents vary not only across countries but also across states and cities within a country. Some people speak fast, some speak slowly, and the characteristics of male and female voices are also very different from each other.
The engineers at Apple train Machine Learning models on large, transcribed datasets in order to create efficient speech recognition models for Siri. These models are trained on highly diverse datasets that comprise voice samples from a large group of people. This way, Siri is able to cater to various accents.
In recent years, deep learning has produced phenomenal results in speech recognition. The word error rate of speech recognition engines has drastically dropped to less than 10%. This has been possible due to the availability of both large datasets and powerful hardware on which speech recognition models can be trained.
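Word error rate (WER) is worth pausing on, since it is the standard metric behind claims like “less than 10%.” It counts the substitutions, deletions, and insertions needed to turn the recognizer’s transcript into the reference transcript, divided by the number of words in the reference. A small self-contained sketch using word-level edit distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + deletions + insertions) / reference length,
    # computed with a classic dynamic-programming edit distance over words.
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("how is the weather today",
                      "how is whether today"))
# 2 errors / 5 reference words = 0.4
```

An engine with a WER below 0.10 gets fewer than one word in ten wrong on this measure.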
Once Siri has understood what you are saying, the converted text is sent to Apple servers for further processing. Apple servers then run Natural Language Processing (NLP) algorithms on this text to understand the intent of what the user is trying to say.
For instance, the NLP engine is able to determine that when a user says “set an alarm for 6 AM tomorrow,” the user is asking to set an alarm and not to make a call. This is challenging because different users phrase the same request in different ways. For instance, one can say the same thing in the following ways:
Hey Siri, can you set me an alarm for 6 AM tomorrow?
Siri, can you wake me up tomorrow at 6 AM?
Siri, please set an alarm for tomorrow at 6 AM.
Siri, please wake me up tomorrow at 6 AM.
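A rule-based sketch can show how all four phrasings above collapse to the same intent and slot values. This is only an illustration under stated assumptions: the cue words, regular expression, and intent names are made up, and real assistants use trained NLP models rather than hand-written rules.

```python
import re

def parse_request(utterance: str) -> dict:
    # Map a spoken request to an intent plus a time slot.
    text = utterance.lower()
    wants_alarm = any(cue in text for cue in ("alarm", "wake me up"))
    time = re.search(r"(\d{1,2})\s*(am|pm)", text)
    if wants_alarm and time:
        return {"intent": "set_alarm",
                "time": f"{time.group(1)} {time.group(2).upper()}"}
    return {"intent": "unknown"}

phrasings = [
    "Hey Siri, can you set me an alarm for 6 AM tomorrow?",
    "Siri, can you wake me up tomorrow at 6 AM?",
    "Siri, please set an alarm for tomorrow at 6 AM.",
    "Siri, please wake me up tomorrow at 6 AM.",
]
for p in phrasings:
    print(parse_request(p))
# all four yield {'intent': 'set_alarm', 'time': '6 AM'}
```

The point is that intent understanding normalizes surface variation: however the request is worded, the downstream system only ever sees one canonical representation.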
For more blogs/courses on data science, machine learning, artificial intelligence, and new technologies do visit us at InsideAIML.
Thanks for reading…