Not many people know it, but the roots of AI translators date back to the 1950s. In the mid-1950s, IBM unveiled the first-ever AI machine that could translate a small set of words (250, to be exact), all from Russian to English.
The breakthrough was short-lived. In 1964, the Automatic Language Processing Advisory Committee released a report proclaiming that machines could never produce accurate translations and that investing in them was a waste of money.
It’s a good thing that scientists were persistent. The IBM model served as the basis for what would become the AI translator built on statistical machine translation technology.
So What Is Statistical MT?
At the heart of Statistical MT is a predictive algorithm that determines how the computer translates a statement. It requires a lot of input to build reliable linguistic data, so the translations become more accurate as time goes on.
Among the different subgroups of Statistical MT are:
- Word-based translation
- Phrase-based translation
- Hierarchical phrase-based
Simply put, the model works by examining a statement and picking out key words to “predict” what it means. The result gives you a basic understanding of what the speaker is trying to say, but it’s not a word-for-word translation.
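The phrase-based idea can be sketched with a few lines of Python. This is a toy illustration, not a real SMT system: the phrase table and its probabilities are made up, and a real decoder would also weigh a language model and reordering costs. The sketch just shows the core move of matching known phrases and picking the most probable translation for each.

```python
# Hypothetical phrase table: source phrases -> (translation, probability) pairs.
PHRASE_TABLE = {
    ("hola",): [("hello", 0.8), ("hi", 0.2)],
    ("buenos", "dias"): [("good", 0.1), ("good morning", 0.9)],
}

def translate(words):
    """Greedily match the longest known phrase, then emit its most probable translation."""
    out, i = [], 0
    while i < len(words):
        for span in range(len(words) - i, 0, -1):  # try longest match first
            phrase = tuple(words[i:i + span])
            if phrase in PHRASE_TABLE:
                best = max(PHRASE_TABLE[phrase], key=lambda t: t[1])
                out.append(best[0])
                i += span
                break
        else:
            out.append(words[i])  # unknown word passes through untranslated
            i += 1
    return " ".join(out)

print(translate(["buenos", "dias"]))  # → "good morning"
```

Note how the two-word phrase wins over translating “buenos” alone: phrase-based systems capture chunks of meaning, which is exactly why the output conveys the gist rather than a word-for-word rendering.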
Evolution in Machine Translation
In 2003, researchers at the University of Montreal, led by Yoshua Bengio, created a new language model that took advantage of neural networks.
A neural network is a series of algorithms that tries to recognize relationships in data. What makes it different is that it tries to imitate how the human brain works. Their language model is the father of modern Neural Machine Translation (NMT).
It wasn’t until a decade later that NMT would hit a breakthrough, thanks to the encoder-decoder architecture developed by Phil Blunsom and Nal Kalchbrenner. Deep learning translation became possible when they built a structure that encodes the input with a Convolutional Neural Network, which is then decoded using a Recurrent Neural Network.
Thus, the AI translator using NMT was born.
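To make the encoder-decoder idea concrete, here is a deliberately tiny sketch in Python with NumPy. Everything in it is a stand-in: the vocabularies, the random (untrained) weights, and the simple mean-of-embeddings encoder are assumptions for illustration only. Real NMT systems learn these weights from millions of sentence pairs; the sketch only shows the two-stage shape of the architecture, where a source sentence is compressed into a context vector and then expanded word by word.

```python
import numpy as np

rng = np.random.default_rng(0)
SRC_VOCAB = ["hola", "mundo"]
TGT_VOCAB = ["hello", "world", "<eos>"]

EMBED = {w: rng.normal(size=8) for w in SRC_VOCAB}  # source word embeddings
W_OUT = rng.normal(size=(len(TGT_VOCAB), 8))        # decoder output weights

def encode(sentence):
    """Encoder: compress the whole source sentence into one fixed-size context vector."""
    return np.mean([EMBED[w] for w in sentence], axis=0)

def decode(context, max_len=5):
    """Decoder: emit target words one at a time, conditioned on the context."""
    state, out = context.copy(), []
    for _ in range(max_len):
        logits = W_OUT @ state
        word = TGT_VOCAB[int(np.argmax(logits))]
        if word == "<eos>":  # stop token ends the translation
            break
        out.append(word)
        state = np.tanh(state + W_OUT[TGT_VOCAB.index(word)])  # update decoder state
    return out

print(decode(encode(["hola", "mundo"])))
```

With untrained weights the output is gibberish, which is the point: the architecture defines *how* information flows, and the actual translation quality comes entirely from training on data.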
Here is how the two approaches stack up:
- NMT uses more extensive linguistic data than Statistical MT
- NMT fills the gaps caused by the natural data sparsity of Statistical MT
- Statistical MT is still better at parsing long, complex sentences
- NMT is better at analyzing commonality between words than Statistical MT
- NMT is much more accurate when translating between languages
- NMT runs on a single system rather than the series of systems that Statistical MT relies on
- NMT has machine-learning capability, meaning it can learn to become more accurate on its own, whereas Statistical MT depends on fixed statistical models
- NMT has zero-shot translation capabilities, whereas Statistical MT has none
The problem with language is that it’s never simple. For instance, Statistical MT uses a pivot language to arrive at the best available translation. If it’s trying to translate Russian to English but can’t find the meaning of a word, it routes the translation through a pivot, in this case, the Polish language.
However, with this model, errors in the pivot language are carried over into the final translation. NMT, meanwhile, is not heavily dependent on a pivot. Instead, all language data is fed into one system, and the translator tries to find relationships across the multiple sets of data.
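The pivot mechanism, and why its errors compound, can be sketched in a few lines. The dictionaries below are hypothetical stand-ins for full translation models: the sketch shows the direct route being tried first, with the Russian-to-Polish-to-English pivot as a fallback, so any mistake in either hop ends up in the English output.

```python
# Hypothetical mini-dictionaries standing in for full translation models.
RU_TO_PL = {"собака": "pies", "кошка": "kot"}   # Russian -> Polish (pivot hop 1)
PL_TO_EN = {"pies": "dog", "kot": "cat"}        # Polish -> English (pivot hop 2)
RU_TO_EN = {"кошка": "cat"}                     # direct dictionary has gaps

def translate_with_pivot(word):
    """Try the direct route first; fall back to pivoting through Polish."""
    if word in RU_TO_EN:
        return RU_TO_EN[word]
    pivot = RU_TO_PL.get(word)                  # hop 1: Russian -> Polish
    return PL_TO_EN.get(pivot, "<unknown>")     # hop 2: Polish -> English

print(translate_with_pivot("собака"))  # direct route fails, pivot → "dog"
```

Because each hop is a separate model, a wrong entry in either dictionary silently propagates to the output, which is the error-carryover problem described above. An NMT system avoids the chain by mapping source to target in a single learned model.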