Machine Translation in Speech-to-Speech Situations

May 24, 2014 | Alpha Omega Translations

The glow of the big screen illuminates the engrossed occupants of a Parisian movie theater. This is the latest American blockbuster, and while the French audience gapes at the big-shot actress in the opening scene, they also tune in to a very familiar voice. The voiceover sounds exactly like the middle-aged woman from last week’s showing, briefly breaking the film’s verisimilitude. It is a mystery who the French male and female voiceover professionals working behind the screen are, but their voices, broadcast to millions of spectators, aren’t varied enough to serve all of Hollywood’s actors. Yet here is a business where machines could never replace humans, whose voices must express drama and excitement in the millions of new shows and movies being spewed out.

This introduces a much broader topic: machine interpretation. Machine Translation (MT) is a hot topic in the 21st-century language world, despite its rudimentary vocabulary and poor handling of grammar and regionalisms. If the translated text is barely usable, what about simultaneous voiceovers between foreigners in a futuristic conversation? What about machine interpretation of telephone conversations, theater productions, speeches, and conferences?

This concept has been dubbed Speech-to-Speech translation, and it suggests that the familiar (or annoying) voiceover and customer service voices heard everywhere will make their way into headsets and speakers. For those iPhone users who have met Siri, one of her relatives may one day eliminate language barriers and replace the thrill of speaking to a foreign person, hearing a foreign accent, struggling with a language, or discovering a culture. That is, if Speech-to-Speech actually worked.

The first link in the chain is the acoustic model, the speech recognition component that picks up audio waves to convert them to text. The machines work wonders when presented with a slow, even speech pattern, but when the sports channel comes on or Italians tune in, speedy talking will invariably undermine their efficiency. The same goes for mumbling, slurring, and background noise. On top of that, spoken vocabulary is a far cry from written vocabulary, because people add fillers, hesitations, colloquialisms, and generally what researchers call OOV (Out of Vocabulary) words.

Despite this, speech recognition has undergone constant innovation, notably by Microsoft. While Gaussian Mixture Models (GMM) were the go-to method for deciphering which phonemes were being spoken in order to reconstruct words, Deep Neural Networks (DNN) have been the attraction of recent years. In fact, Microsoft Research has developed a version of this last method, which mimics human neurons in the way it processes sound, that works with senones instead of phonemes. Senones are even shorter fragments of sound in the human voice, and using them greatly improves accuracy. However, Microsoft’s 2010 version still recorded errors in one out of every eight words. In a typical three-minute telephone conversation, that’s equal to fifty wrong words!
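To make the arithmetic behind that figure concrete, here is a minimal sketch. It assumes the "one in eight" figure is a word error rate, and it assumes a speaking rate of roughly 133 words per minute; that rate is not stated in the article and is simply the value at which three minutes yields about fifty errors. The function names are hypothetical, and the second function shows one common way such an error rate could be measured (word-level edit distance), not necessarily how Microsoft measured it.

```python
def expected_wrong_words(error_rate, minutes, words_per_minute):
    """Expected number of misrecognized words in a conversation."""
    return error_rate * minutes * words_per_minute


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Assumed speaking rate: 133 words per minute (not from the article).
errors = expected_wrong_words(1 / 8, minutes=3, words_per_minute=133)
print(round(errors))

# One wrong word in an eight-word sentence gives the same 1-in-8 rate.
print(word_error_rate("the cat sat on the mat today ok",
                      "the hat sat on the mat today ok"))
```

A faster speaker raises the count proportionally, so the same error rate is even more costly in the rapid speech described above.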

Furthermore, the cost of accuracy is the information it requires. Senones outnumber phonemes by the thousands (and presumably by more in phoneme-rich African languages), and together with the repertoires needed for different accents, this means machines will have to be well equipped. That is especially true when it comes time for the machine translation process itself, which relies on statistical information. Either the nano age, with terabytes stored on small chips, cannot come soon enough, or the information will have to be kept in the cloud, where it is prone to leakage. While private interpreting firms have confidentiality statements covering the information they translate, the Japanese cell phone company NTT DoCoMo sends all information to a central database in its attempt to translate phone calls simultaneously.

There are other dangers in depending on machine translation, but what about the robotic voice that speaks the translation aloud? While Siri is excellent at voice recognition, its responses are very limited. The same goes for eBooks, which have to be specially recorded and still make for the most obnoxious-sounding bedtime story you’ve ever heard.

And finally, there’s the potential for new gaffes: the embarrassment of a woman’s dialogue being simultaneously translated into a man’s voice, the voices of several interlocutors being translated as if one person were speaking, or a snarling foreigner being given a cheery tone by the machine voice. And think of the uproar caused by Google Glass. What would people think if everyone travelled with headphones in their ears? Perhaps they could be marketed to the French movie buffs who need an alternative to the same person doing all the voiceovers!

For an overview of our translation expertise, visit our audio and video translation page.

