Translation Memory: a Necessary Tool for Translation Accuracy and Consistency

by Alpha Omega Translations | Jun 11, 2014 | Translation Tools

Patterns repeat throughout the natural world, whether in the leopard-like spots on a butterfly’s wings or the floral imprint that adorns a sand dollar. The mathematician Fibonacci gave his name to a sequence that appears in many natural forms and is often drawn as an ever-widening spiral. Whether a spiral of clicks and Internet links brought you to this article or a direct need for our services did, the translation industry addresses patterns through one important tool.

What is a translation memory?

A translation memory is a linguistic database that grows with each translation. All previous translations are compiled within the translation memory and reused. The more a translation memory is built up, the more consistent the terminology and the better the quality of the translations.

Although it relies on software algorithms, translation memory (TM) is a “natural” tool for translators. It uncovers language patterns in a source document, especially sentence segments that repeat, and pairs them with matching segments in the target language. This improves consistency and terminology harmonization from one translation project to the next. It is no coincidence that the term “translation” also appears in biology: in protein synthesis (fundamental for life), short segments of messenger RNA called codons are paired with matching anticodons.

The debate over TM centers on the length of sentence segments and the context they contain. With modern Computer Assisted Translation (CAT) tools, a concordance search can be run to find segments of varying length in the translation memory. Whole sentences are rarely repeated within a text, and one- or two-word terms can be adequately managed by a terminology database, which complements a translation memory but does not replace it. An algorithm called “fuzzy matching” finds a sentence or sentence fragment that is similar to another sentence in the database and asks the translator or reviewer to approve or modify the stored translation. The commonly cited minimum similarity between a new segment and an existing database entry is around 85%, which is a demanding threshold. Documents in which larger portions of text are apt to be repeated are often highly technical, especially manuals and legal contracts. Furthermore, when a manual is revised or a textbook appears in a new edition, many of the same sentences and word sequences remain unchanged. These cases are ideal for TM.
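To make the idea of fuzzy matching concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm of any particular CAT tool: commercial tools use their own scoring methods, whereas this sketch borrows the character-based ratio from Python's standard `difflib` module. The sample memory, the function names, and the French translations are all hypothetical; only the 85% threshold comes from the discussion above.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored translation.
MEMORY = {
    "Press the power button to start the device.":
        "Appuyez sur le bouton d'alimentation pour démarrer l'appareil.",
    "Disconnect the cable before cleaning.":
        "Débranchez le câble avant le nettoyage.",
}


def similarity(a, b):
    """Return a 0..1 similarity score (difflib ratio, case-insensitive).

    Assumption: real CAT tools compute match percentages differently.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(segment, memory, threshold=0.85):
    """Return (score, source, target) candidates at or above the threshold,
    best match first, for a translator to approve or modify."""
    candidates = []
    for source, target in memory.items():
        score = similarity(segment, source)
        if score >= threshold:
            candidates.append((score, source, target))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return candidates


if __name__ == "__main__":
    new_segment = "Press the power button to restart the device."
    for score, source, target in fuzzy_lookup(new_segment, MEMORY):
        print(f"{score:.0%} match: {source!r} -> {target!r}")
```

A revised sentence such as "Press the power button to restart the device." scores above the threshold against the stored original, so the stored translation is offered for review, while an unrelated sentence returns no candidates.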

The concept of storage does, in a way, break with the “natural world” metaphor. With cloud-based and client-server technologies, physical constraints matter little, and the database of raw or native format documents is available to a wide variety of people, including a large team of translators who can work together over a network on urgent projects. In a raw format, source documents are uploaded to the TM manager as they were written; in a native format, the sentence segments are already delimited for matching and saved in a dedicated file type. It is usually the native format that translators send to each other to accelerate the translation process. The specific file type can appear under many names, much as videos come in a seemingly endless array of file extensions, but the standard for exchange between users is TMX (Translation Memory eXchange). Documents readied for TM are typically stored as XML, because XML delimits sentence segments in a way machines can read, and XLIFF is a common XML format for this purpose. TMX, itself XML-based, lets different tools work together, and the relevant information in XLIFF files can be extracted to build TM databases.
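To show what such an exchange file looks like in practice, the sketch below reads a tiny TMX document with Python's standard `xml.etree.ElementTree` module. The sample content, the header values, and the `read_tmx` function name are assumptions made for illustration; the element names (`tu` for a translation unit, `tuv` for each language variant, `seg` for the segment text) follow the TMX format.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute lives in the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-language TMX sample with a single translation unit.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1.0"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Disconnect the cable before cleaning.</seg></tuv>
      <tuv xml:lang="fr"><seg>Débranchez le câble avant le nettoyage.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(text, src="en", tgt="fr"):
    """Return (source, target) segment pairs found in a TMX string."""
    root = ET.fromstring(text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs


if __name__ == "__main__":
    for source, target in read_tmx(SAMPLE_TMX):
        print(source, "->", target)
```

Because every translation unit carries its language variants side by side, any tool that understands this structure can load the pairs into its own database, which is what makes the format useful for exchange.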

The most important phase of any project or dissertation is research. There is so much information in journals and on websites that selectivity is essential. Thus, software packages such as SDL Trados or memoQ provide a list of candidate translations for a sentence segment (also called a translation unit) that has been repeated in the source document, from which the translator can choose. From another perspective, a translation memory can simply be a large collection of documents (a text corpus) that translators reference for subsequent translations. The best method is probably to combine both perspectives with outside research and a thorough review process. TM is both an assurance of quality and a source of risk, which clients of language service companies can anticipate by becoming familiar with the technologies that are improving the quality of translation.

For an overview of our translation expertise, visit our technical translation page.

