Patterns tend to be repeated throughout the natural world, whether they be leopard-like spots on a butterfly’s wings or the floral imprint that adorns a sand dollar. In fact, the mathematician Fibonacci described a number sequence that turns up in many natural forms and is often drawn as an ever-widening spiral. While a spiral of clicks and Internet links may have brought you to this article, or perhaps a direct need for our services, the translation industry addresses patterns through one important tool.
What is a translation memory?
A translation memory is a linguistic database that continually grows with each translation. All previous translations are compiled within the translation memory and reused. The more the translation memories are built up, the better the terminology consistency and the better the quality of translations.
Although it relies on software algorithms, translation memory (TM) is a “natural” tool for translators. It uncovers language patterns in a source document, especially the sentence segments that repeat, and pairs them with matching segments in the target language. This improves consistency and terminology harmonization from one translation project to another. It is perhaps no coincidence that the term “translation” also appears in biology, in the process of protein synthesis (fundamental for life), whereby short segments called codons are paired with matching anticodons.
The debate over TM centers on the length of sentence segments and the context they contain. With modern Computer-Assisted Translation (CAT) tools, a concordance search can be run to find sentence segments of varying length within the source document. Whole sentences are rarely repeated within a text, and single-word or two-word terms can be adequately managed by a terminology database, which can be a means, but not an end, to translation memory. A technique called “fuzzy matching” finds a sentence or sentence fragment that is similar to another sentence in the database and asks the translator or reviewer to approve or modify the translation. The commonly cited threshold for how similar a fuzzy match should be to an existing database entry is around 85%, which is quite demanding. The documents in which larger portions of text are apt to be repeated are often highly technical ones, especially manuals and legal contracts. Furthermore, when a manual undergoes a revision, or a textbook appears in a new edition, some of the same sentences and word sequences remain unchanged. These examples are perfect for TM.
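To make the fuzzy matching idea concrete, here is a minimal sketch in Python. It is not how any particular CAT tool scores matches: commercial tools use their own, usually word-aware, algorithms. The sketch simply assumes a character-level similarity ratio from Python's standard `difflib` module as a stand-in, and the tiny in-memory database, the function name `fuzzy_matches` and the sample sentences are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: source segment -> stored translation.
TM = {
    "Press the power button to start the device.":
        "Appuyez sur le bouton d'alimentation pour démarrer l'appareil.",
    "Disconnect the cable before cleaning.":
        "Débranchez le câble avant le nettoyage.",
}

def fuzzy_matches(segment, memory, threshold=0.85):
    """Return (score, source, target) tuples scoring at or above threshold, best first.

    The score is difflib's similarity ratio (0.0 to 1.0), used here only as
    an assumed stand-in for a CAT tool's own match percentage.
    """
    hits = []
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    # Highest-scoring candidate first, so the translator sees the best match on top.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)

if __name__ == "__main__":
    new_segment = "Press the power button to start the unit."
    for score, source, target in fuzzy_matches(new_segment, TM):
        print(f"{score:.0%} match: {source!r} -> {target!r}")
```

In this sketch a new sentence that differs from a stored one by a single word clears the 85% threshold and is offered to the translator for approval or modification, while an unrelated sentence returns no candidates at all.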
The concept of storage does, in a way, break with the “natural world” metaphor. With cloud-based and client-server technologies, physical constraints matter little, and the database of raw or native format documents is available to a wide variety of people, including a large team of translators who can work together over a network on urgent projects. To explain: in raw format, the source documents are uploaded to the TM manager as they were written, while in native format the sentence segments ready to be matched are predefined and saved as a file type. It is usually the native format that translators send to each other to accelerate the translation process. The specific file type can appear under many names, just as videos seem to have an endless array of file extensions; however, the standard for exchange between users is TMX (Translation Memory eXchange). Documents being readied for TM are typically XML documents, because XML delimits sentence segments in a way that machines can read, and XLIFF is a common XML format of this kind. TMX, in turn, ensures that different tools can work together, and the relevant information in XLIFF files can be extracted to build TM databases.
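As an illustration of what a TMX file looks like to a machine, the following sketch reads source and target segments from a small TMX fragment using Python's standard `xml.etree.ElementTree` module. The sample content, the function name `read_tmx` and the choice of English and French are assumptions made for the example. The element names (`tu` for a translation unit, `tuv` for each language variant, `seg` for the segment text) follow the TMX format, and the sketch assumes plain language codes such as `en` rather than regional ones such as `en-US`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the standard XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: a TMX body holding one translation unit in two languages.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Disconnect the cable before cleaning.</seg></tuv>
      <tuv xml:lang="fr"><seg>Débranchez le câble avant le nettoyage.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_tmx(text, src="en", tgt="fr"):
    """Return a list of (source, target) segment pairs found in a TMX string."""
    root = ET.fromstring(text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        # Keep only units that contain both requested languages.
        if src in segments and tgt in segments:
            pairs.append((segments[src], segments[tgt]))
    return pairs

if __name__ == "__main__":
    for source, target in read_tmx(SAMPLE_TMX):
        print(source, "->", target)
```

Because every tool that supports TMX agrees on this structure, pairs read this way from one tool's export could, in principle, be loaded into another tool's database.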
The most important phase of any project or dissertation is research. There is so much information in journals and on websites that selectivity is essential. Thus software packages such as SDL Trados or memoQ provide a list of candidate translations for a sentence segment (also called a translation unit) that has been repeated in the source document, for the translator to choose from. From another perspective, a translation memory can simply be a large collection (text corpus) of documents that translators can reference for subsequent translations. The best method is probably to combine both perspectives with outside research and a thorough review process. TM is both an assurance of quality and a source of risk, which clients of linguistic companies can anticipate by becoming familiar with these technologies that are improving the quality of translation.