AI Translation and Hallucinations

by Alpha Omega Translations | Nov 19, 2025 | Translation Services

We tested several AI translation tools, and some of them fabricated content in the translated text, inventing events and stories that were not in the source. These fabrications are known as hallucinations. Read our previous article on the subject.

Alibaba Raises Concerns About Hallucination Risks in Multilingual AI Translation

In a paper published on October 28, 2025, researchers from Alibaba highlighted significant reliability issues in the multilingual large language models (LLMs) used for AI translation. They cautioned that even the most advanced models frequently hallucinate when translating.

Despite advances in LLM-based AI translation, the researchers stress that these models remain prone to generating inaccurate or nonsensical outputs, often referred to as “hallucinations.”

Current benchmarks do not test modern models thoroughly enough, so their deficiencies are underestimated. As a result, many models score near-zero hallucination rates on existing benchmarks, which obscures their true vulnerabilities.

“The existing evaluation benchmarks are insufficient in tackling the challenges presented by LLM hallucinations,” they noted.

Introduction of a New Framework and Benchmark

To combat these issues, the researchers developed a diagnostic framework and a taxonomy for categorizing hallucinations. They differentiated between two types: instruction detachment, which involves translating into the wrong language or providing no translation at all, and source detachment, characterized by the addition or omission of content.
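The two-level taxonomy described above can be written down as a small data structure. The sketch below is illustrative only and is not taken from the paper: the class names, the helper `flag_instruction_detachment`, and its parameters are hypothetical, and the language code in `detected_lang` is assumed to come from whatever language-identification tool the caller already uses. It only covers the instruction-detachment checks, since spotting added or omitted content generally needs a human or model judge.

```python
from enum import Enum


class Detachment(Enum):
    """Top-level families of the taxonomy described in the article."""
    INSTRUCTION = "instruction detachment"
    SOURCE = "source detachment"


class HallucinationType(Enum):
    """Leaf categories, each tied to its parent family."""
    WRONG_LANGUAGE = ("wrong language", Detachment.INSTRUCTION)
    NO_TRANSLATION = ("no translation", Detachment.INSTRUCTION)
    ADDITION = ("added content", Detachment.SOURCE)
    OMISSION = ("omitted content", Detachment.SOURCE)

    def __init__(self, label, family):
        self.label = label
        self.family = family


def flag_instruction_detachment(source, output, detected_lang, target_lang):
    """Crude, hypothetical check for instruction detachment.

    detected_lang is supplied by the caller (assumption: it comes from an
    external language-identification step). Returns a HallucinationType,
    or None when neither check fires.
    """
    text = output.strip()
    # Empty output, or the source echoed back unchanged, is treated as
    # "no translation". A legitimate echo (e.g. a proper noun) would be
    # a false positive of this simple rule.
    if not text or text == source.strip():
        return HallucinationType.NO_TRANSLATION
    if detected_lang != target_lang:
        return HallucinationType.WRONG_LANGUAGE
    return None
```

For example, `flag_instruction_detachment("Hello", "Bonjour", "fr", "de")` returns `HallucinationType.WRONG_LANGUAGE`, whose `family` is `Detachment.INSTRUCTION`.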

“This taxonomy offers a clear and actionable approach for evaluating LLM translation behaviors,” the researchers stated.

Guided by this framework, they created HalloMTBench, a multilingual benchmark that encompasses 11 English-to-X translation pairs. This benchmark is specifically designed to rigorously test modern LLMs.

The dataset is available on HuggingFace and is described as “a forward-looking testbed for identifying LLM translation failures.”

Widespread Hallucination Rates Detected

Using HalloMTBench, the researchers assessed 17 LLMs, including models from the GPT-4 series and a range of open-source models. Even among leading models, hallucination rates ranged from 33% to nearly 60%, depending on the model architecture and the language pair.

GPT-4o-mini exhibited the lowest hallucination rate, followed closely by Claude-3.7-Sonnet and GPT-4o. In contrast, ByteDance’s Seed-X-PPO-7B recorded the highest rate.

These findings indicate that the issue of translation hallucinations is prevalent, even in models considered state-of-the-art.

The researchers observed substantial variations in error patterns. For instance, Qwen3-Max showed a strong tendency to add extraneous content, while GPT-4o-mini and Gemini-2.0-Flash frequently produced output in the wrong language.

Identifying Hallucination Triggers

Their analysis also pinpointed specific “hallucination triggers.” Smaller open-source models were more prone to hallucinations than larger proprietary ones. Additionally, models that had undergone reinforcement learning showed a greater tendency toward “wrong-language” errors. Hallucinations were also more frequent for very short texts (0-29 characters) and very long ones (over 499 characters).
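To check whether source length acts as a trigger in your own translation logs, one option is to bucket segments by character count and compare hallucination rates per bucket. The sketch below is a minimal illustration, not the researchers' method: the function names are hypothetical, the middle bucket (30-499 characters) is inferred from the two ranges quoted above, and the `records` list is invented toy data rather than results from the paper.

```python
from collections import defaultdict


def length_bucket(source_text):
    """Assign a source segment to a length bucket by character count."""
    n = len(source_text)
    if n <= 29:
        return "short (0-29)"
    if n > 499:
        return "long (over 499)"
    return "medium (30-499)"  # inferred middle range, not named in the article


def rate_by_bucket(records):
    """Compute the hallucination rate per length bucket.

    records: iterable of (source_text, hallucinated) pairs, where
    hallucinated is a bool decided elsewhere (e.g. by a human reviewer).
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [hallucinated, total]
    for source_text, hallucinated in records:
        bucket = length_bucket(source_text)
        counts[bucket][0] += int(hallucinated)
        counts[bucket][1] += 1
    return {bucket: bad / total for bucket, (bad, total) in counts.items()}


# Toy data for illustration only.
records = [
    ("Hi", True),
    ("Hi there", False),
    ("x" * 100, False),
    ("y" * 600, True),
]
rates = rate_by_bucket(records)
```

With the toy data above, `rates` maps the short bucket to 0.5, the medium bucket to 0.0, and the long bucket to 1.0. On real logs, a clearly higher rate in the short and long buckets would be consistent with the pattern the researchers describe.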

These insights underscore the pressing need for enhanced evaluation methods in AI translation to mitigate the challenge of hallucination within these powerful language models.

