DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought

Paper · arXiv 2412.17498 · Published December 23, 2024

Recently, O1-like models have emerged as representative examples of the effectiveness of long chain-of-thought (CoT) in reasoning tasks such as math and coding. In this paper, we introduce DRT-o1, an attempt to bring the success of long CoT to neural machine translation (MT). Literary works often contain similes and metaphors, and translating such text into a target language is difficult in practice because of cultural differences. In these cases, literal translation often fails to convey the intended meaning. Even professional human translators must give considerable thought to preserving semantics throughout the translation process. To simulate long thought in MT with LLMs, we first mine sentences containing similes or metaphors from existing literary works, and then develop a multi-agent framework to translate these sentences via long thought. In the multi-agent framework, a translator iteratively translates the source sentence following the suggestions of an advisor. To ensure the effectiveness of the long thoughts, an evaluator is also employed to score the translation in each round.
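The translator, advisor and evaluator loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the three agents are passed in as plain callables, and the names `refine_translation`, `Round`, `max_rounds` and `good_enough`, as well as the stopping rule, are hypothetical choices made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Round:
    """One refinement round: the candidate, the advice on it, and its score."""
    translation: str
    advice: str
    score: float


def refine_translation(
    source: str,
    translate: Callable[[str, str, str], str],  # (source, previous, advice) -> translation
    advise: Callable[[str, str], str],          # (source, translation) -> suggestions
    evaluate: Callable[[str, str], float],      # (source, translation) -> quality score
    max_rounds: int = 3,                        # assumed round budget
    good_enough: float = 90.0,                  # assumed early-stop threshold
) -> List[Round]:
    """Iteratively translate `source`, recording every round."""
    history: List[Round] = []
    # The first draft is produced without a previous translation or advice.
    translation = translate(source, "", "")
    for _ in range(max_rounds):
        advice = advise(source, translation)
        score = evaluate(source, translation)
        history.append(Round(translation, advice, score))
        if score >= good_enough:
            break
        # The translator revises its draft using the advisor's suggestions.
        translation = translate(source, translation, advice)
    return history
```

In practice each callable would wrap an LLM prompt. The returned list of rounds is the raw material for a long-thought sample, since it records how the translation changed and how each version was scored.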

Introduction. O1-like models have recently shown strong performance in reasoning tasks such as math and coding (OpenAI, 2024b; Qin et al., 2024; Huang et al., 2024; Zhang et al., 2024; Zhao et al., 2024). With the help of long thought, LLMs tend to explore, reflect on and self-improve their reasoning processes to reach more accurate answers. In this paper, our goal is not to achieve competitive performance with OpenAI’s O1 in neural machine translation (MT). Instead, we explore technical routes to bring the success of long thought to MT. To this end, we introduce DRT-o1, a byproduct of our exploration, and we hope it can facilitate further research in this direction. There are two key points in achieving this goal: i) A suitable translation scenario for employing long thought in MT: Not all scenarios require long thought during translation. For example, for simple expressions, literal translation meets most needs, and translation via long thought may be unnecessary.

Discussion / Conclusion. In this paper, we introduce DRT-o1, an attempt to bring the success of long-thought reasoning to neural machine translation (MT). Specifically, we synthesize long-thought MT samples with a purpose-built multi-agent framework followed by GPT-4o reformulation. To collect source sentences that are suitable for translation via long thought, we mine sentences with similes or metaphors from existing literary works. To synthesize the long-thought translation process for these sentences, a translator, an advisor and an evaluator collaborate to translate the source sentence iteratively. GPT-4o is then employed to improve the readability and fluency of the thought process collected via the multi-agent framework. Based on the synthesized data, we train DRT-o1 models (using Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct and Qwen2.5-14B-Instruct as backbones), and show their effectiveness in literature translation.
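The data synthesis steps summarized above (mining suitable sentences, then turning the recorded rounds into a training sample) might be sketched as below. Everything here is an assumption for illustration: `mine_sentences`, `build_sample`, the record fields, and the rule of keeping the highest-scoring translation are hypothetical. The figurative-language judge and the reformulation step (GPT-4o in the paper) are passed in as callables rather than tied to any specific API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def mine_sentences(
    sentences: Iterable[str],
    has_figurative_language: Callable[[str], bool],  # hypothetical judge, e.g. an LLM prompt
) -> List[str]:
    """Keep only sentences judged to contain a simile or metaphor."""
    return [s for s in sentences if has_figurative_language(s)]


def build_sample(
    source: str,
    rounds: List[Tuple[str, float]],      # (translation, score) for each round
    reformulate: Callable[[str], str],    # rewrites the raw log into fluent prose
) -> Dict[str, str]:
    """Assemble one long-thought training record from the recorded rounds."""
    if not rounds:
        raise ValueError("at least one translation round is required")
    # Raw thought: a plain log of every round, in order.
    raw_thought = "\n".join(
        f"Round {i}: {translation} (score {score})"
        for i, (translation, score) in enumerate(rounds, start=1)
    )
    # Assumed selection rule: keep the highest-scoring translation as the answer.
    best_translation, _ = max(rounds, key=lambda r: r[1])
    return {
        "source": source,
        "thought": reformulate(raw_thought),
        "translation": best_translation,
    }
```

Keeping the judge and the reformulator as parameters makes the sketch testable with simple stand-ins, such as a keyword check and an identity function, before any model is wired in.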