What question did this study set out to answer?

To develop a neural machine translation model that efficiently captures the morphological and syntactic differences between Chinese and English.

February 11, 2026 | Open Access

Hybrid Self-Attention with Lightweight Gating for Chinese-English Neural Machine Translation

Key Points

Developed a hybrid model combining self-attention and a lightweight gating module
Split encoder into two parallel branches for long-range dependencies and local features
Introduced a learnable fusion gate for dynamic interaction between branches
Implemented a mirrored gating sub-layer in the decoder to keep the model from over-attending to the signals of either language
Substantial improvement in translation adequacy and fluency observed
No increase in parameters or latency compared to conventional methods
Model effectively integrates language-aware biases into attention-based frameworks

Abstract

This paper presents a Chinese-English neural machine translation model that hybridizes self-attention with a lightweight gating module to better capture the distinct morphological and syntactic characteristics of the two languages. Conventional transformers treat source and target sequences homogeneously, ignoring the fact that Chinese relies on analytic structure while English is more morphologically marked. We therefore split the encoder into two parallel branches: a multi-head self-attention stack that learns long-range dependencies and a convolution-guided gate that dynamically emphasizes character-level features such as boundary and sub-word information. The two branches interact through a learnable fusion gate whose parameters are updated by the overall translation loss, allowing the network to softly switch between global and local views at each layer. A mirrored gating sub-layer is further inserted into the decoder to prevent the model from over-attending to the signals specific to either language. Experiments on multiple Chinese-English corpora show that the proposed architecture substantially improves translation adequacy and fluency without extra parameters or latency, confirming the effectiveness of integrating language-aware inductive bias into mainstream attention-based frameworks.
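
The abstract does not give the exact form of the fusion gate, so the following is only a minimal sketch of one common way such a gate is built: a sigmoid gate computed from the concatenated branch outputs, used to form an elementwise convex mix of the two. The function name `fusion_gate` and the parameters `weights` and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fusion_gate(global_vec, local_vec, weights, bias):
    """Softly mix two same-length feature vectors for one position.

    ASSUMPTION: the gate is sigmoid(W [global; local] + b), applied
    elementwise. The paper may define its gate differently.

    global_vec: output of the self-attention branch (length d)
    local_vec:  output of the convolution-guided branch (length d)
    weights:    d rows, each of length 2 * d (hypothetical learned matrix)
    bias:       d values (hypothetical learned bias)
    """
    concat = list(global_vec) + list(local_vec)
    fused = []
    for row, b, g_i, l_i in zip(weights, bias, global_vec, local_vec):
        # Gate near 1 favors the global view, near 0 the local view.
        gate = sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        fused.append(gate * g_i + (1.0 - gate) * l_i)
    return fused


# With all-zero parameters every gate is 0.5, so the result is the average.
zero_w = [[0.0] * 4 for _ in range(2)]
print(fusion_gate([1.0, 3.0], [3.0, 1.0], zero_w, [0.0, 0.0]))
```

In a trained model the weights and bias would be learned from the translation loss, as the abstract describes; here they are fixed by hand only to make the mixing behavior visible.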
