March 28, 2026 | Open Access

Gradient optimisation and cross-language transfer mechanism of English translation model based on LSTM-transformer

Key Points

The study aims to enhance machine translation by addressing the limitations of existing models in capturing long-range dependencies.
Developed a hybrid LSTM-transformer architecture for better long-sequence modeling.
Implemented an adaptive gradient clipping strategy for stable training.
Utilized dynamic weight sharing with adversarial domain adaptation to improve cross-language transfer.
Achieved a BLEU score 2.8 higher than the benchmark Transformer on the WMT14 English-German/French corpora.
Demonstrated 18% faster convergence during training.
Improved BLEU by 5.3 in the low-resource English-Romanian scenario with the transfer mechanism.

Abstract

Amid globalisation and growing demand for cross-language information, machine translation is crucial for overcoming language barriers. Deep learning has advanced the field, but the Transformer has two limitations: it captures long-range dependencies inefficiently and performs poorly in low-resource translation. To address these, this study proposes three core solutions: 1) a hybrid LSTM-transformer architecture that fuses the LSTM's gating mechanism (long-sequence modelling) with the transformer's self-attention (global context capture); 2) an adaptive gradient clipping (AGC) strategy for training stability; 3) dynamic weight sharing with adversarial domain adaptation to enhance cross-language transfer. Experiments on the WMT14 English-German and English-French corpora show that the model's BLEU score is 2.8 higher than the benchmark Transformer, with 18% faster convergence; in the English-Romanian low-resource scenario, the transfer mechanism boosts BLEU by 5.3. The study validates the hybrid architecture and optimisation strategies, offering new ideas for efficient gradient optimisation and low-resource translation models.
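
The abstract does not spell out the AGC rule, so the following is only a minimal sketch of one common formulation of adaptive gradient clipping, in which the gradient norm is capped at a fixed fraction of the parameter norm. The function names (`l2_norm`, `adaptive_clip`) and the parameters `clip_ratio` and `eps` are assumptions made for illustration and may differ from the paper's actual strategy.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))


def adaptive_clip(params, grads, clip_ratio=0.01, eps=1e-3):
    """Rescale grads so that ||grads|| <= clip_ratio * max(||params||, eps).

    Assumed rule (not taken from the paper): the threshold adapts to the
    size of the parameters instead of being a fixed constant.
    """
    max_norm = clip_ratio * max(l2_norm(params), eps)
    grad_norm = l2_norm(grads)
    if grad_norm <= max_norm:
        # Gradient is already small relative to the weights: leave it alone.
        return list(grads)
    scale = max_norm / grad_norm
    return [g * scale for g in grads]


# Hypothetical values: a weight vector of norm 5 and a gradient of norm 50.
weights = [3.0, 4.0]
gradients = [30.0, 40.0]
clipped = adaptive_clip(weights, gradients, clip_ratio=0.1)
print(l2_norm(clipped))
```

Unlike a fixed clipping threshold, this variant lets layers with large weights take proportionally larger updates, which is one plausible way to obtain the training stability the abstract mentions.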
