Quantum computing holds promise for accelerating Transformer-based generative models, yet existing proposals often remain at the sketch level and lack a full specification for near-term devices. We introduce QGT, a fully specified hybrid quantum–classical Transformer tailored to the NISQ-to-simulation regime. Under a k-sparse attention assumption and given efficient block-encoding oracles, QGT lowers the per-layer attention cost from \(O(n^2 d)\) to \(O(n\,d)\). We provide a unified algorithmic and complexity framework with rigorous theorems and proofs; detailed quantum circuit implementations with parameter-shift gradient derivations and measurement-variance bounds; and comprehensive resource accounting of qubits, gates, and shots. A reproducible classical simulation and ablation study with \(n = 8\) and \(d = 16\) shows that QGT matches classical Transformer performance using only 12 qubits and 40 shots per expectation value. QGT thus establishes a concrete foundation for practical quantum-enhanced generative AI on NISQ hardware.
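The abstract mentions parameter-shift gradients and measurement-variance bounds under a fixed shot budget. The sketch below illustrates both ideas on a one-qubit toy circuit, \(R_Y(\theta)\lvert 0\rangle\) measured in \(Z\), for which \(\langle Z\rangle = \cos\theta\). It is not QGT's actual circuit. The function names are hypothetical, and the only value taken from the abstract is the budget of 40 shots per expectation. The variance bound assumes a \(\pm 1\)-valued observable and independent shots.

```python
import math
import random


def expval_z(theta: float) -> float:
    """Exact <Z> for the toy state RY(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)


def sampled_expval_z(theta: float, shots: int, rng: random.Random) -> float:
    """Finite-shot estimate of <Z>: each shot yields +1 or -1."""
    p_plus = (1.0 + math.cos(theta)) / 2.0  # Born-rule probability of outcome +1
    total = sum(1 if rng.random() < p_plus else -1 for _ in range(shots))
    return total / shots


def parameter_shift_grad(f, theta: float) -> float:
    """Parameter-shift rule for a Pauli-generated rotation (shift = pi/2)."""
    s = math.pi / 2
    return (f(theta + s) - f(theta - s)) / 2.0


def shot_variance_bound(shots: int) -> float:
    """Upper bound on the variance of the shot-based parameter-shift gradient.

    Each <Z> estimate has variance (1 - <Z>^2) / shots <= 1 / shots, and the
    gradient is half the difference of two independent estimates, so its
    variance is at most (1/shots + 1/shots) / 4 = 1 / (2 * shots).
    """
    return 1.0 / (2.0 * shots)


theta = 0.7
shots = 40  # per-expectation shot budget quoted in the abstract

exact_grad = parameter_shift_grad(expval_z, theta)  # analytically -sin(theta)

rng = random.Random(0)
estimates = [
    parameter_shift_grad(lambda t: sampled_expval_z(t, shots, rng), theta)
    for _ in range(2000)
]
mean_est = sum(estimates) / len(estimates)
var_est = sum((g - mean_est) ** 2 for g in estimates) / (len(estimates) - 1)
```

With 40 shots, the bound gives a gradient standard deviation of at most \(\sqrt{1/80} \approx 0.11\) on this toy circuit. The shift of \(\pi/2\) is exact here because the \(R_Y\) generator has eigenvalues \(\pm\tfrac{1}{2}\). Gates with other spectra would need a different shift rule.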
Yalla Jnan Devi Satya Prasad (Mon,) studied this question.