Deep neural networks have been remarkably successful in high-dimensional learning and scientific computing, frequently succeeding where classical discretization methods fail due to the curse of dimensionality. This efficacy is commonly attributed to their approximation properties combined with the manifold hypothesis: although data are embedded in dimension D, the effective degrees of freedom are governed by a much smaller intrinsic dimension d≪D. Under this hypothesis, data concentrate near a low-dimensional manifold that neural networks can approximate efficiently. While the approximation theory for fully-connected ReLU networks on manifolds is well established, a comparable theory for transformer architectures, the dominant model class in modern foundation models, is still emerging. In this paper, we prove a new non-asymptotic, uniform approximation theorem for a class of single-head ReLU-transformers acting on vector inputs, in which the approximation error depends only on the intrinsic dimension d rather than on the ambient dimension D. To the best of our knowledge, this is the first transformer approximation result that combines an intrinsic-dimensional rate with an ambient-dimension-independent multiplicative constant. We include a numerical experiment on a circle embedded in ambient spaces of varying dimension D, showing that the observed error remains nearly unchanged as D varies, in agreement with the predicted ambient-dimension independence.
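To make the experimental setting concrete, the following is a minimal sketch of one way to embed a circle (intrinsic dimension d = 1) into ambient spaces of varying dimension D. The function name `embed_circle`, the random orthonormal embedding, and the sample sizes are illustrative assumptions; the source does not specify how the embedding is constructed.

```python
import numpy as np

def embed_circle(n_points, ambient_dim, seed=0):
    """Sample points on the unit circle and embed them isometrically in R^D.

    Hypothetical helper: the embedding is a random D x 2 matrix with
    orthonormal columns, which is an assumption, not the paper's stated setup.
    """
    rng = np.random.default_rng(seed)
    # Intrinsic coordinate: a single angle per point (d = 1).
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_points)
    circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (n, 2)
    # Reduced QR of a Gaussian matrix gives orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((ambient_dim, 2)))  # shape (D, 2)
    return theta, circle @ q.T  # embedded points, shape (n, D)

for D in (2, 16, 256):
    theta, x = embed_circle(1000, D)
    # The data occupy a 2-dimensional linear subspace regardless of D,
    # and every point keeps unit norm because the embedding is an isometry.
    rank = np.linalg.matrix_rank(x, tol=1e-8)
    print(D, x.shape, rank, np.allclose(np.linalg.norm(x, axis=1), 1.0))
```

In this sketch the geometry of the data is identical for every D, so any growth in approximation error with D would have to come from the model rather than from the data.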
Shi et al. studied this question.