February 14, 2024 · Open Access

Transformers, parallel computation, and logarithmic depth

Abstract

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers.

Cite this study

Sanford et al. (2024) studied this question.

www.synapsesocial.com/papers/68e792d4b6db643587704479 — DOI: https://doi.org/10.48550/arxiv.2402.09268

Authors

Clayton Sanford

Daniel Hsu

Matus Telgarsky
