Large Language Models (LLMs) have shown remarkable capabilities, but their reasoning abilities and underlying mechanisms remain poorly understood. We present a novel approach to enhance LLMs' reasoning through attention mechanism optimization, without additional training data. We identify inefficiencies in the attention distribution caused by non-semantic tokens and propose an algorithm to re-balance the skewed distribution, enabling the model to abstract more nuanced knowledge. Our experiments demonstrate significantly improved reasoning capabilities, particularly for non-STEM questions. We provide insights into the role of attention patterns in LLMs' reasoning and propose a method to enhance these abilities, paving the way for more powerful and versatile language models.
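The abstract does not spell out the re-balancing algorithm, so the following is only a toy sketch of what re-balancing a skewed attention distribution could look like, not the authors' method. It assumes a single attention row given as a list of weights, a known set of non-semantic positions, and a hypothetical `damping` factor; the function name and all parameters are illustrative.

```python
def rebalance_attention(weights, non_semantic, damping=0.5):
    """Toy illustration (hypothetical, not the paper's algorithm).

    Scales down the attention weight at each non-semantic position by
    `damping`, then renormalizes so the row sums to 1 again.
    """
    if not 0.0 <= damping <= 1.0:
        raise ValueError("damping must be in [0, 1]")

    # Shrink the mass on non-semantic positions; leave the rest unchanged.
    scaled = [
        w * damping if i in non_semantic else w
        for i, w in enumerate(weights)
    ]

    total = sum(scaled)
    if total == 0:
        # Nothing left to normalize; fall back to the original row.
        return list(weights)

    # Renormalize so the result is a valid probability distribution.
    return [w / total for w in scaled]


# Assumed example: position 0 is a non-semantic token absorbing most attention.
weights = [0.70, 0.10, 0.15, 0.05]
rebalanced = rebalance_attention(weights, non_semantic={0}, damping=0.25)
print([round(w, 3) for w in rebalanced])
```

In this sketch, mass removed from the non-semantic position is redistributed proportionally to the remaining tokens, which keeps their relative ordering intact. How the paper itself identifies non-semantic tokens and redistributes attention may differ.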
Liao et al. studied this question.
www.synapsesocial.com/papers/68e7309eb6db6435876aa85a — DOI: https://doi.org/10.48550/arxiv.2403.14932
Bingli Liao
Danilo Vasconcellos Vargas